analyticalmindsltd / smote_variants

A collection of 85 minority oversampling techniques (SMOTE) for imbalanced learning with multi-class oversampling and model selection features
http://smote-variants.readthedocs.io
MIT License

Question: Regarding time complexity of Oversamplers and "Noise Filters" #60

Closed BradKML closed 1 year ago

BradKML commented 1 year ago

For scikit-learn, some have created tools for profiling latency (model fitting time) against error.

The Scitime estimator is useful for some of the algorithms in scikit-learn, but not all of them.

It would be useful to benchmark and measure the time complexity of the oversamplers and see which ones are fast (or not) as a function of dataset size and the log-odds of the majority class proportion.
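Such a benchmark could be sketched roughly as follows. This is a minimal, hypothetical example: it times a naive random-duplication oversampler (a stand-in, not any technique from smote_variants) across growing synthetic datasets, which is the kind of harness that could later wrap the real oversamplers:

```python
import random
import time

def naive_oversample(X, y, minority_label=1):
    """Stand-in oversampler: duplicate random minority samples
    until the two classes are balanced. Used here only so the
    benchmark harness is self-contained."""
    minority = [x for x, lbl in zip(X, y) if lbl == minority_label]
    majority = [x for x, lbl in zip(X, y) if lbl != minority_label]
    resampled = list(minority)
    while len(resampled) < len(majority):
        resampled.append(random.choice(minority))
    X_out = majority + resampled
    y_out = [0] * len(majority) + [minority_label] * len(resampled)
    return X_out, y_out

def benchmark(oversample, sizes, imbalance=0.1, n_features=5):
    """Measure wall-clock oversampling time as the dataset grows,
    at a fixed minority proportion (`imbalance`)."""
    timings = []
    for n in sizes:
        n_min = max(1, int(n * imbalance))
        X = [[random.random() for _ in range(n_features)] for _ in range(n)]
        y = [1] * n_min + [0] * (n - n_min)
        start = time.perf_counter()
        oversample(X, y)
        timings.append((n, time.perf_counter() - start))
    return timings

for n, t in benchmark(naive_oversample, [1000, 2000, 4000]):
    print(f"n={n}: {t:.4f}s")
```

Varying `imbalance` in the same harness would cover the majority-proportion axis of the question as well.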

gykovacs commented 1 year ago

I agree, the time complexity of oversampling techniques is somewhat unexplored. Some runtime measurements are incorporated, though: there was an extensive evaluation (shared in the corresponding papers), and based on the average runtimes over 104 datasets, a ranking of the oversampling techniques is available. For example, if one is interested in the 10 quickest techniques overall, they can be queried as

import smote_variants as sv

# get 10 quickest oversamplers
oversamplers = sv.get_all_oversamplers(n_quickest=10)

Although this is not a true time complexity analysis, it can still be used to query computationally efficient techniques for further research or application purposes.

Nevertheless, a proper time complexity analysis, varying the number of majority and minority samples, the number of features, imbalance ratios, class overlap, etc., would be very useful.
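One way such an analysis could be approached empirically is to fit the slope of log(runtime) against log(n) over a range of sample counts, giving an estimate of the exponent k in time ≈ n^k. The sketch below is illustrative only: it uses a hypothetical quadratic-cost workload (a full pairwise-distance pass, similar in spirit to the neighbour searches many SMOTE variants rely on), not any routine from smote_variants:

```python
import math
import time

def pairwise_distance_pass(X):
    """Stand-in O(n^2) workload: accumulate all pairwise squared
    Euclidean distances, mimicking a naive neighbour search."""
    total = 0.0
    for a in X:
        for b in X:
            total += sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return total

def empirical_exponent(workload, sizes, n_features=3):
    """Estimate k in time ~ n^k by a least-squares fit of
    log(time) on log(n) across the given sample counts."""
    points = []
    for n in sizes:
        X = [[(i * j) % 7 / 7.0 for j in range(n_features)] for i in range(n)]
        start = time.perf_counter()
        workload(X)
        points.append((math.log(n), math.log(time.perf_counter() - start)))
    mx = sum(x for x, _ in points) / len(points)
    my = sum(y for _, y in points) / len(points)
    num = sum((x - mx) * (y - my) for x, y in points)
    den = sum((x - mx) ** 2 for x, _ in points)
    return num / den

print(f"estimated exponent: {empirical_exponent(pairwise_distance_pass, [200, 400, 800]):.2f}")
```

Repeating the fit while separately varying features, imbalance ratio, and class overlap would separate the contribution of each factor.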

Regarding the noise filters, they are not intended to be primary or necessary steps in oversampling pipelines, but a similar analysis of them could still be useful.