leaphan opened 3 years ago
Thanks for your reply. I wrote code like this:
```
pip install -U imbalanced-learn
pip install smote-variants
```

```python
import numpy as np
import smote_variants as sv
from imblearn.datasets import fetch_datasets

datasets = fetch_datasets(filter_data=['oil'])
X, y = datasets['oil']['data'], datasets['oil']['target']

for label, count in zip(*np.unique(y, return_counts=True)):
    print('Class {} has {} instances'.format(label, count))

oversampler = sv.SOMO()
X_samp, y_samp = oversampler.sample(X, y)

for label, count in zip(*np.unique(y_samp, return_counts=True)):
    print('Class {} has {} instances after oversampling'.format(label, count))
print(X_samp, y_samp)
```
The printed result:

```
Class -1 has 896 instances
Class 1 has 41 instances
Class -1 has 896 instances after oversampling
Class 1 has 41 instances after oversampling
```

After oversampling, there is no change in the number of instances in either class.
There can be multiple reasons for that. In many cases the authors of a particular SMOTE variant did not cover all possible corner cases, for example:

1. all minority samples are treated as noise according to the noise definition of the technique;
2. the method wants to work with, say, 5 nearest neighbors, but there are only 3 minority samples;
3. mathematical techniques, like self-organizing maps, do not converge;
4. etc.

In all of these cases, the nature of the data is incompatible with the parameter settings and assumptions of the SMOTE variant.
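Corner case (2) can be sketched with a small guard in plain Python (the helper name is hypothetical, not part of smote_variants): a k-NN-based step can use at most one fewer neighbor than there are minority samples, and when even that is impossible, the only safe option is to skip oversampling.

```python
def safe_n_neighbors(y, minority_label, requested=5):
    """Clamp a requested neighbor count to what the minority class allows.

    A k-NN step can use at most (n_minority - 1) neighbors; if the class
    is too small for even 1 neighbor, return None so the caller can fall
    back to returning the data unaltered. (Hypothetical helper, for
    illustration only.)
    """
    n_minority = sum(1 for label in y if label == minority_label)
    usable = min(requested, n_minority - 1)
    return usable if usable >= 1 else None  # None => skip oversampling

# 41 minority samples easily support 5 neighbors:
print(safe_n_neighbors([-1] * 896 + [1] * 41, 1))  # → 5
# Only 2 minority samples: clamp the neighbor count to 1:
print(safe_n_neighbors([1, 1, -1], 1))             # → 1
```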
Where I found reasonable resolutions, I implemented them. In cases where that is infeasible (for example, determining the 5 closest neighbors when a class has only 3 samples), the data is returned unaltered, although I would expect a message in the logs if logging is enabled.
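Assuming smote_variants reports through the standard `logging` module under its package name (an assumption worth verifying), raising the verbosity before sampling should surface any fallback message:

```python
import logging

# Assumption: smote_variants logs via the standard logging module under
# its package name; raise verbosity so fallback messages become visible.
logging.basicConfig(level=logging.INFO)
logging.getLogger('smote_variants').setLevel(logging.INFO)

# ...then rerun the oversampling as before and watch the console, e.g.:
# oversampler = sv.SOMO()
# X_samp, y_samp = oversampler.sample(X, y)
```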
Most likely your data is a corner case of the SOMO implementation with the parameters you used. Adjusting the parameters might lead to a properly operating SOMO.
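One quick way to probe for a workable setting is a small parameter sweep. The parameter names below (`n_grid`, `sigma`) are guesses at SOMO's constructor arguments, so check them against the actual signature before running:

```python
import itertools

# Hypothetical SOMO parameter grid -- verify the parameter names against
# the real constructor signature before use.
param_grid = {'n_grid': [4, 8, 16], 'sigma': [0.1, 0.2, 0.5]}
candidates = [dict(zip(param_grid, values))
              for values in itertools.product(*param_grid.values())]

for params in candidates:
    # oversampler = sv.SOMO(**params)          # requires smote_variants
    # X_samp, y_samp = oversampler.sample(X, y)
    # keep the first setting that actually grows the minority class
    pass

print(len(candidates))  # 9 parameter combinations
```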
Also, if you share a minimal working example, I can look into it.