damianhorna / multi-imbalance

Python package for tackling multi-class imbalance problems. http://www.cs.put.poznan.pl/mlango/publications/multiimbalance/
MIT License

TypeError: 'SOUP' object is not subscriptable #77

Closed arilwan closed 2 years ago

arilwan commented 2 years ago

I am working with SOUPBagging and ran into this error. I am not sure why it is raised, but this is how I use it:

from sklearn.tree import DecisionTreeClassifier
from multi_imbalance.resampling.soup import SOUP
from multi_imbalance.ensemble.soup_bagging import SOUPBagging

# 5-class problem; classes 1 and 4 are minority
soup = SOUP(maj_int_min={
    'maj': [0, 2, 3],
    'min': [1, 4]
})

tree = DecisionTreeClassifier(random_state=123)

clf = SOUPBagging(tree, n_classifiers=100, maj_int_min=soup)

clf.fit(xtrain, ytrain)

File "~/anaconda3/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
TypeError: 'SOUP' object is not subscriptable

Could you spot the problem?

plutasnyy commented 2 years ago

Hello! Thanks for reporting the problem with SOUPBagging. I think the parameters are not being passed correctly. According to the docs, SOUPBagging takes 3 parameters:

classifier – instance of a classifier
maj_int_min – dict {'maj': majority class labels, 'min': minority class labels}
n_classifiers – number of classifiers

Going through them one by one:

classifier - an instance of your classifier. SOUPBagging creates n_classifiers copies of it and, before training each copy, resamples the data using SOUP. In your case this should be: tree

maj_int_min - the dictionary that you created and passed to SOUP. You don't have to construct SOUP yourself; SOUPBagging applies SOUP for each of your classifiers automatically.

n_classifiers - how many copies of your classifier are created.

To sum up: instead of passing an instance of SOUP, pass the maj_int_min dict directly to SOUPBagging. Your script after the change:

from sklearn.tree import DecisionTreeClassifier
from multi_imbalance.ensemble.soup_bagging import SOUPBagging

# 5-class problem; classes 1 and 4 are minority
maj_int_min = {
    'maj': [0, 2, 3],
    'min': [1, 4]
}

tree = DecisionTreeClassifier(random_state=123)
# Pass the dict itself, not a SOUP instance
clf = SOUPBagging(tree, n_classifiers=100, maj_int_min=maj_int_min)

clf.fit(xtrain, ytrain)

I have not tested this, so please let me know if it helped you.
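To illustrate why the original call fails, here is a minimal sketch that does not use multi_imbalance at all. FakeResampler, minority_labels and check_maj_int_min are hypothetical names made up for this example; the assumption is only that the library at some point indexes maj_int_min with a key such as 'min', which works for a dict but not for an arbitrary object.

```python
class FakeResampler:
    """Hypothetical stand-in for a resampler object such as SOUP."""

    def __init__(self, maj_int_min=None):
        self.maj_int_min = maj_int_min


def minority_labels(maj_int_min):
    """Read the minority labels the way dict-expecting code would."""
    return maj_int_min["min"]


def check_maj_int_min(value):
    """Hypothetical guard that fails early with a clearer message."""
    if not isinstance(value, dict):
        raise TypeError(
            f"maj_int_min must be a dict, got {type(value).__name__}"
        )
    missing = {"maj", "min"} - value.keys()
    if missing:
        raise ValueError(f"maj_int_min is missing keys: {sorted(missing)}")
    return value


labels = {"maj": [0, 2, 3], "min": [1, 4]}
print(minority_labels(labels))  # indexing a dict works

try:
    # Indexing an object that defines no __getitem__ raises TypeError,
    # the same kind of error as in the traceback above.
    minority_labels(FakeResampler(maj_int_min=labels))
except TypeError as err:
    error_message = str(err)
    print(error_message)
```

Calling a guard like check_maj_int_min on the value before building the ensemble would surface the mistake in your own code rather than inside a worker process.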

To make this clearer, here is the relevant code and where it lives. The new classifiers are created in the constructor of SOUPBagging, which builds the list of classifiers:

for _ in range(n_classifiers):
[...]
    if classifier is not None:
        self.classifiers.append(deepcopy(classifier))
[...]

So n_classifiers copies of your classifier are created.

Then, in fit_classifier, the training data for each classifier is resampled using SOUP before that classifier is fitted:

x_resampled, y_resampled = SOUP(maj_int_min=maj_int_min).fit_resample(x_sampled, y_sampled)
clf.fit(x_resampled, y_resampled)
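Putting the two fragments together, the overall flow can be sketched roughly as follows. This is not the library's code: MajorityVoteStub, IdentityResampler and fit_bagging are hypothetical stand-ins written with the standard library only, and the bootstrap sampling step is an assumption based on the x_sampled / y_sampled names above.

```python
import random
from copy import deepcopy


class MajorityVoteStub:
    """Hypothetical classifier: remembers the most frequent training label."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=y.count)
        return self


class IdentityResampler:
    """Hypothetical stand-in for SOUP: returns the data unchanged."""

    def __init__(self, maj_int_min):
        self.maj_int_min = maj_int_min

    def fit_resample(self, X, y):
        return list(X), list(y)


def fit_bagging(classifier, X, y, maj_int_min, n_classifiers=3, seed=0):
    rng = random.Random(seed)
    # One independent deep copy of the base classifier per ensemble member.
    classifiers = [deepcopy(classifier) for _ in range(n_classifiers)]
    for clf in classifiers:
        # Assumed step: draw a bootstrap sample (with replacement).
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        x_sampled = [X[i] for i in idx]
        y_sampled = [y[i] for i in idx]
        # The resampler is built from the dict, then this copy is fitted.
        resampler = IdentityResampler(maj_int_min=maj_int_min)
        x_resampled, y_resampled = resampler.fit_resample(x_sampled, y_sampled)
        clf.fit(x_resampled, y_resampled)
    return classifiers


X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 0, 2, 3, 1]
ensemble = fit_bagging(
    MajorityVoteStub(), X, y, {"maj": [0, 2, 3], "min": [1, 4]}, n_classifiers=3
)
print(len(ensemble))
```

The point of the sketch is that the resampler is constructed inside the loop from the dict, which is why the ensemble expects the dict and not an already-built resampler.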