Open ppleumyy opened 2 years ago
Could you share the dimensions of X_trainTF and y_train?
(4621, 2134), (4621,)
@gykovacs
Interesting, which versions of Python and numpy are you using? There might have been some changes in the latest versions which have not been checked yet. (The tests have only been run up to Python 3.9; I should cover the most recent versions soon.)
The Python version is 3.7.13 and the numpy version is 1.21.6.
@gykovacs
Cool, then that is not the cause; it should work with this setup. If it is not much of a burden, could you please prepare a minimal working example, for instance by replacing X_trainTF and y_train with random arrays of the same size, feeding them into MulticlassOversampling, and checking whether it still fails? I could use that for debugging.
Also, could you please share the label distribution in y_train? Are the labels of integer type?
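A minimal sketch of how such stand-in data could be generated and inspected, assuming six classes with made-up proportions (the real label distribution is not stated in this thread); the helper name `make_dummy_data` is hypothetical:

```python
import numpy as np

def make_dummy_data(n_samples=4621, n_features=2134, seed=0):
    """Random stand-ins with the same shapes as X_trainTF and y_train."""
    rng = np.random.default_rng(seed)
    X = rng.random((n_samples, n_features))
    # Assumed class proportions, chosen only to make the labels imbalanced
    probs = [0.5, 0.2, 0.1, 0.1, 0.05, 0.05]
    y = rng.choice(6, size=n_samples, p=probs)
    return X, y

X, y = make_dummy_data()
print(X.shape, y.shape)  # (4621, 2134) (4621,)
print(y.dtype)           # an integer dtype

# Label distribution: one count per class label
labels, counts = np.unique(y, return_counts=True)
print(dict(zip(labels.tolist(), counts.tolist())))
```

The resulting `X` and `y` can then be passed to the oversampler in place of the real data to check whether the failure depends on the data itself.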
This is my Google Colab workspace: https://colab.research.google.com/drive/1ETmdFjWEJdayBq_Ji3Eu6qKprrc0lC_G?usp=sharing
And the dataset file: Suicidal_K1_Train.csv
@gykovacs
Perfect, I'll look into it!
thank you very much!
@gykovacs
Hi @ppleumyy, all the smote_variants tools operate on numerical arrays. Your y_train contains strings and is a pandas Series, while your X_trainTF is a sparse array (it needs to be dense). With the following changes, everything seems to work as expected:
# Replace the string labels with integers
y_train[y_train == 'Level 1'] = 1
y_train[y_train == 'Level 2'] = 2
y_train[y_train == 'Level 3'] = 3
y_train[y_train == 'Level 4'] = 4
y_train[y_train == 'Level 5'] = 5
y_train[y_train == 'Other'] = 0
# Convert the pandas Series to a numpy array and the sparse matrix to a dense one
y_train = y_train.values
X_trainTF = X_trainTF.todense()
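The same conversion can be written more compactly. This is only a sketch, assuming y_train is a pandas Series holding the six string labels above and X_trainTF is a SciPy sparse matrix (for example, the output of a TF-IDF vectorizer); the toy data and the name `label_map` are illustrative and not part of the original notebook:

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Toy stand-ins for the real data (assumption: same types as in the notebook)
y_train = pd.Series(['Level 1', 'Other', 'Level 3', 'Level 5'])
X_trainTF = sparse.csr_matrix(np.eye(4))

label_map = {'Other': 0, 'Level 1': 1, 'Level 2': 2,
             'Level 3': 3, 'Level 4': 4, 'Level 5': 5}

# Series.map builds a new Series; astype(int) makes the dtype an integer one
# rather than object, and to_numpy() returns a plain numpy array
y_train = y_train.map(label_map).astype(int).to_numpy()

# toarray() returns a regular ndarray, whereas todense() returns np.matrix
X_trainTF = X_trainTF.toarray()
```

Mapping through a dictionary keeps the label encoding in one place, and any label missing from `label_map` becomes NaN, so `astype(int)` fails loudly instead of leaving a stray string in the array.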
This is my code:
and I get this error: