AxeldeRomblay / MLBox

MLBox is a powerful Automated Machine Learning python library.
https://mlbox.readthedocs.io/en/latest/

cross_val_predict_proba is redundant with sklearn.model_selection.cross_val_predict #25

Closed amueller closed 6 years ago

amueller commented 6 years ago

You can pass method="predict_proba" to sklearn.model_selection.cross_val_predict instead.
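
For readers following along, here is a minimal sketch of that call. The iris dataset, the LogisticRegression estimator, and cv=5 are illustrative assumptions, not anything taken from MLBox.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Illustrative data and estimator (assumptions, not MLBox code).
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

# Out-of-fold class probabilities: each row is predicted by a model
# that did not see that sample during training.
proba = cross_val_predict(clf, X, y, cv=5, method="predict_proba")

print(proba.shape)  # (150, 3): one row per sample, one column per class
```

Each row holds the probabilities for one sample, so the rows sum to 1.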

AxeldeRomblay commented 6 years ago

@amueller: Thank you! Do you handle the case where you have to predict probabilities for an unknown class? That occurs when the target contains only one sample from a given class; otherwise it's fine if you stratify.

AxeldeRomblay commented 6 years ago

Also, I thought that cross_val_score used to crash (or still crashes?) if a given class contains fewer samples than n_splits. Please tell me if I am wrong...

amueller commented 6 years ago

I think it crashes if there's a training set that doesn't contain all the classes if you use KFold. If you use StratifiedKFold it'll refuse to split because the stratification strategy doesn't work then. I guess we could warn instead and do a "best effort" stratification. That is a separate issue from the predict_proba shapes not matching, though.
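
The KFold case mentioned above can be reproduced with a small sketch. The toy labels below are made up for illustration; the snippet only shows that plain KFold can produce a training set with a class missing, and says nothing about how StratifiedKFold reacts.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy target (an assumption for illustration): class 2 has a single sample.
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2])
X = np.zeros((len(y), 1))  # features are irrelevant to the split itself

all_classes = set(np.unique(y).tolist())
missing_per_fold = []
for train_idx, test_idx in KFold(n_splits=3).split(X):
    seen = set(y[train_idx].tolist())
    # Classes that the model trained on this fold would never see.
    missing_per_fold.append(sorted(all_classes - seen))

print(missing_per_fold)  # [[], [], [2]]
```

In the last fold the only sample of class 2 lands in the test set, so a classifier fitted on that training set has no column for it in predict_proba.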

AxeldeRomblay commented 6 years ago

OK! I will give it a try then. Thank you! Also, please feel free to open an issue for any suggestions ;)

AxeldeRomblay commented 6 years ago

I will modify it for drift_estimator.py, but for stacking_classifier.py I really need to handle the case where the target contains only one sample from a given class. I can't afford to delete this sample and then call cross_val_predict(method="predict_proba") due to shape compatibility issues...
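
To make the shape issue concrete, here is a rough sketch of one way to align per-fold probabilities to the full set of classes. The helper align_proba is hypothetical: it is not part of MLBox or scikit-learn, and it is not necessarily what the scikit-learn pull requests referenced in this thread do. Plain lists are used to keep it dependency-free.

```python
def align_proba(fold_proba, fold_classes, all_classes):
    """Expand per-fold probabilities to one column per label in all_classes.

    Hypothetical helper: labels the fold's model never saw get probability 0.0.
    """
    column_of = {label: j for j, label in enumerate(all_classes)}
    aligned = []
    for row in fold_proba:
        full_row = [0.0] * len(all_classes)
        for label, p in zip(fold_classes, row):
            full_row[column_of[label]] = p
        aligned.append(full_row)
    return aligned


# A model trained on a fold that contained only classes "a" and "c"
# returns two columns, while the full target has three classes.
fold_proba = [[0.9, 0.1], [0.2, 0.8]]
aligned = align_proba(fold_proba, ["a", "c"], ["a", "b", "c"])

print(aligned)  # [[0.9, 0.0, 0.1], [0.2, 0.0, 0.8]]
```

With every fold padded to the same width, the out-of-fold predictions can be stacked into a single matrix without dropping the rare sample.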

amueller commented 6 years ago

If I understand your problem correctly, this will be fixed in https://github.com/scikit-learn/scikit-learn/pull/9585. Feel free to wait for that ;)

amueller commented 6 years ago

Sorry I meant https://github.com/scikit-learn/scikit-learn/pull/9532