amueller closed this issue 6 years ago
@amueller: Thank you! Do you handle the case where you have to predict probabilities for an unknown class? (That occurs when your target contains only one sample from a given class; otherwise it's fine if you stratify...)
Also, I thought that cross_val_score used to crash (or still crashes?) if a given class contains fewer samples than n_splits. Please tell me if I am wrong...
I think it crashes if there's a training set that doesn't contain all the classes if you use KFold. If you use StratifiedKFold it'll refuse to split because the stratification strategy doesn't work then. I guess we could warn instead and do a "best effort" stratification. That is a separate issue from the predict_proba shapes not matching, though.
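To make the KFold case concrete, here is a minimal sketch in plain Python. It assumes a simple unshuffled, contiguous k-fold split as a stand-in for KFold (it is not scikit-learn's implementation), and the helper names `contiguous_folds` and `folds_missing_classes` are hypothetical. It only reports which folds end up with a training set that lacks some class.

```python
def contiguous_folds(n_samples, n_splits):
    """Yield (train_idx, test_idx) for a plain, unshuffled k-fold split.

    Simplified stand-in for a k-fold splitter, for illustration only.
    """
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [
        n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
        for i in range(n_splits)
    ]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop


def folds_missing_classes(y, n_splits):
    """Return (fold_number, missing_classes) for folds whose training set lacks a class."""
    all_classes = set(y)
    missing = []
    for fold, (train_idx, _) in enumerate(contiguous_folds(len(y), n_splits)):
        absent = all_classes - {y[i] for i in train_idx}
        if absent:
            missing.append((fold, sorted(absent)))
    return missing


# Class "c" has a single sample, so whichever fold tests on it trains without it.
y = ["a", "b", "a", "b", "a", "b", "c"]
print(folds_missing_classes(y, n_splits=3))
```

A model fitted on such a fold never sees class "c", which is where the predict_proba shape mismatch comes from.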
OK! I will give it a try then. Thank you! Also, please feel free to open an issue for any suggestions ;)
I will modify it for drift_estimator.py, but for stacking_classifier.py I really need to handle the case where the target contains only one sample from a given class. I can't afford to delete this sample and then call cross_val_predict(method="predict_proba") due to shape compatibility issues...
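One possible way around the shape mismatch, sketched in plain Python: pad each fold's probability rows so they always have one column per class in the full target. The helper `align_proba` is hypothetical (it is not part of scikit-learn or MLBox), and filling unseen classes with 0.0 is an assumption about what is acceptable for the stacking step.

```python
def align_proba(proba, fold_classes, all_classes):
    """Reorder and pad probability rows to match the full list of classes.

    proba        -- list of rows, one probability per class seen in the fold
    fold_classes -- classes the fold's model was trained on, in column order
    all_classes  -- every class in the full target, in the desired column order
    """
    column_of = {cls: j for j, cls in enumerate(fold_classes)}
    aligned = []
    for row in proba:
        # Classes the model never saw get probability 0.0 (an assumption).
        aligned.append(
            [row[column_of[cls]] if cls in column_of else 0.0 for cls in all_classes]
        )
    return aligned


# The fold's model only saw "a" and "b"; the full target also contains "c".
fold_proba = [[0.7, 0.3], [0.2, 0.8]]
print(align_proba(fold_proba, ["a", "b"], ["a", "b", "c"]))
```

With every fold padded this way, the per-fold outputs all have the same number of columns and can be stacked without dropping the singleton sample.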
If I understand your problem correctly, this will be fixed in https://github.com/scikit-learn/scikit-learn/pull/9585. Feel free to wait for that ;)
Sorry, I meant https://github.com/scikit-learn/scikit-learn/pull/9532
You can use method="predict_proba".
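For reference, a small sketch of what that call looks like. The data is synthetic and balanced, and LogisticRegression is used only as a placeholder estimator (both are assumptions for illustration, not what MLBox uses). With enough samples per class, every fold sees every class and the output has one column per class.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic data: 3 classes with 20 samples each (illustrative only).
rng = np.random.RandomState(0)
X = rng.normal(size=(60, 4))
y = np.repeat([0, 1, 2], 20)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Out-of-fold class probabilities, one row per sample.
proba = cross_val_predict(
    LogisticRegression(max_iter=1000), X, y, cv=cv, method="predict_proba"
)
print(proba.shape)
```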