AxeldeRomblay / MLBox

MLBox is a powerful Automated Machine Learning python library.
https://mlbox.readthedocs.io/en/latest/

Error while computing the cross validation mean score. #78

Closed YAOLI0407 closed 5 years ago

YAOLI0407 commented 5 years ago

Hi,

I am interested in MLBox and tried it on a Kaggle classification project. When I reached the step of optimising the hyperparameters, the following error message appeared: 'An error occurred while computing the cross validation mean score. Check the parameter values and your scoring function.'

Here's the code I used:

```python
import numpy as np
from mlbox.preprocessing import Reader, Drift_thresholder
from mlbox.optimisation import Optimiser

# Placeholder file paths and target column name, as in the original post
paths = ['train_path', 'test_path']
target_name = 'target_name'

rd = Reader(sep=",")
df = rd.train_test_split(paths, target_name)

dft = Drift_thresholder()
df = dft.fit_transform(df)

space = {
    'ne__numerical_strategy': {"search": "choice", "space": ['mean', 'median']},
    # np.NaN is the value as originally posted (discussed later in the thread)
    'ne__categorical_strategy': {"search": "choice", "space": [np.NaN]},
    'ce__strategy': {"search": "choice",
                     "space": ['label_encoding', 'entity_embedding', 'random_projection']},
    'est__strategy': {"search": "choice", "space": ["LightGBM"]},
    'est__n_estimators': {"search": "choice", "space": [150]},
    'est__colsample_bytree': {"search": "uniform", "space": [0.8, 0.95]},
    'est__subsample': {"search": "uniform", "space": [0.8, 0.95]},
    'est__max_depth': {"search": "choice", "space": [5, 6, 7, 8, 9]},
    'est__learning_rate': {"search": "choice", "space": [0.07]},
}

opt = Optimiser(scoring="roc_auc", n_folds=5)
best_params = opt.optimise(space, df, 15)
```

Can you help me fix it? Thanks!

AxeldeRomblay commented 5 years ago

Hello @Mia-3765,

Is it a binary classification problem? If so, there seems to be an issue (I am currently fixing it for the next release). Meanwhile, you can try replacing your scoring with roc_auc_score from sklearn:

```python
from sklearn.metrics import roc_auc_score
from mlbox.optimisation import Optimiser

opt = Optimiser(scoring=roc_auc_score, n_folds=5)
```

Does this work for you?

AxeldeRomblay commented 5 years ago

Hello @Mia-3765, MLBox 0.8.1 has just been released and the issue is now fixed. You can upgrade the package from PyPI :)
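After upgrading, it can help to confirm which MLBox version the running Python environment actually sees (a notebook kernel may not pick up a fresh install until it restarts). A minimal sketch using only the standard library's importlib.metadata (Python 3.8+), assuming the package is installed under its PyPI name, mlbox; the helper name is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_mlbox_version():
    """Return the installed MLBox version string, or None if MLBox is absent.

    Hypothetical helper; it only reads package metadata and never imports MLBox.
    """
    try:
        return version("mlbox")
    except PackageNotFoundError:
        return None


v = installed_mlbox_version()
print(v or "mlbox is not installed")
```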

YAOLI0407 commented 5 years ago

Hi Axel,

Thanks for your reply. Yes, it is a classification problem. I tried both of your suggestions, passing sklearn's roc_auc_score and upgrading MLBox, but the problem is not solved: the ROC AUC score is still '-inf'.

[image: image.png]

Looking forward to hearing from you. Thanks.

On Sun, Aug 25, 2019 at 5:40 PM Axel notifications@github.com wrote:

Closed #78 https://github.com/AxeldeRomblay/MLBox/issues/78.


-- Yao(Mia) Li M.S. in Data Science Columbia University
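The '-inf' score reported together with the "error occurred while computing the cross validation mean score" warning suggests that the cross-validation step itself failed, not that the model scored badly. One way to rule out the metric on its own is to compute a cross-validated ROC AUC with scikit-learn alone. This is a sketch under stated assumptions, not MLBox's internals: synthetic imbalanced data stands in for the Kaggle set, LogisticRegression stands in for LightGBM, and scikit-learn's built-in "roc_auc" scorer is used.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, imbalanced binary data (roughly 90% / 10%); a stand-in for real data
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.9, 0.1],
                           random_state=0)

# Stratified folds keep both classes in every fold, which ROC AUC needs
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# "roc_auc" is scikit-learn's built-in scorer; it scores predicted
# probabilities / decision values rather than hard class labels
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         scoring="roc_auc", cv=cv)

print(scores.round(3))
print("mean ROC AUC:", scores.mean().round(3))
print("all finite:", bool(np.all(np.isfinite(scores))))
```

If a check like this gives finite scores, the metric is probably not the culprit and the pipeline parameters are the next place to look.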

AxeldeRomblay commented 5 years ago

Hello @Mia-3765

I think I have found it. You chose the parameter 'ne__categorical_strategy':{"search":"choice", "space":[np.NaN]}, which does not work: it replaces missing values in categorical features with np.NaN, which is a float. Try instead: 'ne__categorical_strategy':{"search":"choice", "space":['<NULL>']}.
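The point that np.NaN is a float can be checked directly. A minimal pandas sketch on a hypothetical toy column; it does not reproduce MLBox's missing-value encoder, only the type problem: a NaN "replacement" leaves a float among the string categories, which can break code that compares or sorts category values, while a string placeholder such as '<NULL>' leaves only strings.

```python
import numpy as np
import pandas as pd

# Hypothetical categorical column with one missing entry
col = pd.Series(["red", np.nan, "blue"], dtype=object)

# np.nan is a plain Python float, so "filling" with it leaves a float
# sitting among the string categories
print(isinstance(np.nan, float))            # True
print([type(v).__name__ for v in col])      # ['str', 'float', 'str']

# Mixed str/float values fail as soon as they are compared, e.g. when sorted
try:
    sorted(col)
except TypeError as exc:
    print("TypeError:", exc)

# A string placeholder keeps every entry a str
filled = col.fillna("<NULL>")
print(filled.tolist())                      # ['red', '<NULL>', 'blue']
```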

If this still doesn't work, can you tell me whether it is a binary or a multiclass problem? I can't see your image. Also, have you tried other metrics, like accuracy?
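Applied to the original script, the suggested fix changes a single entry of the search space. A sketch of the corrected dictionary, assuming everything else in the posted code stays the same; it is plain Python, so it can be inspected without MLBox installed:

```python
# Search space from the original post with the suggested fix applied:
# categorical NAs are filled with the string '<NULL>' instead of np.NaN
space = {
    'ne__numerical_strategy': {"search": "choice", "space": ['mean', 'median']},
    'ne__categorical_strategy': {"search": "choice", "space": ['<NULL>']},
    'ce__strategy': {"search": "choice",
                     "space": ['label_encoding', 'entity_embedding', 'random_projection']},
    'est__strategy': {"search": "choice", "space": ["LightGBM"]},
    'est__n_estimators': {"search": "choice", "space": [150]},
    'est__colsample_bytree': {"search": "uniform", "space": [0.8, 0.95]},
    'est__subsample': {"search": "uniform", "space": [0.8, 0.95]},
    'est__max_depth': {"search": "choice", "space": [5, 6, 7, 8, 9]},
    'est__learning_rate': {"search": "choice", "space": [0.07]},
}

# Then, as in the original post:
# opt = Optimiser(scoring="roc_auc", n_folds=5)
# best_params = opt.optimise(space, df, 15)
```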

YAOLI0407 commented 5 years ago

Hi Axel,

Thanks for replying. It's working now! By the way, this is a binary classification problem with imbalanced labels.

Have a great day.
