cerlymarco / shap-hypetune

A python package for simultaneous Hyperparameters Tuning and Features Selection for Gradient Boosting Models.
MIT License

Erratic behaviour #16

Closed mirix closed 2 years ago

mirix commented 2 years ago

Hi,

I am still running a series of experiments with shap-hypetune: a form of cross-validation over a number of stratified K-fold splits.

For each split, I generate a random seed like this: np.random.randint(4294967295).

A typical run goes like this (there is one for each split):

11 trials detected for ('num_leaves', 'n_estimators', 'max_depth', 'learning_rate')

trial: 0001 ### iterations: 00008 ### eval_score: 0.94737
trial: 0002 ### iterations: 00018 ### eval_score: 0.92481
trial: 0003 ### iterations: 00020 ### eval_score: 0.99248
trial: 0004 ### iterations: 00017 ### eval_score: 0.97744
trial: 0005 ### iterations: 00025 ### eval_score: 0.98496
trial: 0006 ### iterations: 00012 ### eval_score: 0.97744
trial: 0007 ### iterations: 00020 ### eval_score: 0.99248
trial: 0008 ### iterations: 00012 ### eval_score: 0.98496
trial: 0009 ### iterations: 00021 ### eval_score: 0.98496
trial: 0010 ### iterations: 00018 ### eval_score: 0.98496
trial: 0011 ### iterations: 00025 ### eval_score: 0.98496

11 trials detected for ('num_leaves', 'n_estimators', 'max_depth', 'learning_rate')

trial: 0001 ### iterations: 00025 ### eval_score: 0.96241
trial: 0002 ### iterations: 00038 ### eval_score: 0.97744
trial: 0003 ### iterations: 00037 ### eval_score: 0.97744
trial: 0004 ### iterations: 00015 ### eval_score: 0.96241
trial: 0005 ### iterations: 00002 ### eval_score: 0.81203
trial: 0006 ### iterations: 00018 ### eval_score: 0.96241
trial: 0007 ### iterations: 00016 ### eval_score: 0.96241
trial: 0008 ### iterations: 00011 ### eval_score: 0.91729
trial: 0009 ### iterations: 00038 ### eval_score: 0.97744
trial: 0010 ### iterations: 00022 ### eval_score: 0.96241
trial: 0011 ### iterations: 00021 ### eval_score: 0.96992

However, the eval_score sometimes drops dramatically.

This does not look like typical stochastic behaviour. For instance, if the score drops for one split, it normally drops for all subsequent splits as well, despite the fact that a new seed is (pseudo-)randomly generated for each split at each stage:

    import numpy as np
    from sklearn.model_selection import StratifiedKFold
    from lightgbm import LGBMClassifier
    from shaphypetune import BoostRFA

    skf = StratifiedKFold(n_splits=5, shuffle=True,
                          random_state=np.random.randint(4294967295))

    clf_lgbm = LGBMClassifier(boosting_type='rf',
                              random_state=np.random.randint(4294967295),
                              ...)

    model = BoostRFA(clf_lgbm,
                     sampling_seed=np.random.randint(4294967295),
                     ...)
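As an aside, seeds drawn this way can be made replayable by taking them all from a single seeded generator instead of the global RNG state. This is only a sketch of that idea, not part of the original script; the master seed value is arbitrary:

```python
import numpy as np

# Sketch (not from the original code): one master RNG, fixed once,
# hands out every per-split seed so an entire run can be replayed.
master = np.random.default_rng(12345)                 # arbitrary master seed
split_seeds = master.integers(0, 4294967295, size=5)  # one seed per fold

for seed in split_seeds:
    # each fold would then use its own reproducible seed, e.g.
    # StratifiedKFold(n_splits=5, shuffle=True, random_state=int(seed))
    print(int(seed))
```

Re-running with the same master seed reproduces the exact same sequence of per-split seeds, which makes runs like the ones above directly comparable.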

In other cases, the number of iterations is stuck at 1 for every trial:

11 trials detected for ('num_leaves', 'n_estimators', 'max_depth', 'learning_rate')

trial: 0001 ### iterations: 00001 ### eval_score: 0.69173
trial: 0002 ### iterations: 00001 ### eval_score: 0.7594
trial: 0003 ### iterations: 00001 ### eval_score: 0.69173
trial: 0004 ### iterations: 00001 ### eval_score: 0.69173
trial: 0005 ### iterations: 00001 ### eval_score: 0.79699
trial: 0006 ### iterations: 00001 ### eval_score: 0.69173
trial: 0007 ### iterations: 00001 ### eval_score: 0.69173
trial: 0008 ### iterations: 00001 ### eval_score: 0.7594
trial: 0009 ### iterations: 00001 ### eval_score: 0.69173
trial: 0010 ### iterations: 00001 ### eval_score: 0.69173
trial: 0011 ### iterations: 00001 ### eval_score: 0.69173

11 trials detected for ('num_leaves', 'n_estimators', 'max_depth', 'learning_rate')

trial: 0001 ### iterations: 00001 ### eval_score: 0.82707
trial: 0002 ### iterations: 00001 ### eval_score: 0.82707
trial: 0003 ### iterations: 00001 ### eval_score: 0.82707
trial: 0004 ### iterations: 00001 ### eval_score: 0.82707
trial: 0005 ### iterations: 00001 ### eval_score: 0.81955
trial: 0006 ### iterations: 00001 ### eval_score: 0.82707
trial: 0007 ### iterations: 00001 ### eval_score: 0.81955
trial: 0008 ### iterations: 00001 ### eval_score: 0.81955
trial: 0009 ### iterations: 00001 ### eval_score: 0.82707
trial: 0010 ### iterations: 00001 ### eval_score: 0.82707
trial: 0011 ### iterations: 00001 ### eval_score: 0.82707

If you re-run the script, you typically observe the normal behaviour again.

cerlymarco commented 2 years ago

Hi, thanks for your feedback, but I don't think this should be considered erratic behavior.

First of all, the splits are independent of one another, so there is no guarantee that all folds behave in the same manner (especially with real or unbalanced data).

RFA is more unstable than RFE given how it works (it starts from zero features and adds them recursively).

Finally, results are affected by the number of iterations and by any callbacks used.

If you have empirical evidence of a bug in the implementation, don't hesitate to let me know.
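For context, recursive feature addition can be sketched generically as a greedy forward-selection loop. This is not shap-hypetune's actual implementation; the least-squares R² scorer below is a hypothetical stand-in for the boosting model's eval metric:

```python
import numpy as np

def forward_selection(X, y, score_fn, n_keep):
    """Greedy recursive feature addition: start from zero features and
    repeatedly add the single feature that most improves score_fn."""
    selected = []
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < n_keep:
        best_feat, best_score = None, -np.inf
        for f in remaining:
            s = score_fn(X[:, selected + [f]], y)
            if s > best_score:
                best_feat, best_score = f, s
        selected.append(best_feat)
        remaining.remove(best_feat)
    return selected

def r2_of_lstsq(Xs, y):
    # R^2 of an ordinary least-squares fit on the chosen columns
    coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ coef
    return 1 - resid.var() / y.var()

# Toy data where y depends only on columns 0 and 2
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] + 2 * X[:, 2]

print(forward_selection(X, y, r2_of_lstsq, n_keep=2))
```

Because each step locks in whichever feature looks best at that moment, early choices constrain all later ones, which is one reason a forward procedure like this can be less stable than recursive elimination.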