Hi, thanks for your feedback, but I don't think this should be considered erratic behavior.
First of all, the splits are independent of one another, so there is no guarantee that all the folds behave in the same manner (especially with real or unbalanced data).
RFA is more unstable than RFE, considering how it works (starting from zero features and adding them recursively).
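For intuition, here is a minimal sketch of greedy recursive feature addition (not shap-hypetune's actual implementation; the estimator, scorer, and greedy loop are illustrative assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

def recursive_addition(X, y, estimator, n_features):
    """RFA-like search: start from zero features and greedily add the best one."""
    selected = []
    remaining = list(range(X.shape[1]))
    while len(selected) < n_features:
        # Score each candidate feature when added to the current subset.
        scores = [
            (cross_val_score(estimator, X[:, selected + [j]], y, cv=3).mean(), j)
            for j in remaining
        ]
        best_score, best_j = max(scores)
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

# Early additions condition every later choice, so a slightly different
# split can send the whole search down a different path. RFE, which
# starts from the full feature set and only prunes, is less sensitive.
print(recursive_addition(X, y, LogisticRegression(max_iter=1000), 5))
```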
Finally, the results are affected by the number of iterations and by any callbacks.
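As an illustration of that sensitivity (plain LightGBM with placeholder data, not tied to any specific run here):

```python
# Illustrative only: the stopping point, and hence the eval score,
# depends on the validation split and the early-stopping callback.
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

model = lgb.LGBMClassifier(n_estimators=1000, random_state=0)
model.fit(
    X_tr, y_tr,
    eval_set=[(X_val, y_val)],
    callbacks=[lgb.early_stopping(stopping_rounds=50)],
)
print(model.best_iteration_)  # varies with the split and the callback
```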
If you have empirical evidence that this behavior is a bug in the implementation, don't hesitate to let me know.
Hi,
I am still running a series of experiments with shap-hypetune: a sort of cross-validation with a number of stratified K-fold splits.
For each split, I generate a random seed like this: np.random.randint(4294967295).
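A sketch of the seeding scheme (the dataset and fold count are placeholders, and the model fit is omitted):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=1000, random_state=0)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    # A fresh (pseudo) random seed for every split, as described above.
    seed = np.random.randint(4294967295)
    # ... fit the model on X[train_idx] with this seed ...
```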
A typical run goes like this (there is one for each split):
However, sometimes the eval_score drops dramatically, and this does not look like typical stochastic behaviour.
For instance, if the score drops for one split, it normally drops for all the subsequent splits, despite the fact that a new seed is (pseudo-)randomly generated for each split at each stage:
In other cases, the number of iterations stays constant for each run:
If you re-run the script, you typically observe the normal behaviour again.