glevv closed this 4 years ago
Nice suggestion, @GLevV. Maybe a more generic solution is to return the individual fold scores in the lofo output and to add a boxplot option like:
df.set_index("feature").T.boxplot(column=features, vert=False)
What do you think?
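To make the suggestion concrete, here is a minimal sketch of how that one-liner could be used. It assumes the output is a DataFrame with a "feature" column plus one column per fold; the fold column names, the sample values, and the helper name plot_fold_importances are hypothetical and not part of lofo.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd


def plot_fold_importances(df, features):
    """Draw one horizontal box per feature from per-fold importance scores."""
    # Rows are features and columns are folds, so transpose to get
    # one column per feature before calling DataFrame.boxplot.
    per_feature = df.set_index("feature").T
    return per_feature.boxplot(column=features, vert=False)


# Hypothetical per-fold importances (assumed layout, made-up numbers).
df = pd.DataFrame({
    "feature": ["f1", "f2", "f3"],
    "fold_0": [0.10, 0.02, -0.01],
    "fold_1": [0.12, 0.03, 0.00],
    "fold_2": [0.09, 0.01, 0.01],
    "fold_3": [0.11, 0.02, -0.02],
})
features = ["f1", "f2", "f3"]
ax = plot_fold_importances(df, features)
ax.figure.savefig("fold_importances.png")
```

Each box then summarizes the spread of one feature's importance across folds.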
Yes, that's great. I just tried it with FLOFO and it works fine. Only 2 concerns:
with very small variance in feature importance, the boxes will be very small, basically lines (see screenshot)
it works poorly with a small number of folds, like 2:
x = np.random.random(2)
plt.boxplot(x);
It will always produce visually the same plot, just with different values. But I guess you can just add a warning at initialization:
import warnings

# cv may be an int or a splitter object exposing n_splits
if (isinstance(cv, int) and cv < 3) or (hasattr(cv, 'n_splits') and cv.n_splits < 3):
    warning_str = "Warning: Small number of folds could lead to inadequate results"
    warnings.warn(warning_str)
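A short sketch of why two folds always give the same-looking box: with NumPy's default linear interpolation, the quartiles of two points always sit at 25%, 50% and 75% of the distance between them. The helper name box_positions is made up for this illustration.

```python
import numpy as np


def box_positions(x):
    """Return Q1, median and Q3 as fractions of the sample range."""
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    lo, hi = np.min(x), np.max(x)
    return (np.array([q1, med, q3]) - lo) / (hi - lo)


rng = np.random.default_rng(0)
for _ in range(3):
    x = rng.random(2)  # two "fold scores"
    # The fractions come out as roughly 0.25, 0.5, 0.75 for every draw.
    print(x, box_positions(x))
```

So the box always covers the middle half of the gap between the two scores, and only the axis values change.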
Since it will be optional, it won't be a big deal. The default plot can stay as it is, and people can select the boxplot with a parameter.
I guess it is closed then
Median and IQR could be more robust and useful if the distribution of importances is not normal.
Something like this
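The snippet from this comment is not shown here. As a rough sketch only, and not the commenter's code, a median/IQR summary of per-fold importances might look like the following. It assumes a DataFrame with one row per feature and one column per fold; the function name and output column names are hypothetical.

```python
import numpy as np
import pandas as pd


def summarize_importances(fold_scores):
    """Summarize per-fold importances with median and IQR per feature.

    fold_scores: DataFrame with one row per feature, one column per fold.
    """
    values = fold_scores.to_numpy(dtype=float)
    # Percentiles along the fold axis; each result has one entry per feature.
    q1, median, q3 = np.percentile(values, [25, 50, 75], axis=1)
    summary = pd.DataFrame(
        {"importance_median": median, "importance_iqr": q3 - q1},
        index=fold_scores.index,
    )
    return summary.sort_values("importance_median", ascending=False)


# Made-up scores: feature "b" has one outlier fold that would inflate mean and std.
fold_scores = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0, 5.0], [0.0, 0.0, 0.0, 0.0, 10.0]],
    index=["a", "b"],
)
summary = summarize_importances(fold_scores)
print(summary)
```

In the made-up data, the single outlier fold for "b" does not move its median or IQR, which is the robustness the comment refers to.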
Also, for plot_importance there could be a choice between error and 95% CI;
For std it would be
and for iqr
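The formulas from this comment are not shown, so the following is only one possible sketch of such a choice. It assumes a normal approximation (1.96 standard errors) for the 95% CI and asymmetric quartile distances for the IQR option; the function name error_bars and the kind values are hypothetical.

```python
import numpy as np


def error_bars(values, kind="std"):
    """Return (center, lower, upper) for one feature's per-fold importances.

    lower and upper are error-bar lengths below and above the center.
    """
    values = np.asarray(values, dtype=float)
    if kind == "std":
        # Mean with one sample standard deviation on each side.
        err = values.std(ddof=1)
        return values.mean(), err, err
    if kind == "ci95":
        # Assumption: normal approximation, 1.96 standard errors of the mean.
        err = 1.96 * values.std(ddof=1) / np.sqrt(values.size)
        return values.mean(), err, err
    if kind == "iqr":
        # Median with asymmetric distances to the first and third quartiles.
        q1, med, q3 = np.percentile(values, [25, 50, 75])
        return med, med - q1, q3 - med
    raise ValueError(f"unknown kind: {kind!r}")


scores = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up fold scores
for kind in ("std", "ci95", "iqr"):
    print(kind, error_bars(scores, kind))
```

For plotting, the lower and upper lengths for all features can be stacked into a 2 x N array and passed as xerr to matplotlib's barh or errorbar, which accepts asymmetric errors in that shape.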