aerdem4 / lofo-importance

Leave One Feature Out Importance

MIT License

Add the choice between Mean/Std and Median/IQR #32

Closed glevv closed 4 years ago

glevv commented 4 years ago

Median and IQR could be more robust and useful if the distribution of importances is not normal.

Something like this:

importance_df["importance_md"] = np.median(lofo_cv_scores_normalized, axis=1)
importance_df["importance_iqr"] = stats.iqr(lofo_cv_scores_normalized, axis=1)
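To illustrate the point about robustness, here is a minimal sketch on made-up per-fold scores (the `scores` array is hypothetical, one row per feature and one column per fold, and is not the library's actual output). It uses only NumPy; with the default linear interpolation, `q3 - q1` should match what `scipy.stats.iqr(scores, axis=1)` returns.

```python
import numpy as np

# Hypothetical per-fold importances: one row per feature, one column per fold.
scores = np.array([
    [0.10, 0.12, 0.11, 0.50],  # one outlying fold
    [0.02, 0.03, 0.01, 0.02],
])

# Current summaries: sensitive to the single outlying fold.
mean = scores.mean(axis=1)
std = scores.std(axis=1)

# Proposed summaries: driven by the middle of the distribution.
median = np.median(scores, axis=1)
q1, q3 = np.percentile(scores, [25, 75], axis=1)
iqr = q3 - q1

print("mean  ", mean, "std", std)
print("median", median, "iqr", iqr)
```

For the first feature the outlying fold pulls the mean well above the median, which is the case where median and IQR give a more representative picture.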

Also, plot_importance could offer a choice between the plain error and a 95% CI.

For std it would be:

importance_df.plot(x="feature",
                   y="importance_mean",
                   xerr=1.96 * importance_df.importance_std,
                   kind='barh',
                   color=importance_df["color"],
                   figsize=figsize)

and for IQR:

importance_df.plot(x="feature",
                   y="importance_md",
                   # n is num_sampling for FLOFO and the number of folds for LOFO
                   xerr=1.57 * importance_df.importance_iqr / np.sqrt(n),
                   kind='barh',
                   color=importance_df["color"],
                   figsize=figsize)
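A rough sketch of how those error-bar widths could be computed side by side. The function name `error_bars` and the dictionary keys are made up for illustration. Note that `1.96 * std` describes the spread of the individual fold scores under a normal assumption, while dividing by `sqrt(n)` gives the usual normal-approximation interval for the mean; `1.57 * IQR / sqrt(n)` is the approximation used for boxplot notches around the median. The sample standard deviation (`ddof=1`) is an assumption here, not necessarily what the library uses.

```python
import numpy as np

def error_bars(scores):
    """Half-widths for error bars; `scores` has one row per feature, one column per fold."""
    n = scores.shape[1]
    std = scores.std(axis=1, ddof=1)  # sample std (assumption)
    q1, q3 = np.percentile(scores, [25, 75], axis=1)
    iqr = q3 - q1
    return {
        "spread_95": 1.96 * std,                   # spread of individual fold scores
        "mean_ci_95": 1.96 * std / np.sqrt(n),     # normal-approximation CI of the mean
        "median_ci_95": 1.57 * iqr / np.sqrt(n),   # notch-style CI of the median
    }

# Hypothetical scores for a single feature over four folds.
bars = error_bars(np.array([[1.0, 2.0, 3.0, 4.0]]))
print(bars)
```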

aerdem4 commented 4 years ago

Nice suggestion, @GLevV. Maybe a more generic solution is to return the individual fold scores in the lofo output and to add a boxplot option like:

df.set_index("feature").T.boxplot(column=features, vert=False)

What do you think?
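For anyone who wants to try the idea, a small self-contained sketch follows. The frame layout (a "feature" column plus one column per fold, named `fold_0` ... `fold_4`) is an assumption about what such an output could look like, not the library's actual format, and the scores are random placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
features = ["f0", "f1", "f2"]

# Hypothetical output: a "feature" column plus one column per fold score.
df = pd.DataFrame(rng.normal(size=(3, 5)),
                  columns=[f"fold_{i}" for i in range(5)])
df.insert(0, "feature", features)

# After transposing, each feature is a column and each fold is a row,
# which is the layout DataFrame.boxplot expects.
per_fold = df.set_index("feature").T
ax = per_fold.boxplot(column=features, vert=False)
```

Keeping the raw fold scores in the output leaves the choice of summary (mean/std, median/IQR, or a plot) to the user.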

glevv commented 4 years ago

Yes, that's great. Just tried it with FLOFO and it works fine. Only 2 concerns:

1. With very small variance in feature importance, the boxes will be very small, basically lines (see screenshot).

2. It works badly with a small number of folds, like 2:

x = np.random.random(2)
plt.boxplot(x);

It will always produce visually the same plot, just with different values. But I guess you can just add a warning in initialization:

if (isinstance(cv, int) and cv < 3) or (hasattr(cv, 'n_splits') and cv.n_splits < 3):
    warning_str = "Warning: Small number of folds could lead to inadequate results"
    warnings.warn(warning_str)
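The same check as a standalone sketch, in case it helps. The helper name `warn_if_few_folds`, the `min_folds` parameter, and the `TwoFoldSplitter` stand-in are all hypothetical; the only assumption about `cv` is that it is either an int or an object exposing `n_splits`, as scikit-learn splitters such as `KFold` do.

```python
import warnings

def warn_if_few_folds(cv, min_folds=3):
    """Warn when `cv` describes fewer than `min_folds` folds; return True if it warned."""
    if isinstance(cv, int):
        n_folds = cv
    else:
        # Splitter objects are assumed to expose n_splits; anything else is skipped.
        n_folds = getattr(cv, "n_splits", None)
    if n_folds is not None and n_folds < min_folds:
        warnings.warn("Small number of folds could lead to inadequate results",
                      stacklevel=2)
        return True
    return False

class TwoFoldSplitter:
    """Hypothetical stand-in for a splitter with two folds."""
    n_splits = 2

warn_if_few_folds(2)                  # warns
warn_if_few_folds(TwoFoldSplitter())  # warns
warn_if_few_folds(5)                  # silent
```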

aerdem4 commented 4 years ago

Since it will be optional, it won't be a big deal. The default plot can stay as it is, and people can select the boxplot with a parameter.

glevv commented 4 years ago

I guess it is closed then

aerdem4 commented 4 years ago

#33 brings the box plot feature.