aerdem4 / lofo-importance

Leave One Feature Out Importance
MIT License

Add the choice between Mean/Std and Median/IQR #32

Closed glevv closed 4 years ago

glevv commented 4 years ago

Median and IQR could be more robust and more useful when the distribution of importances is not normal.

Something like this:

import numpy as np
from scipy import stats
importance_df["importance_md"] = np.median(lofo_cv_scores_normalized, axis=1)
importance_df["importance_iqr"] = stats.iqr(lofo_cv_scores_normalized, axis=1)
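
To make the idea concrete, here is a minimal, self-contained sketch. It assumes the normalized scores form a features-by-folds array; the feature names and score values are made up for illustration, and the IQR is computed with `np.percentile` instead of `stats.iqr` only to keep the example dependency-light.

```python
import numpy as np
import pandas as pd

# Hypothetical normalized scores: one row per feature, one column per CV fold.
features = ["f1", "f2", "f3"]
scores = np.array([
    [0.10, 0.12, 0.11, 0.50],   # f1 has one outlying fold
    [0.02, 0.03, 0.01, 0.02],
    [-0.01, 0.00, 0.01, 0.00],
])

importance_df = pd.DataFrame({"feature": features})

# Mean/std summary, shown for comparison.
importance_df["importance_mean"] = scores.mean(axis=1)
importance_df["importance_std"] = scores.std(axis=1)

# Median/IQR summary, less sensitive to the outlying fold.
importance_df["importance_md"] = np.median(scores, axis=1)
q75, q25 = np.percentile(scores, [75, 25], axis=1)
importance_df["importance_iqr"] = q75 - q25

print(importance_df)
```

In this made-up data the single outlying fold pulls the mean of `f1` well above its median, which is the situation the suggestion is aimed at.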

Also, plot_importance could offer a choice between the plain error and a 95% CI.

For std it would be:

importance_df.plot(x="feature",
                   y="importance_mean",
                   xerr=1.96 * importance_df.importance_std,
                   kind='barh',
                   color=importance_df["color"],
                   figsize=figsize)

and for IQR:

# n = num_sampling for FLOFO, number of folds for LOFO
importance_df.plot(x="feature",
                   y="importance_md",
                   xerr=1.57 * importance_df.importance_iqr / np.sqrt(n),
                   kind='barh',
                   color=importance_df["color"],
                   figsize=figsize)
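
The two error-bar widths can be compared side by side in a small sketch. The helper `center_and_xerr` and its `robust` flag are hypothetical names used only for illustration, not part of lofo-importance, and the scores are made up. Two caveats, offered tentatively: `1.96 * std` covers roughly 95% of individual scores under a normality assumption rather than being a confidence interval for the mean, and `1.57 * IQR / sqrt(n)` is the approximation commonly used for the notches of notched box plots.

```python
import numpy as np

def center_and_xerr(scores, robust=False):
    """Hypothetical helper: bar centers and error-bar half-widths per feature.

    scores: array of shape (n_features, n), where n is the number of folds
    (LOFO) or samples (FLOFO).
    """
    scores = np.asarray(scores, dtype=float)
    n = scores.shape[1]
    if robust:
        # Median with the notch-style interval 1.57 * IQR / sqrt(n).
        q75, q25 = np.percentile(scores, [75, 25], axis=1)
        return np.median(scores, axis=1), 1.57 * (q75 - q25) / np.sqrt(n)
    # Mean with 1.96 standard deviations.
    return scores.mean(axis=1), 1.96 * scores.std(axis=1)

# Made-up scores for a single feature over 9 folds: 0.01, 0.02, ..., 0.09.
scores = np.arange(1, 10).reshape(1, 9) / 100.0

mean, xerr_std = center_and_xerr(scores)
median, xerr_iqr = center_and_xerr(scores, robust=True)
print(mean, xerr_std)
print(median, xerr_iqr)
```

Either pair can then be passed as `y` and `xerr` to the plotting calls above.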

aerdem4 commented 4 years ago

Nice suggestion, @GLevV. Maybe a more generic solution is to return the individual fold scores in the lofo output and to add a boxplot option like:

df.set_index("feature").T.boxplot(column=features, vert=False)
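
A runnable sketch of that one-liner, assuming the output frame has a `feature` column plus one column per fold. The column names `fold_0` to `fold_3` and the score values are hypothetical, since the thread does not show the actual output layout.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import numpy as np
import pandas as pd

# Hypothetical per-fold importances: one row per feature, one column per fold.
features = ["f1", "f2", "f3"]
fold_scores = np.array([
    [0.10, 0.12, 0.11, 0.50],
    [0.02, 0.03, 0.01, 0.02],
    [-0.01, 0.00, 0.01, 0.00],
])
df = pd.DataFrame(fold_scores, columns=[f"fold_{i}" for i in range(4)])
df.insert(0, "feature", features)

# Transpose so each feature becomes a column, then draw one horizontal box
# per feature.
wide = df.set_index("feature").T
ax = wide.boxplot(column=features, vert=False)
ax.set_xlabel("importance")
```

The transpose is needed because `DataFrame.boxplot` draws one box per column, while the scores are stored one feature per row.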

What do you think?

glevv commented 4 years ago

Yes, that's great. Just tried it with FLOFO, works fine. Only 2 concerns:

aerdem4 commented 4 years ago

Since it will be optional, it won't be a big deal. The default plot can stay as it is, and people can select the boxplot with a parameter.

glevv commented 4 years ago

I guess it is closed then

aerdem4 commented 4 years ago

#33 brings the box plot feature.