TeamHG-Memex / eli5

A library for debugging/inspecting machine learning classifiers and explaining their predictions
http://eli5.readthedocs.io
MIT License
2.74k stars, 332 forks

fast eli5.sklearn.permutation_importance? #336

Open omarcr opened 4 years ago

omarcr commented 4 years ago

Is there a way to make the following call faster?

perm = PermutationImportance(estimator, cv='prefit', n_iter=1).fit(X_window_test, Y_test)

I am currently running an experiment with 3,179 features, and the algorithm is too slow even with cv='prefit'.

lkugler commented 4 years ago

@joelrich opened a similar issue (#317), but it seems to have received no feedback. I would also vote for a parallel implementation. How would we implement it to run in parallel? With joblib.Parallel?
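As a rough illustration of the joblib.Parallel idea, here is a minimal sketch that shuffles one column at a time and scores each shuffled copy in a separate worker. It is not eli5's implementation: the helper names `_score_drop` and `parallel_permutation_importance` are hypothetical, the toy data and model are placeholders, and it assumes a dense NumPy feature matrix and an estimator with a `score` method.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def _score_drop(estimator, X, y, col, baseline, seed):
    """Shuffle a single column and return the resulting drop in score."""
    rng = np.random.RandomState(seed)
    X_perm = X.copy()  # each task permutes its own copy of the data
    X_perm[:, col] = rng.permutation(X_perm[:, col])
    return baseline - estimator.score(X_perm, y)


def parallel_permutation_importance(estimator, X, y, n_jobs=2, seed=0):
    """Hypothetical helper: one shuffle per feature, features scored in parallel."""
    baseline = estimator.score(X, y)
    drops = Parallel(n_jobs=n_jobs)(
        delayed(_score_drop)(estimator, X, y, col, baseline, seed + col)
        for col in range(X.shape[1])
    )
    return np.asarray(drops)


# Toy data and an already fitted model stand in for the real experiment.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=3, random_state=0
)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)
importances = parallel_permutation_importance(model, X, y, n_jobs=2)
print(importances)
```

Each task copies X, so memory use grows with the number of concurrent workers. With thousands of features it may be worth scoring on a subsample of rows as well.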

jnothman commented 4 years ago

The new implementation of permutation importance in scikit-learn (not yet released) offers some parallelism: https://scikit-learn.org/dev/modules/generated/sklearn.inspection.permutation_importance.html#sklearn.inspection.permutation_importance
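For reference, a minimal sketch of that function on an already fitted model, assuming a scikit-learn version that ships `sklearn.inspection.permutation_importance` (0.22 or later). The toy data and model are placeholders. `n_repeats=1` mirrors `n_iter=1` from the original question, and `n_jobs` sets the number of parallel workers.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Toy data and model; substitute the real fitted estimator and test set.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=3, random_state=0
)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

# The model is not refit; features are shuffled and rescored in parallel.
result = permutation_importance(
    model, X, y, n_repeats=1, n_jobs=2, random_state=0
)
print(result.importances_mean)  # one mean score drop per feature
```

The returned object also exposes `importances_std` and the raw `importances` array of shape (n_features, n_repeats).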

omarcr commented 4 years ago

I think @jnothman's reference is the best that we currently have. Does anyone know if this will be ported to eli5? Thanks.

folterj commented 2 years ago

It seems that even for relatively small training sets, where training a model (e.g. DecisionTreeClassifier, RandomForestClassifier) is fast, running permutation_importance on the trained model is incredibly slow. (I am currently using model.feature_importances_ as an alternative.)