TeamHG-Memex / eli5

A library for debugging/inspecting machine learning classifiers and explaining their predictions
http://eli5.readthedocs.io
MIT License

sklearn transform_feature_names with ColumnTransformer of Pipelines #382

Open bmreiniger opened 4 years ago

bmreiniger commented 4 years ago

From https://stackoverflow.com/q/60949339/10495893

If an sklearn ColumnTransformer has a Pipeline as one of its transformers, then transform_feature_names fails.

I outlined one possible solution in an answer to the SO post. I'm not too happy with it, since it basically copies and edits ColumnTransformer.get_feature_names and registers the result as the dispatch for transform_feature_names. I'm also not sure how to deal with conflicting in_names and _df_columns. Is there an alternative, or are there suggested modifications?

jnothman commented 4 years ago

I'd give preference to _df_columns when available and in_names is None. Your solution is reasonable, though it uses a lot of private sklearn API. In scikit-learn 0.23 you can use the n_features_in_ attribute instead of the private equivalent, but the other private attributes there don't really have public equivalents.
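One reading of that precedence rule, as a dependency-free sketch: explicit in_names win when given, otherwise the DataFrame columns recorded at fit time are used, and only then are placeholder names generated. The function resolve_input_names, its parameters, and the "x0", "x1", ... placeholders are assumptions made for illustration; they are not part of eli5 or scikit-learn.

```python
def resolve_input_names(in_names, df_columns, n_features):
    """Hypothetical helper: choose the input feature names to start from.

    in_names:   names passed by the caller, or None
    df_columns: column labels seen at fit time (e.g. from _df_columns), or None
    n_features: number of input features (e.g. from n_features_in_)
    """
    if in_names is not None:
        # Explicit names from the caller take priority.
        names = list(in_names)
    elif df_columns is not None:
        # Fall back to the DataFrame columns only when in_names is None.
        names = [str(c) for c in df_columns]
    else:
        # Last resort: generated placeholder names.
        names = ["x%d" % i for i in range(n_features)]
    if len(names) != n_features:
        raise ValueError(
            "expected %d feature names, got %d" % (n_features, len(names))
        )
    return names


print(resolve_input_names(None, ["age", "income"], 2))
print(resolve_input_names(["a", "b"], ["age", "income"], 2))
print(resolve_input_names(None, None, 3))
```

Raising on a length mismatch is a design choice of this sketch; it surfaces a conflict between in_names and the fitted data early instead of silently mislabelling features.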

I think it would be a good idea to contribute something like this to eli5... though we are working, very slowly, on trying to fix these limitations in Scikit-learn. It turns out to be a hard problem to solve.