TeamHG-Memex / eli5

A library for debugging/inspecting machine learning classifiers and explaining their predictions
http://eli5.readthedocs.io
MIT License

Export tables not just to HTML #196

Closed jnothman closed 7 years ago

jnothman commented 7 years ago

It would be good if Explanation could be exported not just to HTML, but to Pandas dataframes or a similar tabular format. This would enable further slicing and dicing, alternative methods of highlighting through DataFrame.style, and the ability to export to other on-disk formats.

kmike commented 7 years ago

Explanation can be exported to Python dicts/lists (and thus JSON) - see https://github.com/TeamHG-Memex/eli5/blob/master/eli5/formatters/as_dict.py. But exporting the result to pandas makes a lot of sense. I like the idea, and it is not the first time we've been asked about it; +1 to have direct DataFrame support.

jnothman commented 7 years ago

It's easy to turn it into many different JSON layouts once it's in a DataFrame...
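
As a rough illustration of the point above, the sketch below serializes one small DataFrame into three JSON layouts with pandas' `to_json`. The feature names and weights are made up for the example and do not come from eli5.

```python
import json

import pandas as pd

# Hypothetical feature weights, standing in for one target of an explanation.
weights = pd.DataFrame(
    {"feature": ["god", "space", "<BIAS>"], "weight": [1.8, -1.2, 0.3]}
)

# Layout 1: one JSON object per row.
as_records = json.loads(weights.to_json(orient="records"))

# Layout 2: a single object mapping feature name to weight.
by_feature = json.loads(weights.set_index("feature")["weight"].to_json())

# Layout 3: column names, index and row values kept in separate keys.
as_split = json.loads(weights.to_json(orient="split"))
```

The results are parsed back with `json.loads` only so they can be inspected as Python objects; the raw strings returned by `to_json` can be written to disk directly.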

ClimbsRocks commented 7 years ago

This feature would be great! I'm going through and doing this manually when interacting with eli5, but it would be much better to have it baked into the library itself, so I'm not dependent on some of your internal API decisions.

lopuhin commented 7 years ago

This looks like a nice feature to have indeed.

The goal is not to preserve all information in this export, such as the number of remaining items not included in it, or the score/proba in the case of explain_prediction, right?

I think it makes sense to support export not only of the Explanation object, but also of TargetExplanation (to get explanation for one target).

Next, what would an ideal API look like? Is it ok to make features an index?

[image not preserved]

Does a MultiIndex make sense for multiple targets? Should we keep it even in the case of a single target?

[image not preserved]

Currently the explain_prediction output looks the same:

[image not preserved]
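
The index layouts asked about in this comment can be tried out on plain pandas data. The following is only a sketch: the targets, features and weights are hypothetical, and the frame is built by hand rather than exported from eli5.

```python
import pandas as pd

# Hypothetical weights for two targets; names and numbers are made up.
rows = [
    ("alt.atheism", "god", 1.8),
    ("alt.atheism", "space", -1.2),
    ("sci.space", "space", 2.4),
    ("sci.space", "god", -0.7),
]
long_df = pd.DataFrame(rows, columns=["target", "feature", "weight"])

# Target and feature together form a two-level MultiIndex.
indexed = long_df.set_index(["target", "feature"]).sort_index()

# Selecting one target drops that level; the result is indexed by feature.
atheism = indexed.loc["alt.atheism"]

# Keeping the MultiIndex for a single target still works, at the cost of
# an index level that carries no extra information.
single = indexed.loc[["sci.space"]]
```

Keeping the MultiIndex in the single-target case makes downstream code uniform, while dropping it gives a simpler frame; the sketch shows that both are one `.loc` call apart.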

ClimbsRocks commented 7 years ago

I always advocate for consistency within a project, even if that means a slightly-sub-optimal API for one particular part of it. So if explain_prediction is already doing this one way, I'd say do it the same way for explain_weights.

That said, I think of this less in terms of targets than in terms of features. So for my use cases, I'd probably structure it with features as keys, and information for each target as columns (target=alt.atheism_std, or just alt.atheism_std).

But again, that seems to slightly contradict how the project is already set up, and I think it's more important to ensure consistency.

Can't wait for this!
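
For comparison, the feature-keyed layout described in this comment (features as rows, one column per target) can be derived from a long table with `DataFrame.pivot`. This is a minimal sketch on made-up data; the `target=` column prefix simply mirrors the naming mentioned above.

```python
import pandas as pd

# Hypothetical long-format weights: one row per (target, feature) pair.
long_df = pd.DataFrame(
    [
        ("alt.atheism", "god", 1.8),
        ("alt.atheism", "space", -1.2),
        ("sci.space", "god", -0.7),
        ("sci.space", "space", 2.4),
    ],
    columns=["target", "feature", "weight"],
)

# Features become the index, each target becomes its own column.
wide = long_df.pivot(index="feature", columns="target", values="weight")

# Optionally label the columns in the "target=<name>" style.
prefixed = wide.add_prefix("target=")
```

Because the wide frame can be computed from the long one, a library could export a single layout and leave the reshaping to the user.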

jnothman commented 7 years ago

Perhaps it's instructive to consider what this looks like for CRF

lopuhin commented 7 years ago

Perhaps it's instructive to consider what this looks like for CRF

Right, I didn't realize it has both transition features and targets. Transition features can be represented as a pivot table, and we can support exporting explanation.transition_features directly, but I'm not sure what should be returned for the CRF explanation...

[image not preserved]
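
A pivot table of transition features, as mentioned in this comment, might look like the sketch below. The labels and coefficients are hypothetical and the column names (`from`, `to`, `coef`) are assumptions made for the example, not the attribute names used by eli5.

```python
import pandas as pd

# Hypothetical transition coefficients between sequence labels.
transitions = pd.DataFrame(
    [
        ("O", "O", 2.1),
        ("O", "B-PER", 0.4),
        ("B-PER", "O", -0.3),
        ("B-PER", "B-PER", -1.5),
    ],
    columns=["from", "to", "coef"],
)

# Rows are the source label, columns the destination label.
table = transitions.pivot(index="from", columns="to", values="coef")

# Look up the coefficient for the transition O -> B-PER.
o_to_bper = table.loc["O", "B-PER"]
```

This shape does not fit naturally into the same frame as per-target feature weights, which is the difficulty raised here for the CRF case.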

lopuhin commented 7 years ago

Unless someone suggests a better idea, I'll make a PR with the current implementation, adding docs and support for exporting parts of the explanation. Export to pandas will be best-effort: it will not export all attributes, only those that map well onto a DataFrame. For a CRF explanation, only the transition features will be exported, but it will also be possible to export target explanations directly.

kmike commented 7 years ago

To me it looks like a single DataFrame is not flexible enough for all use cases. What do you think about adding format_as_dataframes, which returns a dict (?) of DataFrame objects, and format_as_dataframe (which may be implemented as a format_as_dataframes wrapper), which returns a single DataFrame and shows a warning if some of the data can't be represented this way?
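
The two-function shape proposed here can be sketched on a toy dict. The helper names `to_dataframes` and `to_dataframe`, the dict layout, and the rule for which part the single-frame version returns are all assumptions made for illustration; this is not the eli5 implementation.

```python
import warnings

import pandas as pd


def to_dataframes(explanation):
    """Return one DataFrame per non-empty part of a toy explanation dict."""
    return {
        name: pd.DataFrame(rows)
        for name, rows in explanation.items()
        if rows
    }


def to_dataframe(explanation):
    """Return a single DataFrame, warning about parts that were left out."""
    frames = to_dataframes(explanation)
    if not frames:
        return None
    # Arbitrary choice for this sketch: keep the first part, drop the rest.
    first, *dropped = frames
    if dropped:
        warnings.warn(
            "Exported only %r; not represented: %s" % (first, ", ".join(dropped))
        )
    return frames[first]


# Toy CRF-like explanation with two parts (all values are made up).
toy = {
    "targets": [{"target": "B-PER", "feature": "word.istitle", "weight": 1.1}],
    "transition_features": [{"from": "O", "to": "B-PER", "coef": 0.4}],
}
frames = to_dataframes(toy)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    single = to_dataframe(toy)
```

Building the single-frame function on top of the dict-returning one keeps the conversion logic in one place, and the warning tells callers when they should switch to the multi-frame variant.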

lopuhin commented 7 years ago

@kmike yes, I like this idea, it seems it solves all current issues! Thanks :)

kmike commented 7 years ago

Fixed by https://github.com/TeamHG-Memex/eli5/pull/211. There can be further improvements, but let's use separate tickets for them.