Closed jnothman closed 7 years ago
Explanation can be exported to Python dicts/lists (and thus json) - see https://github.com/TeamHG-Memex/eli5/blob/master/eli5/formatters/as_dict.py. But exporting the result to pandas makes a lot of sense, I like the idea, and it is not the first time we're asked about it; +1 to have direct DataFrame support.
It's easy to turn it into many different JSONs once it's in a DataFrame...
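To illustrate the point about JSON, here is a minimal sketch assuming a small weights table. The column names (`target`, `feature`, `weight`) and the values are made up for illustration and are not eli5's export schema; only standard pandas and stdlib calls are used.

```python
import json

import pandas as pd

# Hypothetical weights table; the column names are illustrative only.
weights = pd.DataFrame(
    {
        "target": ["alt.atheism", "alt.atheism", "sci.med"],
        "feature": ["atheism", "religion", "doctor"],
        "weight": [1.5, 0.75, 2.0],
    }
)

# One JSON object per row.
records_json = weights.to_json(orient="records")

# Column-oriented JSON: {"column": {"row label": value}}.
columns_json = weights.to_json(orient="columns")

# Nested {target: {feature: weight}}, built with groupby and serialized with the stdlib.
nested = {
    target: dict(zip(group["feature"], group["weight"]))
    for target, group in weights.groupby("target")
}
nested_json = json.dumps(nested)
```

Which shape is preferable depends on the consumer; the same DataFrame can produce any of them.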
This feature would be great! I'm going through and doing this manually when interacting with eli5, but it would be much better to have it baked into the library itself, so I'm not dependent on some of your internal API decisions.
This looks like a nice feature to have indeed.
There is no goal to preserve all information in this export, such as the number of remaining items not included in the export, or the score/proba in the case of explain_prediction, right?
I think it makes sense to support export not only of the Explanation object, but also of TargetExplanation (to get explanation for one target).
Next, what would an ideal API look like? Is it ok to make features an index?
Does a MultiIndex make sense for multiple targets? Shall we leave it even in case of a single target?
Currently explain prediction looks the same:
I always advocate for consistency within a project, even if that means a slightly-sub-optimal API for one particular part of it. So if explain_prediction is already doing this one way, I'd say do it the same way for explain_weights.
That said, I don't think of this much in terms of targets, so much as I do the features. So for my use cases, I'd probably structure it with features as keys, and information for each target as columns (target=alt.atheism_std, or just alt.atheism_std).
But again, that seems to slightly contradict how the project is already set up, and I think it's more important to ensure consistency.
Can't wait for this!
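For reference, the two layouts discussed so far (target-first with a MultiIndex, and feature-first with one column per target) can be sketched as follows. This assumes a hypothetical long-format table; the column names and weights are invented and say nothing about what eli5 actually returns.

```python
import pandas as pd

# Hypothetical long-format export: one row per (target, feature) pair.
long_df = pd.DataFrame(
    {
        "target": ["alt.atheism", "alt.atheism", "sci.med", "sci.med"],
        "feature": ["god", "doctor", "god", "doctor"],
        "weight": [1.2, -0.4, -0.7, 2.1],
    }
)

# Target-first layout: a (target, feature) MultiIndex with a single weight column.
by_target = long_df.set_index(["target", "feature"]).sort_index()

# Feature-first layout: features as rows, one weight column per target.
by_feature = by_target["weight"].unstack("target")

# Both layouts hold the same numbers; only the lookup changes.
weight_a = by_target.loc[("sci.med", "doctor"), "weight"]
weight_b = by_feature.loc["doctor", "sci.med"]
```

Since one layout is a single `unstack` away from the other, keeping the export consistent with the existing target-first structure does not rule out the feature-first view.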
Perhaps it's instructive to consider what this looks like for CRF
Perhaps it's instructive to consider what this looks like for CRF
Right, I didn't realize it has both transition features and targets. Transition features can be represented as a pivot table, and we can support exporting explanation.transition_features directly, but I'm not sure what should be returned for the CRF explanation...
Unless someone suggests a better idea, I'll make a PR with the current implementation, adding support for export of parts of the explanation and docs. So export to pandas will be best-effort: it will not export all attributes, only the stuff that maps onto the dataframe well, and in case of CRF explanation only the transition features will be exported, but it will be possible to export target explanations directly too.
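A rough sketch of the pivot-table idea for transition features, assuming the weights are available in long form. The `from`/`to`/`coef` column names, the labels, and the numbers are assumptions for illustration, not the layout of the actual export.

```python
import pandas as pd

# Hypothetical CRF transition weights in long form; labels and numbers are made up.
transitions = pd.DataFrame(
    {
        "from": ["O", "O", "B-PER", "B-PER"],
        "to": ["O", "B-PER", "O", "B-PER"],
        "coef": [0.5, 1.0, 0.25, -2.0],
    }
)

# Rows are the source label, columns are the destination label.
transition_table = transitions.pivot(index="from", columns="to", values="coef")
```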
To me it looks like a single DataFrame is not flexible enough for all use cases. What do you think about adding format_as_dataframes, which returns a dict (?) of DataFrame objects, and format_as_dataframe (may be implemented as a format_as_dataframes wrapper), which returns a single DataFrame and shows a warning if some of the data can't be represented this way?
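The proposed split could look roughly like the sketch below. It is not the implementation from the library: the explanation is modelled as a plain dict of row lists, the `sketch_` function names are hypothetical, and "export the first non-empty part" is just one possible rule for choosing the single DataFrame.

```python
import warnings
from typing import Dict, Optional

import pandas as pd

# Hypothetical stand-in for an explanation: a plain dict, not eli5's Explanation class.
ExplanationLike = Dict[str, Optional[list]]


def sketch_format_as_dataframes(expl: ExplanationLike) -> Dict[str, pd.DataFrame]:
    """Return one DataFrame per non-empty part of the explanation."""
    return {
        name: pd.DataFrame(rows)
        for name, rows in expl.items()
        if rows  # skip parts that are missing or empty
    }


def sketch_format_as_dataframe(expl: ExplanationLike) -> Optional[pd.DataFrame]:
    """Return a single DataFrame, warning if other parts had to be left out."""
    frames = sketch_format_as_dataframes(expl)
    if not frames:
        return None
    # Assumption: the first non-empty part (in insertion order) is the one exported.
    first_name = next(iter(frames))
    dropped = [name for name in frames if name != first_name]
    if dropped:
        warnings.warn(
            "Exported only %r; not exported: %s" % (first_name, ", ".join(dropped))
        )
    return frames[first_name]


example = {
    "targets": [
        {"target": "B-PER", "feature": "word.istitle", "weight": 1.5},
        {"target": "O", "feature": "word.islower", "weight": 0.5},
    ],
    "transition_features": [
        {"from": "O", "to": "B-PER", "coef": 1.0},
    ],
    "feature_importances": None,
}

frames = sketch_format_as_dataframes(example)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    single = sketch_format_as_dataframe(example)
```

Here `frames` has one entry per non-empty part, while `single` is the targets table and the call emits one warning about the transition features that were left out.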
@kmike yes, I like this idea, it seems it solves all current issues! Thanks :)
Fixed by https://github.com/TeamHG-Memex/eli5/pull/211. There can be further improvements, but let's use separate tickets for them.
It would be good if Explanation could be exported not just to HTML, but to Pandas dataframes or a similar tabular format. This would enable further slicing and dicing, alternative methods of highlighting through DataFrame.style, and the ability to export to other on-disk formats.