TeamHG-Memex / eli5

A library for debugging/inspecting machine learning classifiers and explaining their predictions
http://eli5.readthedocs.io
MIT License

`format_as_html(explain(prediction(...))` fails to show the highlighted text #361

Open tsela opened 4 years ago

tsela commented 4 years ago

Hi,

I'm trying to use eli5 to explain the results of a simple Scikit-Learn pipeline made of a TfidfVectorizer and a LogisticRegressionCV. In particular, I'm trying to replicate the look of the eli5.show_prediction() output shown in https://eli5.readthedocs.io/en/latest/tutorials/sklearn-text.html, but using format_as_html() and explain_prediction() directly, since I'm building a web app rather than working in Jupyter.

The problem is that whatever I try, I only get a weight table as output, and the highlighted text is missing. Even when I set force_weights to False, it still only shows the weight table. I've inspected the output of format_as_html() and I can't find any trace of highlighted text, only the HTML for the table. So it's not a case of styling moving the highlighted text away; it's quite simply missing.

Even checking the source code doesn't help, and I feel like I'm missing something. Is there a reason why I can't get the highlighted text to show up?

lopuhin commented 4 years ago

@tsela I see, that sounds very reasonable. Does eli5.show_prediction() show the text explanations on your pipeline?

tsela commented 4 years ago

@lopuhin I just checked, and no, eli5.show_prediction() only shows the weight table as well. Any idea where that comes from? My pipeline is relatively simple; the only complications are that I load the model from a pickled file using joblib and that I use a custom tokeniser in the TfidfVectorizer. Could one of these be the issue?

lopuhin commented 4 years ago

The issue is likely due to a custom tokenizer, here is the relevant code which checks the class I believe: https://github.com/TeamHG-Memex/eli5/blob/017c738f8dcf3e31346de49a390835ffafad3f1b/eli5/sklearn/text.py#L53-L70 and also https://github.com/TeamHG-Memex/eli5/blob/017c738f8dcf3e31346de49a390835ffafad3f1b/eli5/sklearn/_span_analyzers.py#L7

So one option is to define a get_doc_weighted_spans method on your vectorizer - sorry, this part is not really documented, so you'll have to check the source.

tsela commented 4 years ago

Thanks for your help! It looks indeed like the custom tokeniser is the problem. I'm kind of misusing the tokeniser to do all the text preprocessing (so that the pickled model is the only thing I have to send around for whoever will be working on the production frontend), so it's quite understandable that this could cause the problem.

I'll see if I can define a get_doc_weighted_spans() method. My tokeniser is lossy (on purpose), so that might be a challenge, but I'll try and see if it's possible.

Thanks for your help!

hohl commented 4 years ago

I'm having the same issue. Tried both eli5.show_prediction() and eli5.explain_weights together with eli5.format_as_html. Both show me the table, but no nicely formatted text with coloured overlays.

But I am not using a custom tokenizer. Instead, I even tried to use just a plain TfidfVectorizer() with all parameters left at their default values, and it still didn't work.

I then even tried whether it works with TextExplainer:

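from eli5.lime import TextExplainer

# `samples` and `model` are defined elsewhere (not shown in this comment)
te = TextExplainer()
te.fit(samples[0], model.predict_proba)
te.show_prediction()

But that again only showed the table and no highlighted text. Any other ideas what I could try?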

Eli5 version is 0.10.1. scikit-learn is 0.22.1. Tried to run on two different machines too: one running Ubuntu and one running macOS.

hohl commented 4 years ago

I now even tried to run one of the sample notebooks in this repo. I hoped that I am just using the library wrong, but that does not seem to be the case.

The output of the first block with explain_prediction(..., force_weights=False, ...) also does not show me the highlighted text, just the weights table, even though force_weights is set to False in that sample.

I also tried whether it changes anything when I downgrade ELI5 to 0.10 or 0.9.0, but both these versions delivered the same results as the 0.10.1 release.

hohl commented 4 years ago

Finally found a configuration that works: downgrading scikit-learn to 0.21.3 does finally output the texts. I guess there is some incompatibility between scikit-learn 0.22 and ELI5 0.10.1?

sobayed commented 4 years ago

Same issue here, the highlighted text does not show for show_prediction() with ELI5 0.10.1 and scikit-learn 0.22

Querela commented 4 years ago

I'm also on the latest versions, trying to get a transformer-based explanation, but just using a prediction method and not getting any highlighted text. According to this: https://eli5.readthedocs.io/en/latest/tutorials/black-box-text-classifiers.html

from eli5.lime import TextExplainer

def predict_proba(docs):
    # here obviously with code ...
    pass

label_list = ["0", "1"]
doc = "My example sentence expressing a strong opinion etc."

# ---

te = TextExplainer(random_state=42)
te.fit(doc, predict_proba)
te.show_prediction(target_names=label_list)

Fishing through the source, it might be some check, like this: https://github.com/TeamHG-Memex/eli5/blob/4839d1927c4a68aeff051935d1d4d8a4fb69b46d/eli5/sklearn/text.py#L65 (suggested by @lopuhin). The code above is for sklearn, but stuffing my te.doc_, te.vec_, te.explain_prediction().targets[0].feature_weights into _get_doc_weighted_spans fails on VectorizerMixin. This may be related, but I could be wrong...

eloukas commented 4 years ago

Same problem here. Could not show highlighted text, neither in Jupyter nor in other environments. Solved it with what @hohl suggested.

bakarep commented 4 years ago

@lopuhin, was this looked into by the eli5 team to make sure eli5 is compatible with the latest version of sklearn? I also tried downgrading sklearn, but I am getting other issues like: ModuleNotFoundError: No module named 'sklearn.feature_selection._univariate_selection'

Requesting help with a permanent solution, please.

datanizing commented 4 years ago

VectorizerMixin was renamed to _VectorizerMixin in scikit-learn 0.22. Changing that in text.py (two occurrences), as @Querela mentioned, makes it work again.
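For anyone patching locally, the rename can also be handled with a fallback lookup instead of editing the name by hand. This is only a minimal sketch and not necessarily the change eli5 itself shipped; `first_available` is a hypothetical helper name, and the only assumption about scikit-learn is the pair of class names mentioned above.

```python
import importlib


def first_available(module_name, candidate_names):
    """Return the first attribute from candidate_names found in the module.

    Hypothetical helper (not part of eli5 or scikit-learn). Returns None if
    the module cannot be imported or none of the names exist in it.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    for name in candidate_names:
        if hasattr(module, name):
            return getattr(module, name)
    return None


# Try the private name used since scikit-learn 0.22 first, then the old one.
VectorizerMixin = first_available(
    "sklearn.feature_extraction.text",
    ["_VectorizerMixin", "VectorizerMixin"],
)
```

If `VectorizerMixin` ends up as None, scikit-learn is either not installed or exposes neither name, and any isinstance check against it would need to be skipped.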

jonas-nothnagel commented 3 years ago

Hi everyone, is there a fix now? I downgraded scikit-learn to 0.21.3 and am still not able to see any highlighted text, unfortunately.

Bougeant commented 3 years ago

Hi @jonas-nothnagel, it looks like @icfly2 has submitted a PR for this fix. I'm not sure what needs to happen for his work to be merged in.

lopuhin commented 3 years ago

@Bougeant could you please link which PR that was? Existing sklearn compatibility PRs were merged in https://github.com/eli5-org/eli5/pull/2 and released with v0.11 - so if it still does not work with v0.11, then it's something else. And sorry for the confusion with different repos - I still hope we can get back to this one.
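To tell the version-related failure apart from "something else", a quick check of the installed versions can help. The sketch below only encodes what has been reported in this thread (scikit-learn 0.22 or newer together with eli5 older than 0.11); the function names are made up for illustration, and the version parsing is deliberately naive.

```python
from importlib import metadata


def version_tuple(version):
    """Turn '0.22.1' into (0, 22, 1); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def in_reported_broken_range(sklearn_version, eli5_version):
    """True for the combination reported in this thread:
    scikit-learn 0.22 or newer together with eli5 older than 0.11."""
    return (version_tuple(sklearn_version) >= (0, 22)
            and version_tuple(eli5_version) < (0, 11))


def installed_version(package):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


sk = installed_version("scikit-learn")
el = installed_version("eli5")
if sk and el and in_reported_broken_range(sk, el):
    print(f"scikit-learn {sk} with eli5 {el}: matches the reported breakage")
```

A False result only rules out the rename-related breakage; it says nothing about highlighting with custom tokenizers or pipelines.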

jonas-nothnagel commented 3 years ago

Thank you! Being able to explain why our models make their predictions is such an important feature, in my opinion, that I almost wonder why it is not implemented in many more libraries. For now I hardcoded around the issue by extracting the top 5, top 5-10 and last 5, last 5-10 feature names and weights from the explain_prediction() function, matching them with the original text, and highlighting the words with HTML markup. It is a bit hacky but works as well. I could also add the weights.
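A workaround along these lines could look roughly like the sketch below. It is not the code used above and does not touch eli5 at all: it assumes the feature names and weights have already been pulled out into a plain dict of lower-cased single words, and `highlight_text` and the colour values are hypothetical choices.

```python
import html
import re


def highlight_text(text, word_weights, max_abs_weight=None):
    """Wrap known words in <span> tags coloured by the sign of their weight.

    word_weights maps lower-cased words to float weights. Green marks
    positive weights, red marks negative ones; the opacity scales with the
    absolute weight relative to max_abs_weight.
    """
    if max_abs_weight is None:
        max_abs_weight = max((abs(w) for w in word_weights.values()), default=1.0)
    max_abs_weight = max_abs_weight or 1.0  # avoid division by zero

    parts = []
    # Split into alternating word / non-word chunks so no character is lost.
    for chunk in re.findall(r"\w+|\W+", text):
        weight = word_weights.get(chunk.lower())
        escaped = html.escape(chunk)
        if weight is None or weight == 0:
            parts.append(escaped)
            continue
        rgb = "0, 160, 0" if weight > 0 else "200, 0, 0"
        alpha = round(min(abs(weight) / max_abs_weight, 1.0), 2)
        parts.append(
            f'<span style="background-color: rgba({rgb}, {alpha})" '
            f'title="{weight:+.3f}">{escaped}</span>'
        )
    return "".join(parts)
```

Calling `highlight_text("Good, but slow", {"good": 2.0, "slow": -1.0})` returns an HTML string that can be dropped into a page. The sketch only matches single-word features; n-grams, or a tokenizer that changes the words, would need their own matching logic.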

Bougeant commented 3 years ago

@lopuhin Sorry I did not actually check if it was still failing. Awesome that you guys fixed that.

lopuhin commented 3 years ago

Right, there are multiple issues. I think the original issue was about showing the explanation for a custom pipeline; then a new sklearn was released that we didn't support, so we were failing earlier. Now we support the latest sklearn, but the original issue of highlighting for a more custom pipeline remains.

jonas-nothnagel commented 3 years ago

I still do not see highlighted text, even using no pipelines and just simply specifying tfidf vectorizer and, for example, a logistic regression. Can you share under what circumstances you obtain the highlighted text?

cbjrobertson commented 1 year ago

Has any progress been made on this? I'm using sklearn 1.2.0 and eli5 0.13.0 in Python 3.9 and running into this issue. Downgrading sklearn no longer works; it just gives rise to a host of incompatibility errors.