Open tsela opened 4 years ago
@tsela I see, that sounds very reasonable. Does eli5.show_prediction() show the text explanations on your pipeline?
@lopuhin I just checked, and no, eli5.show_prediction() only shows the weight table as well. Any idea where that comes from? My pipeline is relatively simple; the only complications are that I load the model from a pickled file using joblib and that I use a custom tokeniser in the TfidfVectorizer. Could one of these be the issue?
The issue is likely due to a custom tokenizer, here is the relevant code which checks the class I believe: https://github.com/TeamHG-Memex/eli5/blob/017c738f8dcf3e31346de49a390835ffafad3f1b/eli5/sklearn/text.py#L53-L70 and also https://github.com/TeamHG-Memex/eli5/blob/017c738f8dcf3e31346de49a390835ffafad3f1b/eli5/sklearn/_span_analyzers.py#L7
so one option is to define a get_doc_weighted_spans method on your vectorizer - sorry, this part is not really documented, you'll have to check the source.
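A rough sketch of the condition the linked code appears to apply, written against plain attributes so it runs without eli5 installed. The function name supports_highlighting is hypothetical, and the rules are one reading of the linked source (an assumption, not an eli5 API): a vectorizer qualifies if it defines its own get_doc_weighted_spans, or uses the built-in word analyzer without a custom tokenizer, or uses a built-in character analyzer. The linked code also checks the vectorizer's class, which this sketch skips.

```python
from types import SimpleNamespace


def supports_highlighting(vec):
    """Guess whether eli5 can map features back to text spans for `vec`.

    Hypothetical helper: the rules below approximate the checks in the
    linked eli5 source; this is not an eli5 function.
    """
    # A vectorizer that provides its own span mapping is used as-is.
    if callable(getattr(vec, "get_doc_weighted_spans", None)):
        return True
    analyzer = getattr(vec, "analyzer", None)
    tokenizer = getattr(vec, "tokenizer", None)
    # Built-in word analyzer: spans are recoverable only without a custom tokenizer.
    if analyzer == "word":
        return tokenizer is None
    # Built-in character analyzers operate on the preprocessed text directly.
    return analyzer in ("char", "char_wb")


# Stand-ins carrying the same attribute names as scikit-learn vectorizers.
default_vec = SimpleNamespace(analyzer="word", tokenizer=None)
custom_vec = SimpleNamespace(analyzer="word", tokenizer=str.split)
print(supports_highlighting(default_vec), supports_highlighting(custom_vec))
```

Passing a real vectorizer instead of the stand-ins should work the same way, since only the analyzer and tokenizer attributes are read.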
Thanks for your help! It looks indeed like the custom tokeniser is the problem. I'm kind of misusing the tokeniser to do all the text preprocessing (so that the pickled model is the only thing I have to send around for whoever will be working on the production frontend), so it's quite understandable that this could cause the problem.
I'll see if I can define a get_doc_weighted_spans() method. My tokeniser is lossy (on purpose), so that might be a challenge, but I'll try and see if it's possible.
Thanks for your help!
I'm having the same issue. Tried both eli5.show_prediction() and eli5.explain_weights together with eli5.format_as_html. Both show me the table, but no nicely formatted text with coloured overlays.
But I am not using a custom tokenizer. I even tried a plain TfidfVectorizer() with all parameters left at their default values, and it still didn't work.
I then also tried whether it works with TextExplainer:
from eli5.lime import TextExplainer
te = TextExplainer()
te.fit(samples[0], model.predict_proba)
te.show_prediction()
But that again only showed the table and no highlighted text. Any other ideas what I could try?
Eli5 version is 0.10.1. scikit-learn is 0.22.1. Tried to run on two different machines too: one running Ubuntu and one running macOS.
I now even tried to run one of the sample notebooks in this repo. I hoped that I am just using the library wrong, but that does not seem to be the case.
The output of the first block with explain_prediction(..., force_weights=False, ...) also does not show me the highlighted text, just the weights table, even though force_weights is set to False in that sample.
I also tried whether it changes anything when I downgrade ELI5 to 0.10 or 0.9.0, but both these versions delivered the same results as the 0.10.1 release.
Finally found a configuration that works: downgrading scikit-learn to 0.21.3 does output the highlighted text. I guess there is some incompatibility between scikit-learn 0.22 and ELI5 0.10.1?
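A small sketch for flagging the combination reported in this thread. The helper names are hypothetical, and the thresholds (scikit-learn 0.22 or later together with eli5 before 0.11) are taken only from comments in this issue, so treat them as an assumption rather than an official compatibility matrix. Other combinations can still fail for different reasons.

```python
import re


def version_tuple(version):
    """Turn '0.22.1' into (0, 22, 1), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def hits_reported_combination(sklearn_version, eli5_version):
    """True for the pairing reported in this thread (assumed thresholds)."""
    return (version_tuple(sklearn_version) >= (0, 22)
            and version_tuple(eli5_version) < (0, 11))


print(hits_reported_combination("0.22.1", "0.10.1"))
print(hits_reported_combination("0.21.3", "0.10.1"))
```

On Python 3.8 or later, the installed versions can be read with importlib.metadata.version("scikit-learn") and importlib.metadata.version("eli5") and passed straight in.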
Same issue here, the highlighted text does not show for show_prediction() with ELI5 0.10.1 and scikit-learn 0.22.
I'm also on the latest versions, trying to get a transformer-based explanation, but just using a prediction method and not getting any highlighted text. According to this: https://eli5.readthedocs.io/en/latest/tutorials/black-box-text-classifiers.html
from eli5.lime import TextExplainer

def predict_proba(docs):
    # here obviously with code ...
    pass

label_list = ["0", "1"]
doc = "My example sentence expressing a strong opinion etc."
# ---
te = TextExplainer(random_state=42)
te.fit(doc, predict_proba)
te.show_prediction(target_names=label_list)
Fishing through the source, it might be some check, like this: https://github.com/TeamHG-Memex/eli5/blob/4839d1927c4a68aeff051935d1d4d8a4fb69b46d/eli5/sklearn/text.py#L65 (suggested by @lopuhin )
The code above is for sklearn, but stuffing my te.doc_, te.vec_, te.explain_prediction().targets[0].feature_weights into _get_doc_weighted_spans fails on VectorizerMixin. This may be related, but I could be wrong...
Same problem here. Could not show highlighted text, neither in Jupyter nor in other environments. Solved it with what @hohl suggested.
@lopuhin, has the eli5 team looked into this to make sure eli5 is compatible with the latest version of sklearn? I also tried downgrading sklearn, but then I get other issues like: ModuleNotFoundError: No module named 'sklearn.feature_selection._univariate_selection'
Please help with a permanent solution.
VectorizerMixin was renamed to _VectorizerMixin in scikit-learn 0.22. Changing that in text.py (two occurrences), as @Querela mentioned, makes it work again.
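A rename like this can also be absorbed by looking the class up under either name instead of editing it by hand. A minimal sketch, assuming only what is stated above (the class lives in sklearn.feature_extraction.text under one of the two spellings); first_available is a hypothetical helper, not part of eli5 or scikit-learn.

```python
import importlib


def first_available(module_name, candidates):
    """Return the first attribute from `candidates` that `module_name` defines."""
    module = importlib.import_module(module_name)
    for name in candidates:
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(f"none of {candidates} found in {module_name}")


# Demonstrated on the standard library so the sketch runs without scikit-learn.
join = first_available("os.path", ["_no_such_name", "join"])
print(join("a", "b"))

# With scikit-learn installed, the same call would cover both spellings:
# VectorizerMixin = first_available(
#     "sklearn.feature_extraction.text",
#     ["_VectorizerMixin", "VectorizerMixin"],
# )
```

Listing the newer name first means the lookup keeps working on either side of the rename.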
Hi everyone, is there a fix now? I downgraded scikit-learn to 0.21.3 and am still not able to see any highlighted text, unfortunately.
Hi @jonas-nothnagel, it looks like @icfly2 has submitted a PR for this fix. I'm not sure what needs to happen for his work to be merged in.
@Bougeant could you please link which PR that was? Existing sklearn compatibility PRs were merged in https://github.com/eli5-org/eli5/pull/2 and released with v0.11 - so if it still does not work with v0.11, then it's something else. And sorry for the confusion with the different repos - I still hope we can get back to this one.
Thank you! It is such an important feature, in my opinion, to be able to explain why our models do predictions that I almost wonder why it is not implemented in many more libraries.
For now I hardcoded around the issue by extracting the top 5, top 5-10 and last 5, last 5-10 feature names and weights from the explain_prediction() function, matching them with the original text and highlighting the words with HTML markup. It is a bit hacky but works as well. I could also add the weights.
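A minimal sketch of that kind of workaround, not the code described above: it takes a word-to-weight mapping (assumed to have been pulled out of the explain_prediction() output beforehand) and wraps matching words in coloured span tags. The function name highlight_words, the colours, and the opacity scaling are all illustrative choices.

```python
import html
import re


def highlight_words(text, weights):
    """Return HTML for `text` with weighted words wrapped in coloured spans.

    `weights` maps lowercase words to signed weights (hypothetical input,
    e.g. collected from an explanation's feature weights).
    """
    largest = max((abs(w) for w in weights.values()), default=0.0) or 1.0
    pieces = []
    # Split into words and the text between them so nothing is lost.
    for token in re.findall(r"\w+|\W+", text):
        weight = weights.get(token.lower())
        escaped = html.escape(token)
        if weight is None:
            pieces.append(escaped)
            continue
        # Green for positive weights, red otherwise; opacity tracks magnitude.
        rgb = "0, 160, 0" if weight > 0 else "200, 0, 0"
        alpha = round(abs(weight) / largest, 2)
        pieces.append(
            f'<span style="background-color: rgba({rgb}, {alpha})" '
            f'title="{weight:+.3f}">{escaped}</span>'
        )
    return "".join(pieces)


print(highlight_words("A strong opinion <here>", {"strong": 1.2, "opinion": -0.4}))
```

This only handles single-word features; n-gram features would need matching against sequences of tokens.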
@lopuhin Sorry I did not actually check if it was still failing. Awesome that you guys fixed that.
right, there are multiple issues - I think the original issue was about showing an explanation for a custom pipeline; then a new sklearn was released that we didn't support, so we were failing earlier - now we support the latest sklearn, but the original issue of highlighting for a more custom pipeline remains.
I still do not see highlighted text, even using no pipelines and just simply specifying tfidf vectorizer and, for example, a logistic regression. Can you share under what circumstances you obtain the highlighted text?
Has any progress been made on this? I'm using sklearn 1.2.0 and eli5 0.13.0 in Python 3.9 and running into this issue. Downgrading sklearn no longer works; it just gives rise to a host of incompatibility errors.
Hi,
I'm trying to use eli5 to explain the results of a simple scikit-learn pipeline made of a TfidfVectorizer and a LogisticRegressionCV. In particular, I'm trying to replicate the look of the results of eli5.show_prediction() as shown in https://eli5.readthedocs.io/en/latest/tutorials/sklearn-text.html, but using format_as_html() and explain_prediction() directly, since I'm building a web app rather than working with Jupyter.
The problem I have is that whatever I try, I only get a weight table as output, and the highlighted text is missing. Even when I set force_weights to False, it still only shows the weight table. I've inspected the output of format_as_html() and I can't find any trace of highlighted text, only the HTML for the table. So it's not a case of styling moving the highlighted text away, it's quite simply missing.
Even checking the source code doesn't help, and I feel like I'm missing something. Is there a reason why I can't get the highlighted text to show up?
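A sketch of the non-Jupyter route described above, under some assumptions: the classifier and vectorizer are passed separately via vec= (as in the linked tutorial), and each target in the explanation carries a weighted_spans attribute that stays empty when eli5 could not map features back to the text. explanation_html and has_highlighting are hypothetical helper names, and the stand-in objects at the bottom only imitate the shape of an explanation.

```python
from types import SimpleNamespace


def has_highlighting(explanation):
    """True if any target of the explanation carries text spans to highlight."""
    targets = getattr(explanation, "targets", None) or []
    return any(getattr(t, "weighted_spans", None) is not None for t in targets)


def explanation_html(clf, vec, doc, target_names=None):
    """Return (html, highlighted) for one document; needs eli5 installed."""
    import eli5  # imported lazily so the helper above works without it

    expl = eli5.explain_prediction(clf, doc, vec=vec, target_names=target_names)
    # force_weights=False asks the formatter not to force the weight table
    # when the features can be shown as highlighted text instead.
    return eli5.format_as_html(expl, force_weights=False), has_highlighting(expl)


# Stand-in objects showing what the check distinguishes.
table_only = SimpleNamespace(targets=[SimpleNamespace(weighted_spans=None)])
with_spans = SimpleNamespace(targets=[SimpleNamespace(weighted_spans=object())])
print(has_highlighting(table_only), has_highlighting(with_spans))
```

If the second return value comes back False, the HTML will contain only the table, which points at the vectorizer or the installed versions rather than at styling.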