Closed GS756 closed 1 year ago
I have traced the issue to https://git.science.uu.nl/m.j.robeer/text_explainability/-/blob/main/text_explainability/global_explanation/__init__.py#L136, where
[(w, counts[v]) for w, v in cv.vocabulary_.items() if k not in filter_words]
should be
[(token, counts[v]) for token, v in cv.vocabulary_.items() if token not in filter_words]
I will fix it tonight, add proper testing in text_explainability==0.6.6,
and ensure the Explabox requires the updated version.
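For anyone following along, here is a minimal sketch of why the corrected comprehension filters properly. It does not use the library itself: `vocabulary`, `counts`, and the helper `token_counts` are hypothetical stand-ins for `cv.vocabulary_` (a token to column-index mapping) and the per-column counts, and the descending sort is only an assumption added to mimic a ranking.

```python
# Hypothetical stand-ins: in the real code these come from a fitted
# CountVectorizer (cv.vocabulary_) and its summed document-term matrix.
vocabulary = {"the": 0, "movie": 1, "was": 2, "great": 3}
counts = [10, 4, 7, 2]
filter_words = ["the", "was"]


def token_counts(vocabulary, counts, filter_words):
    """Return (token, count) pairs, skipping tokens listed in filter_words."""
    blocked = set(filter_words)  # set membership is cheaper than list lookup
    pairs = [
        (token, counts[idx])
        for token, idx in vocabulary.items()
        if token not in blocked  # test the loop variable itself, not an unrelated name
    ]
    # Sort by count, highest first (an assumption, to resemble a ranking).
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


result = token_counts(vocabulary, counts, filter_words)
print(result)
```

The key point is that the condition has to test the same variable the loop binds to each vocabulary token; the original condition referred to a different name, so the filter never looked at the tokens being listed.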
Summary of bug
When using box.explain.token_frequency, the filter_words option does not work well: even if we specify a list of stop_words, such words still appear at the top of the ranking.
Environment information
explabox version: ?
Reproducing the bug
Steps to reproduce the behavior:
Given df_train and df_test, with text and label columns, pipe (from sklearn pipeline), and labels_dict = {0: name_class_0, ...}
input_text = "your text here about class 0 but not about class 3"
Solutions Attempted
I tried different lists, but the words were never filtered.