Text tutorials break if the sentence ends with a full stop "."

elboyran commented 5 months ago

The LIME text tutorial breaks if the sentence ends with a full stop ".".

Probably, also the RISE text tutorial.

elboyran commented 5 months ago

Please, mind the colormap change coming in #769.

SarahAlidoost commented 5 months ago

@elboyran can you post the error that you get in this issue? Is this error related to dianna.explain_text or visualization.highlight_text? It seems that visualization works with no problem. I added a test for visualization to check "dot" in text, see here. Also, you can run the test as:

from dianna.visualization.text import highlight_text
explanation = [
    ("Hello", 0, 0.5), ("world", 1, -0.5), (".", 2, 0.0),
    ("This", 3, 0.5), ("is", 4, -0.5), ("a", 5, 0.0),
    ("test", 6, 0.5), (".", 7, -0.5),
    ("Another", 8, 0.0), ("test", 9, 0.5), (".", 10, -0.5),
    ]
fig, ax = highlight_text(explanation=explanation)

It seems that dianna.explain_text will break if there is a "dot", due to the tokenizer used in the notebook. I am not sure if the tokenizer can handle dots, see more info here.
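To make the tokenizer question concrete, here is a minimal sketch using only the standard library. Both tokenizers below are hypothetical stand-ins, not the tokenizer used in the notebook. They only show that the number of tokens for the same review depends on whether a trailing full stop is split off as its own token.

```python
import re


def whitespace_tokenize(text):
    """Hypothetical tokenizer: split on whitespace only."""
    return text.split()


def punctuation_aware_tokenize(text):
    """Hypothetical tokenizer: words and single punctuation marks are separate tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


review = "A delectable and intriguing thriller filled with surprises."

ws_tokens = whitespace_tokenize(review)
pa_tokens = punctuation_aware_tokenize(review)

# The full stop stays glued to the last word here.
print(len(ws_tokens), ws_tokens[-1])   # 8 surprises.
# The full stop becomes an extra token here.
print(len(pa_tokens), pa_tokens[-2:])  # 9 ['surprises', '.']
```

If the tokenizer and the code that perturbs the text disagree about where tokens start and end, the token lists for the perturbed sentences may end up with different lengths.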

elboyran commented 5 months ago

@SarahAlidoost , indeed it breaks already at the relevance computation (dianna.explain_text).

I will post the lengthy error message in the next comment. As for the tokenizer supporting full stops, it would be very strange if it didn't. @loostrum do you know if that is the case? From the link you posted, I see an example with an exclamation point ('!'); it would be weird if it supported only some punctuation characters.

elboyran commented 5 months ago

Error at cell 9 of the LIME text tutorial if, in cell 8, one sets:

review = "A delectable and intriguing thriller filled with surprises." with a full stop at the end.

Cell 9:

# We're getting the explanation for the 'positive' class only,
# but dianna supports explaining for multiple labels in one go.
# It therefore always outputs a list of saliency maps. We want
# the first and only saliency map from this list here.
explanation_relevance = dianna.explain_text(model_runner, review, model_runner.tokenizer,
                                            'LIME', labels=[labels.index('positive')])[0]
explanation_relevance

ValueError                                Traceback (most recent call last)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (5000,) + inhomogeneous part.

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
Cell In[9], line 5
      1 # We're getting the explanation for the 'positive' class only,
      2 # but dianna supports explaining for multiple labels in one go.
      3 # It therefore always outputs a list of saliency maps. We want
      4 # the first and only saliency map from this list here.
----> 5 explanation_relevance = dianna.explain_text(model_runner, review, model_runner.tokenizer,
      6                                             'LIME', labels=[labels.index('positive')])[0]
      7 explanation_relevance

File ~/.local/lib/python3.10/site-packages/dianna/__init__.py:122, in explain_text(model_or_function, input_text, tokenizer, method, labels, **kwargs)
    120 if kwargs:
    121     raise TypeError(f'Error due to following unused kwargs: {kwargs}')
--> 122 return explainer.explain(
    123     model_or_function=model_or_function,
    124     input_text=input_text,
    125     labels=labels,
    126     tokenizer=tokenizer,
    127     **explain_text_kwargs,
    128 )

File ~/.local/lib/python3.10/site-packages/dianna/methods/lime_text.py:91, in LIMEText.explain(self, model_or_function, input_text, labels, tokenizer, top_labels, num_features, num_samples, **kwargs)
     88 runner = utils.get_function(model_or_function, preprocess_function=self.preprocess_function)
     89 explain_instance_kwargs = utils.get_kwargs_applicable_to_function(
     90     self.explainer.explain_instance, kwargs)
---> 91 explanation = self.explainer.explain_instance(input_text,
     92                                                  runner,
     93                                                  labels=labels,
     94                                                  top_labels=top_labels,
     95                                                  num_features=num_features,
     96                                                  num_samples=num_samples,
     97                                                  **explain_instance_kwargs
     98                                                  )
    100 local_explanations = explanation.local_exp
    101 string_map = explanation.domain_mapper.indexed_string

File ~/.local/lib/python3.10/site-packages/lime/lime_text.py:413, in LimeTextExplainer.explain_instance(self, text_instance, classifier_fn, labels, top_labels, num_features, num_samples, distance_metric, model_regressor)
    406 indexed_string = (IndexedCharacters(
    407     text_instance, bow=self.bow, mask_string=self.mask_string)
    408                   if self.char_level else
    409                   IndexedString(text_instance, bow=self.bow,
    410                                 split_expression=self.split_expression,
    411                                 mask_string=self.mask_string))
    412 domain_mapper = TextDomainMapper(indexed_string)
--> 413 data, yss, distances = self.__data_labels_distances(
    414     indexed_string, classifier_fn, num_samples,
    415     distance_metric=distance_metric)
    416 if self.class_names is None:
    417     self.class_names = [str(x) for x in range(yss[0].shape[0])]

File ~/.local/lib/python3.10/site-packages/lime/lime_text.py:482, in LimeTextExplainer.__data_labels_distances(self, indexed_string, classifier_fn, num_samples, distance_metric)
    480     data[i, inactive] = 0
    481     inverse_data.append(indexed_string.inverse_removing(inactive))
--> 482 labels = classifier_fn(inverse_data)
    483 distances = distance_fn(sp.sparse.csr_matrix(data))
    484 return data, labels, distances

Cell In[5], line 27, in MovieReviewsModelRunner.__call__(self, sentences)
     24     tokenized_sentences.append(tokens_numerical)
     26 # run the model, applying a sigmoid because the model outputs logits
---> 27 logits = self.run_model(tokenized_sentences)
     28 pred = np.apply_along_axis(sigmoid, 1, logits)
     30 # output two classes

File ~/.local/lib/python3.10/site-packages/dianna/utils/onnx_runner.py:33, in SimpleModelRunner.__call__(self, input_data)
     30     input_data = self.preprocess_function(input_data)
     32 onnx_input = {input_name: input_data}
---> 33 pred_onnx = sess.run([output_name], onnx_input)[0]
     34 return pred_onnx

File ~/.local/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:220, in Session.run(self, output_names, input_feed, run_options)
    218     output_names = [output.name for output in self._outputs_meta]
    219 try:
--> 220     return self._sess.run(output_names, input_feed, run_options)
    221 except C.EPFail as err:
    222     if self._enable_fallback:

RuntimeError: Could not create tensor from given input list
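The first ValueError in this traceback ("inhomogeneous shape after 1 dimensions", detected shape (5000,)) is what NumPy reports when a list of sequences with unequal lengths is converted to an array. So the 5000 token lists passed to the model presumably did not all have the same length. Below is a minimal sketch, in plain Python, of how one could check for that and force a fixed length. The helper names, `target_length` and `pad_id` are hypothetical and not part of dianna or the tutorial; whether padding is the right fix for the notebook is an assumption.

```python
def is_rectangular(batch):
    """Return True if every token list in the batch has the same length."""
    return len({len(tokens) for tokens in batch}) <= 1


def pad_or_truncate(batch, target_length, pad_id=0):
    """Force every token list to exactly target_length entries.

    Longer lists are cut off; shorter ones are filled with pad_id
    (a hypothetical padding index, assumed to exist in the vocabulary).
    """
    fixed = []
    for tokens in batch:
        clipped = tokens[:target_length]
        fixed.append(clipped + [pad_id] * (target_length - len(clipped)))
    return fixed


# Toy batch of numerical tokens with unequal lengths.
ragged = [[4, 8, 15], [16, 23], [42, 7, 9, 1]]
fixed = pad_or_truncate(ragged, target_length=3)

print(is_rectangular(ragged))  # False
print(fixed)                   # [[4, 8, 15], [16, 23, 0], [42, 7, 9]]
print(is_rectangular(fixed))   # True
```

A check like `is_rectangular` placed just before the model call would show whether the sentence with the trailing full stop really produces token lists of varying length.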

elboyran commented 5 months ago

Ah, that issue has been filed already, sorry. Please, see #751.

dianna-ai / dianna

Text tutorials break if the sentence ends with a full stop "." #771