Closed nfelnlp closed 3 years ago
That should be the batch dimension, which should be set in the configs.
The two MNLI input strings are typically concatenated by the tokenizer using a `[SEP]` separator token.
The tensor that is passed to the explainer should be the one returned in data.py, where such things (i.e. the concatenation) are handled.
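A minimal sketch of that concatenation for a BERT-style tokenizer: the premise and hypothesis token-id sequences are joined into a single input. The token ids below are illustrative placeholders, not a real vocabulary; an actual tokenizer (e.g. from Hugging Face transformers) does this internally.

```python
# Conventional BERT special-token ids (illustrative)
CLS, SEP = 101, 102

def concat_pair(premise_ids, hypothesis_ids):
    """Join two token-id sequences into one input: [CLS] p [SEP] h [SEP]."""
    return [CLS] + premise_ids + [SEP] + hypothesis_ids + [SEP]

premise = [2023, 2003, 1037]   # hypothetical ids for the premise
hypothesis = [2469, 2518]      # hypothetical ids for the hypothesis
ids = concat_pair(premise, hypothesis)
print(ids)  # one flat sequence; the model sees a single input tensor
```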
Note that some models do not have the batch dimension at position 0; in these cases a workaround needs to be devised.
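One possible workaround (my own sketch, not part of thermostat) is to normalize inputs to a batch-first layout before calling the explainer and restore the original axis order afterwards:

```python
import numpy as np

def to_batch_first(x, batch_axis):
    """Move the batch dimension to position 0 (no-op if already there)."""
    return np.moveaxis(x, batch_axis, 0)

def from_batch_first(x, batch_axis):
    """Restore the original axis order after the explainer has run."""
    return np.moveaxis(x, 0, batch_axis)

# Example: a model whose batch dimension sits at position 1.
x = np.zeros((7, 1, 5))            # (seq_len, batch, hidden)
x_bf = to_batch_first(x, 1)        # (1, 7, 5): explainer-friendly layout
x_back = from_batch_first(x_bf, 1) # (7, 1, 5): original layout again
print(x_bf.shape, x_back.shape)
```

The same idea works with `torch.movedim` for PyTorch tensors.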
Ah, of course, thanks. I totally forgot about the batch dimension. So for LIME, the batch dimension should always be 1? I will write a separate assertion for that then. What do you suggest for the workaround you mentioned?
Batch size and `internal_batch_size` have to be exactly 1 in order to run explainer jobs with LIME.
The assertion in the `token_similarity_kernel` function of `ExplainerLimeBase`
https://github.com/nfelnlp/thermostat/blob/24177342945e834552a6df956ae59fdf1e69335b/src/thermostat/explainers/lime.py#L47
only works for IMDB so far. An error is thrown for MNLI, so I debugged it and found that the two input shapes can still be equal even though they are not exactly 1. I assume this is because MNLI has two text fields ("premise", "hypothesis") instead of one. The calculation below can still be performed with `.shape[0] == 2`. Do you think it would be fine to remove the `== 1` at the end of the assertion?
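To illustrate the proposed relaxation, here is a toy similarity kernel with the weakened precondition. This is not the actual thermostat implementation; the kernel body (fraction of matching token positions) and all names are illustrative assumptions. The point is only the assertion: the two inputs must agree in their leading dimension, but that dimension need not be 1, so an MNLI pair with `shape[0] == 2` passes.

```python
import numpy as np

def token_similarity_kernel(original, perturbed):
    """Toy kernel: similarity = fraction of positions with matching tokens.

    Relaxed precondition: leading dims must agree, but there is no
    '== 1' check, so MNLI inputs with shape[0] == 2 are accepted.
    """
    assert original.shape[0] == perturbed.shape[0], (
        f"shape mismatch: {original.shape} vs {perturbed.shape}"
    )
    return float(np.mean(original == perturbed))

orig = np.array([[1, 2, 3], [4, 5, 6]])  # two text fields (MNLI-style)
pert = np.array([[1, 0, 3], [4, 5, 0]])  # two tokens masked out
print(token_similarity_kernel(orig, pert))  # 4 of 6 tokens unchanged
```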