
PairwiseSequenceClassificationExplainer, RoBERTa bug fixes, GH Actions migration #99

Closed · cdpierse closed this 2 years ago

cdpierse commented 2 years ago

PR Description

This release includes lots of changes, big and small:

PairwiseSequenceClassificationExplainer (#87, #82, #58)

This has been a frequently requested feature and one that I am very happy to release, especially as I have lately wanted to explain the outputs of CrossEncoder models myself.

The PairwiseSequenceClassificationExplainer is a variant of the SequenceClassificationExplainer designed to work with classification models that expect the input sequence to be two inputs separated by the model's separator token. Common examples of this are NLI models and Cross-Encoders, which are commonly used to score the similarity of two inputs to one another.

This explainer calculates pairwise attributions for two passed inputs, text1 and text2, using the model and tokenizer given in the constructor.
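
A minimal usage sketch (the checkpoint here is purely illustrative; any pairwise sequence classification model should work):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from transformers_interpret import PairwiseSequenceClassificationExplainer

# Illustrative checkpoint; substitute any pairwise classification model
model_name = "cross-encoder/ms-marco-MiniLM-L-6-v2"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

explainer = PairwiseSequenceClassificationExplainer(model, tokenizer)

text1 = "How many people live in Berlin?"
text2 = "Berlin has a population of around 3.7 million people."

# Word attributions for both inputs with respect to the pair's score
attributions = explainer(text1, text2)
```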

Also, since a common use case for pairwise sequence classification is comparing the similarity of two inputs, models of this nature typically have only a single output node rather than one per class. The pairwise explainer has some useful utility functions to make interpreting single-node outputs clearer.

By default, for models that output a single node, the attributions are with respect to the inputs pushing the score closer to 1.0. If you instead want to see the attributions with respect to scores closer to 0.0, you can pass flip_sign=True when calling the explainer. For similarity-based models this is useful because the model might predict a score closer to 0.0 for the two inputs, in which case we flip the attributions' sign to explain why the two inputs are dissimilar.
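
For example, continuing the sketch above with a pair the model is likely to score as dissimilar (the second input is hypothetical):

```python
text3 = "What is the capital of France?"

# If the model scores (text1, text3) close to 0.0, flipping the sign
# attributes the tokens that push the pair apart rather than together
attributions = explainer(text1, text3, flip_sign=True)
```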

RoBERTa Consistency Improvements (#65)

Thanks to some great detective work by @dvsrepo, @jogonba2, @databill86, and @VDuchauffour on this issue over the last year, we've been able to identify what looks to be the main culprit responsible for the misalignment between the scores this package gives for RoBERTa-based models and those models' actual outputs in the transformers package.

Because this package has to create reference ids for each input type (input_ids, position_ids, token_type_ids) to build a baseline, we try to emulate the outputs of the model's tokenizer in an automated fashion. For most BERT-based models this works great, but as I have learned from reading this thread (#65), there were significant issues with RoBERTa.

It seems that the main reason for this is that RoBERTa implements position_ids in a very different manner to BERT (read this and this for extra context). Since we were passing completely incorrect values for position_ids, the model's predictions were thrown off. This release does not fully fix the issue, but it does bypass the passing of incorrect position_ids by simply not passing them to the forward function. We've done this by creating a flag that recognises certain model architectures as being incompatible with how we create position_ids. According to the Transformers docs, when position_ids are not passed:

They are an optional parameter. If no position_ids are passed to the model, the IDs are automatically created as absolute positional embeddings.

So this solution should be good for most situations; ideally, in the future, we will look into creating RoBERTa-compatible position_ids within the package itself.
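
For context, a minimal sketch of the difference (the RoBERTa side mirrors create_position_ids_from_input_ids in the transformers RoBERTa source; the token ids below are illustrative):

```python
import torch

def bert_style_position_ids(seq_len: int) -> torch.Tensor:
    # BERT numbers positions 0 .. seq_len - 1
    return torch.arange(seq_len).unsqueeze(0)

def roberta_style_position_ids(input_ids: torch.Tensor, padding_idx: int = 1) -> torch.Tensor:
    # RoBERTa assigns every pad token position padding_idx and offsets
    # real tokens by padding_idx + 1, so positions start at 2, not 0
    mask = input_ids.ne(padding_idx).int()
    return (torch.cumsum(mask, dim=1) * mask + padding_idx).long()

# <s> Hello world </s> <pad> <pad>
input_ids = torch.tensor([[0, 31414, 232, 2, 1, 1]])
print(bert_style_position_ids(input_ids.size(1)))  # tensor([[0, 1, 2, 3, 4, 5]])
print(roberta_style_position_ids(input_ids))       # tensor([[2, 3, 4, 5, 1, 1]])
```

Passing the BERT-style ids on the left to a RoBERTa model is exactly the kind of mismatch this release sidesteps.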

Move to GH actions

This release also moves our testing suite from CircleCI to GH Actions, which has proven to be easier to integrate with and much more convenient.

Other

Tests and Coverage

Tests have been added for the new explainer, and coverage remains above 90%.
