@cwenner this is really great, thanks very much for contributing! I hadn't even noticed this, and I imagine it could be quite annoying when there was a device mismatch. I'll give this a proper review in the next day or two, but from what I've seen it looks good to go. I really appreciate that you have done some testing too!
Thanks again for contributing to Transformers Interpret !
@cdpierse - Glad to hear! It's a tiny change but makes it more plug&play for my workflows.
Incidentally - is there already support for text-generation (~LM head) introspection?
@cwenner thanks again for this fix. I've bumped the package with a patch release to 0.5.2 so that you can use this for your workflows.
With regard to text generation, it's tricky to solve from both a technical perspective and a design one. The reason is that LM heads usually work in a recursive manner: they consume an input sequence, spit out a new token, and then repeat for X number of steps. Because of how the explainer works, calculating attributions for even one output can be pretty slow, so doing it for the entirety of an output of 20-100 tokens would take VERY long. On top of that is the issue of presenting the results: I would have to display a huge table containing each step in the output sequence, and my guess is it would look quite confusing.
I do think it's possible to do it, I'm just not sure how useful it would be. It might be worth experimenting with, but I don't know if it'll ever see the light of day here.
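To illustrate the cost argument, a naive per-step attribution loop might look like the sketch below. This is purely hypothetical (`attribute_step` stands in for one full attribution pass and is not part of the library):

```python
import torch

def explain_generation(model, tokenizer, prompt, max_new_tokens=50):
    # Hypothetical sketch: attribute_step() stands in for one full
    # attribution pass (e.g. with Captum's LayerIntegratedGradients)
    # and is the expensive part.
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    all_attributions = []
    for _ in range(max_new_tokens):
        # One greedy decoding step.
        next_id = model.generate(input_ids, max_new_tokens=1)[:, -1:]
        # One attribution pass *per generated token*: the total cost is
        # max_new_tokens times the cost of explaining a single output.
        all_attributions.append(attribute_step(model, input_ids, next_id))
        input_ids = torch.cat([input_ids, next_id], dim=-1)
    return all_attributions
```

The cost therefore scales linearly with the number of generated tokens, which is why a 20-100 token output multiplies an already slow single-output attribution.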
PR Description
This PR prevents Explainer initialization from changing the model device. This fixes some simple and common use cases which otherwise produce errors. The Explainer can work with the `model.device` property instead.

Motivation and Context
Instead of guessing which device to use, stick to the model's device property. This property has been available since transformers 2.7.0 (which is older than the currently required version). Without it, the user is forced to run the models on GPU (whenever one is available, even if that is not preferred), and some common use cases produce errors.
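As an illustration, the core of the change amounts to something like this (a simplified sketch; the actual class and surrounding code in transformers-interpret differ):

```python
class BaseExplainer:
    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer
        # Before: guess a device and move the model onto it, e.g.
        #   self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        #   self.model.to(self.device)
        # After: adopt whatever device the model is already on
        # (model.device has been available since transformers 2.7.0).
        self.device = model.device
```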
Here is an example of a simple case that previously produced an error, since the model's device was changed by the Explainer, yielding a device mismatch at runtime:
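The original snippet and traceback are not reproduced here; a minimal sketch of the failing pattern, assuming a CUDA-capable machine and an illustrative model name, would be:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from transformers_interpret import SequenceClassificationExplainer

name = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(name)
tokenizer = AutoTokenizer.from_pretrained(name)

model.to("cpu")  # the user deliberately keeps the model on CPU

# Before this PR, initializing the explainer silently moved the model
# to CUDA when available, so the user's subsequent CPU-side calls hit
# a device-mismatch RuntimeError. With this PR the model stays put.
explainer = SequenceClassificationExplainer(model, tokenizer)
attributions = explainer("This movie was great!")
```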
Note: Perhaps `explainer.device` should be dropped entirely in favor of using `model.device` dynamically.

Tests and Coverage
Existing asserts have been updated to check that explainers use the models' devices. (Incidentally, these tests were also failing in a fresh checkout: device.type was "cuda" instead of "cuda:0".)
Two new tests have been added which initialize explainers on cpu and cuda, respectively; a sketch of them follows below.
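A sketch of what these checks look like (the test names and the `model`/`tokenizer` fixtures are illustrative, not the exact tests in the PR):

```python
import pytest
import torch
from transformers_interpret import SequenceClassificationExplainer

def test_explainer_uses_model_device_cpu(model, tokenizer):
    model.to("cpu")
    explainer = SequenceClassificationExplainer(model, tokenizer)
    # The explainer should adopt the model's device, not override it.
    assert explainer.device == model.device

def test_explainer_uses_model_device_cuda(model, tokenizer):
    if not torch.cuda.is_available():
        pytest.skip("requires a CUDA device")
    model.to("cuda")
    explainer = SequenceClassificationExplainer(model, tokenizer)
    assert explainer.device == model.device
    # Comparing device.type avoids the "cuda" vs "cuda:0" pitfall
    # mentioned above.
    assert model.device.type == "cuda"
```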