@cwenner this is really great, thanks very much for contributing! I hadn't even noticed this, and I imagine it could be quite annoying when there was a device mismatch. I'll give this a proper review in the next day or two, but from what I've seen it looks good to go. I really appreciate that you have done some testing too!
Thanks again for contributing to Transformers Interpret !
@cdpierse - Glad to hear! It's a tiny change but makes it more plug&play for my workflows.
Incidentally - is there already support for text-generation (~LM head) introspection?
@cwenner thanks again for this fix. I've bumped the package with a patch release to 0.5.2 so that you can use this for your workflows.
With regard to text generation, it's tricky to solve from both a technical perspective and a design one. The reason is that LM heads usually work in a recursive manner: they consume an input sequence, spit out a new token, and then repeat for X number of steps. Because of how the explainer works, calculating attributions for even one output can be pretty slow, so doing it for the entirety of an output of 20-100 tokens would take VERY long. On top of that is the issue of presenting the results: I would have to display a huge table containing each step in the output sequence, and my guess is it would look quite confusing.
I do think it's possible to do it, I'm just not sure how useful it would be. It might be worth experimenting with, but I don't know if it'll ever see the light of day here.
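To illustrate the cost argument, a naive per-step attribution loop might look like the sketch below. This is purely hypothetical (`attribute_step` stands in for one full attribution pass and is not part of the library):

```python
import torch

def explain_generation(model, tokenizer, prompt, max_new_tokens=50):
    # Hypothetical sketch: attribute_step() stands in for one full
    # attribution pass (e.g. with Captum's LayerIntegratedGradients)
    # and is the expensive part.
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    all_attributions = []
    for _ in range(max_new_tokens):
        # One greedy decoding step.
        next_id = model.generate(input_ids, max_new_tokens=1)[:, -1:]
        # One attribution pass *per generated token*: the total cost is
        # max_new_tokens times the cost of explaining a single output.
        all_attributions.append(attribute_step(model, input_ids, next_id))
        input_ids = torch.cat([input_ids, next_id], dim=-1)
    return all_attributions
```

The cost therefore scales linearly with the number of generated tokens, which is why a 20-100 token output multiplies an already slow single-output attribution.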
PR Description
This PR prevents Explainer initialization from changing the model device. This fixes some simple and common use cases which otherwise produce errors. The Explainer can work with the `model.device` property instead.

Motivation and Context
Instead of guessing which device to use, stick to the model's device property. This property has been available since transformers 2.7.0 (which is older than the currently required version). Without it, the user is forced to run the models on GPU (whenever one is available, even if that is not preferred), and some common use cases produce errors.
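As an illustration, the core of the change amounts to something like this (a simplified sketch; the actual class and surrounding code in transformers-interpret differ):

```python
class BaseExplainer:
    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer
        # Before: guess a device and move the model onto it, e.g.
        #   self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        #   self.model.to(self.device)
        # After: adopt whatever device the model is already on
        # (model.device has been available since transformers 2.7.0).
        self.device = model.device
```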
Here is an example of a simple case that previously produced an error, since the model's device was changed by the Explainer, yielding a device mismatch at runtime:
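The original snippet and traceback are not reproduced here; a minimal sketch of the failing pattern, assuming a CUDA-capable machine and an illustrative model name, would be:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from transformers_interpret import SequenceClassificationExplainer

name = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(name)
tokenizer = AutoTokenizer.from_pretrained(name)

model.to("cpu")  # the user deliberately keeps the model on CPU

# Before this PR, initializing the explainer silently moved the model
# to CUDA when available, so the user's subsequent CPU-side calls hit
# a device-mismatch RuntimeError. With this PR the model stays put.
explainer = SequenceClassificationExplainer(model, tokenizer)
attributions = explainer("This movie was great!")
```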
Note: Perhaps `explainer.device` should be dropped entirely in favor of using `model.device` dynamically.

Tests and Coverage
Existing asserts have been updated to check that explainers use the models' devices. (Incidentally, these tests were also failing in a fresh checkout: device.type was "cuda" instead of "cuda:0".)
Two new tests have been added which initialize explainers on cpu and cuda, respectively; a sketch of them follows below.
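A sketch of what these checks look like (the test names and the `model`/`tokenizer` fixtures are illustrative, not the exact tests in the PR):

```python
import pytest
import torch
from transformers_interpret import SequenceClassificationExplainer

def test_explainer_uses_model_device_cpu(model, tokenizer):
    model.to("cpu")
    explainer = SequenceClassificationExplainer(model, tokenizer)
    # The explainer should adopt the model's device, not override it.
    assert explainer.device == model.device

def test_explainer_uses_model_device_cuda(model, tokenizer):
    if not torch.cuda.is_available():
        pytest.skip("requires a CUDA device")
    model.to("cuda")
    explainer = SequenceClassificationExplainer(model, tokenizer)
    assert explainer.device == model.device
    # Comparing device.type avoids the "cuda" vs "cuda:0" pitfall
    # mentioned above.
    assert model.device.type == "cuda"
```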