DFKI-NLP / thermostat

Collection of NLP model explanations and accompanying analysis tools
Apache License 2.0

[InputXGradient] RuntimeError: One of the differentiated Tensors does not require grad #5

Closed by nfelnlp 3 years ago

nfelnlp commented 3 years ago

Stuck on this error while implementing InputXGradient. Tested with both DistilBERT and RoBERTa.

Traceback (most recent call last):
  File "/home/nfel/.pycharm_helpers/pydev/_pydevd_bundle/pydevd_exec2.py", line 3, in Exec
    exec(exp, global_vars, local_vars)
  File "<input>", line 1, in <module>
  File "/home/nfel/PycharmProjects/thermostat/venv/lib/python3.8/site-packages/captum/log/__init__.py", line 35, in wrapper
    return func(*args, **kwargs)
  File "/home/nfel/PycharmProjects/thermostat/venv/lib/python3.8/site-packages/captum/attr/_core/input_x_gradient.py", line 117, in attribute
    gradients = self.gradient_func(
  File "/home/nfel/PycharmProjects/thermostat/venv/lib/python3.8/site-packages/captum/_utils/gradient.py", line 125, in compute_gradients
    grads = torch.autograd.grad(torch.unbind(outputs), inputs)
  File "/home/nfel/PycharmProjects/thermostat/venv/lib/python3.8/site-packages/torch/autograd/__init__.py", line 223, in grad
    return Variable._execution_engine.run_backward(
RuntimeError: One of the differentiated Tensors does not require grad
  0%|                                                  | 0/1821 [07:25<?, ?it/s]
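For context, the error comes from Captum trying to take gradients with respect to the raw `input_ids`. A minimal sketch of the failure mode (model name, sentence, and forward function are assumptions for illustration, not the repo's actual configuration):

```python
from captum.attr import InputXGradient
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
model.eval()

inputs = tokenizer("a short example sentence", return_tensors="pt")

def forward_func(input_ids, attention_mask):
    return model(input_ids=input_ids, attention_mask=attention_mask).logits

explainer = InputXGradient(forward_func)
# input_ids is a LongTensor; integer tensors cannot require grad, so Captum's
# internal torch.autograd.grad call has nothing to differentiate and raises
# "RuntimeError: One of the differentiated Tensors does not require grad".
attributions = explainer.attribute(
    inputs["input_ids"],
    target=0,
    additional_forward_args=(inputs["attention_mask"],),
)
```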
nfelnlp commented 3 years ago

I came up with a way to at least make the attribute function work for GuidedBackprop (it would work for InputXGradient as well): https://github.com/nfelnlp/thermostat/blob/a8d74650f8ba4f4b0f6504cf8c06043e492d6ed3/src/thermostat/explainers/grad.py#L115

However, this meant attributing with respect to the embedding output rather than the token ids, so each of the 512 (padded) tokens gets one attribution value per embedding dimension. The resulting attributions shape of (1, 512, 768) probably doesn't make sense at all for token-level explanations. A sketch of the idea follows below.
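For reference, here is one way such a workaround can look (a sketch under the assumption that the linked code attributes with respect to the embedding layer's output; `model`, `inputs`, and the forward logic are reused from the sketch above, and the linked `grad.py` may differ in detail):

```python
from captum.attr import InputXGradient

def forward_from_embeds(inputs_embeds, attention_mask):
    return model(inputs_embeds=inputs_embeds, attention_mask=attention_mask).logits

# Run the embedding layer manually and turn its output into a leaf tensor
# that requires grad, so autograd has a differentiable input to work with.
embeds = model.get_input_embeddings()(inputs["input_ids"]).detach()
embeds.requires_grad_()

explainer = InputXGradient(forward_from_embeds)
attributions = explainer.attribute(
    embeds,
    target=0,
    additional_forward_args=(inputs["attention_mask"],),
)
# One attribution value per embedding dimension, hence the
# (batch, seq_len, hidden_dim) shape, e.g. (1, 512, 768) when padded to 512.
```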

nfelnlp commented 3 years ago

After consulting with @rbtsbg, InputXGradient and GuidedBackprop will be substituted by LayerGradientXActivation, which works similarly to LayerIntegratedGradients and is easier to handle: the base model's embedding layer is simply passed to the explainer at creation time.
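A sketch of what that looks like (setup reused from the first sketch above; the final aggregation step is an assumption about how per-token scores would be derived, not something stated in this thread):

```python
from captum.attr import LayerGradientXActivation

explainer = LayerGradientXActivation(forward_func, model.get_input_embeddings())
# input_ids can be passed directly: gradients are taken at the embedding
# layer's output, so the integer inputs never need requires_grad.
attributions = explainer.attribute(
    inputs["input_ids"],
    target=0,
    additional_forward_args=(inputs["attention_mask"],),
)
# Attributions are still (batch, seq_len, hidden_dim); summing over the
# hidden dimension yields one score per token.
scores = attributions.sum(dim=-1)
```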