allenai / allennlp

An open-source NLP research library, built on PyTorch.
http://www.allennlp.org
Apache License 2.0

Saliency Interpretation: Why Grad * Embedding? #3539

Closed experiencor closed 4 years ago

experiencor commented 4 years ago

https://github.com/allenai/allennlp/blob/b85c86cff6f0995002dca6216ba2e3aefe403d11/allennlp/interpret/saliency_interpreters/simple_gradient.py#L39

I can't find any explanation for this line in the paper "AllenNLP Interpret: A Framework for Explaining Predictions of NLP Models."

What is the justification for multiplying the gradient by the embedding instead of using the raw gradient?

Thank you.

matt-gardner commented 4 years ago

@Eric-Wallace probably has the best answer here. I know it's what he did in previous papers, but I don't remember why.

Eric-Wallace commented 4 years ago

This paper has an explanation in Section 2 https://arxiv.org/abs/1804.07781.

Basically, one definition of the "importance" of a word is the change in probability when that word is removed. So, taking gradient * embedding simulates what would happen if you set the embedding to the all-zero vector. This isn't quite removing the word, but hopefully it's a close approximation to it.
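
To make the approximation concrete, here is a minimal sketch (not AllenNLP's `SimpleGradient` code) using a hypothetical linear scoring function. By a first-order Taylor expansion around the embedding `e`, the change in score from zeroing the embedding is roughly `grad · e`, which is the grad * embedding saliency summed over the embedding dimension. In this toy linear case the approximation is exact.

```python
import torch

torch.manual_seed(0)

embedding_dim = 4
# Hypothetical toy "model": a linear score over a single word embedding.
weight = torch.randn(embedding_dim)

def score(emb: torch.Tensor) -> torch.Tensor:
    return weight @ emb

emb = torch.randn(embedding_dim, requires_grad=True)

# Backprop the score to get the gradient w.r.t. the embedding.
score(emb).backward()

# grad * embedding saliency: dot product of the gradient with the embedding.
saliency = (emb.grad * emb).sum()

# Actual change in score when the word's embedding is zeroed out.
actual_drop = score(emb) - score(torch.zeros(embedding_dim))

print(saliency.item(), actual_drop.item())  # equal here because the model is linear
```

For a real (non-linear) model the two quantities only match to first order, which is why grad * embedding is an approximation of word removal rather than the exact effect.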