cdpierse / transformers-interpret

Model explainability that works seamlessly with 🤗 transformers. Explain your transformers model in just 2 lines of code.
Apache License 2.0

How are attributions calculated? #55

Closed: szamani20 closed this issue 3 years ago

szamani20 commented 3 years ago

Thank you for your amazing work. The documentation for this project appears to be limited to code usage. I couldn't find much explanation of the actual method used to explain the model. Specifically, some comments on the _calculate_attributions() method would help give an idea of how attributions are calculated. Thanks!

koren-v commented 3 years ago

@szamani20 Hi, the attributions are calculated by first applying Integrated Gradients (https://arxiv.org/abs/1703.01365) with respect to the model's embeddings. Since IG gives us a tensor of shape [batch_size, seq_length, hidden_dim], we then sum the attributions over the hidden_dim axis (to get a single scalar for each token) and normalize them.
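The steps described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's actual `_calculate_attributions()` code: the toy model, its gradient, the all-zero baseline, the helper names (`integrated_gradients`, `summarize`), and the choice of L2 normalization are all hypothetical, and the batch dimension is omitted for brevity.

```python
import math

# Hypothetical toy "model": F(E) = sum_t sum_d w[d] * E[t][d]**2.
# It stands in for a real transformer so the gradient is known in closed form.
WEIGHTS = [1.0, -2.0, 0.5]


def toy_model(emb):
    return sum(w * v * v for row in emb for w, v in zip(WEIGHTS, row))


def toy_grad(emb):
    # dF/dE[t][d] = 2 * w[d] * E[t][d]
    return [[2.0 * w * v for w, v in zip(WEIGHTS, row)] for row in emb]


def integrated_gradients(emb, baseline, grad_fn, steps=50):
    """Midpoint Riemann-sum approximation of Integrated Gradients.

    emb and baseline are nested lists of shape [seq_length][hidden_dim];
    the result has the same shape.
    """
    seq_len, hidden = len(emb), len(emb[0])
    avg_grad = [[0.0] * hidden for _ in range(seq_len)]
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight line from the baseline to the input.
        point = [
            [b + alpha * (e - b) for e, b in zip(e_row, b_row)]
            for e_row, b_row in zip(emb, baseline)
        ]
        grads = grad_fn(point)
        for t in range(seq_len):
            for d in range(hidden):
                avg_grad[t][d] += grads[t][d] / steps
    # Scale the averaged gradient by (input - baseline).
    return [
        [(emb[t][d] - baseline[t][d]) * avg_grad[t][d] for d in range(hidden)]
        for t in range(seq_len)
    ]


def summarize(attributions):
    """Sum over hidden_dim, then divide by the L2 norm (assumed normalization)."""
    per_token = [sum(row) for row in attributions]
    norm = math.sqrt(sum(v * v for v in per_token))
    return [v / norm for v in per_token] if norm > 0 else per_token


embeddings = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0], [0.0, 0.0, 0.0]]
baseline = [[0.0] * 3 for _ in embeddings]

attributions = integrated_gradients(embeddings, baseline, toy_grad)
scores = summarize(attributions)  # one scalar per token
print(scores)
```

With a real model, the gradient would come from backpropagation through the network rather than from a closed-form function, and the baseline would typically be built from reference tokens instead of zeros.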

cdpierse commented 3 years ago

Thanks for helping out @koren-v. The original paper is a very good resource for understanding how the algorithm works; the rest, as @koren-v mentioned, is just summing and normalizing to produce a single scalar per token.
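For quick reference, the core definition from the Integrated Gradients paper linked above can be written as follows. The symbols are the paper's, not this project's: F is the model output being explained, x is the input (here, the embeddings), x' is the baseline, and i indexes one input dimension.

```latex
\mathrm{IG}_i(x) = (x_i - x'_i) \int_{0}^{1}
  \frac{\partial F\bigl(x' + \alpha\,(x - x')\bigr)}{\partial x_i}\, d\alpha
```

In practice the integral is approximated by a finite sum over interpolation steps between x' and x. The per-token score discussed in this thread is then the sum of these values over the hidden dimension of each token's embedding, followed by normalization.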