cdpierse / transformers-interpret

Model explainability that works seamlessly with 🤗 transformers. Explain your transformers model in just 2 lines of code.
Apache License 2.0
1.27k stars 96 forks

New feature: smooth gradient #35

Closed jpcorb20 closed 2 years ago

jpcorb20 commented 3 years ago

Hello,

Is smooth gradient part of your future update plans?

Thanks

cdpierse commented 3 years ago

Hi @jpcorb20 ,

So for the 1.0.0 release I definitely want to explore adding a new attribution method. I know that SmoothGrad is one of Captum's built-in algorithms, so it would definitely be something I would look at.

I mostly have experience using integrated gradients, layer integrated gradients, and Shapley values with NLP-type models, so I'd need to do some research to get SmoothGrad working with transformers models, with Captum as the explainability library.

Do you have any experience using SmoothGrad with HF transformers models? If so, it'd be great to chat; once I figure out a minimal working example, I can go about integrating it into the attributions section.

Thanks, Charles

jpcorb20 commented 3 years ago

Hello @cdpierse

Thanks for your quick reply!

I tested it out in another library (e.g. AllenNLP Interpret), and it seems to give interesting results in terms of interpretability. I also found that implementation of SmoothGrad. Still, I think you have the best implementation for HF!

It seems SmoothGrad works by "perturbation": injecting Gaussian noise (mean 0, small std, e.g. 0.01) into the embeddings N times (e.g. 20-35 samples) and averaging the resulting gradients to reduce the sharp variations that might be misleading.
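The averaging step described above can be sketched in a few lines of numpy; here a plain callable stands in for the real embedding-gradient computation, and the parameter names are illustrative rather than any library's API (Captum wraps this same recipe as `NoiseTunnel` with `nt_type="smoothgrad"`):

```python
import numpy as np

def smooth_grad(grad_fn, x, n_samples=25, noise_std=0.01, seed=0):
    """SmoothGrad sketch: average gradients taken at noisy copies of the input.

    grad_fn: callable returning the gradient w.r.t. its input
             (stands in for a real model's embedding gradients).
    """
    rng = np.random.default_rng(seed)
    grads = np.zeros_like(x, dtype=float)
    for _ in range(n_samples):
        # mean-0 Gaussian noise with a small std, as described above
        noisy = x + rng.normal(0.0, noise_std, size=x.shape)
        grads += grad_fn(noisy)
    # averaging smooths out sharp local variations in the gradient
    return grads / n_samples

# Toy check: for f(x) = sum(x**2), the true gradient is 2*x,
# and the smoothed gradient should land close to it.
x = np.array([1.0, -2.0, 0.5])
sg = smooth_grad(lambda v: 2 * v, x)
```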

I can probably test a minimal example around next week. I haven't worked with Captum, though.

Thanks,

JPC

koren-v commented 3 years ago

Hi @jpcorb20, thanks for referencing my repo. I wanted to share my experience comparing IG vs SmoothGrad: in my opinion, the former gives more meaningful outputs. As for the difference in implementation, from the usage perspective they look very similar, so I believe it would not be complicated to embed a new optional algorithm.

jpcorb20 commented 3 years ago

Hi @koren-v , I really appreciate that you share your experience and opinion, thanks! 🙂

I have been lacking the time to test a quick implementation of SmoothGrad, to be honest...

What I had trouble with was outlier values (e.g. one token shown in a strong color while the rest look nearly neutral). So I have been playing with the LIG token values a bit to see how I can gather more insight into the other tokens despite this issue. I ended up either clipping the values based on the distribution (2-3 std) or changing the color scale (square root or log), and now I can analyze the nuances of my model in most cases! I also boosted the values a bit for better visual contrast.
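A minimal sketch of the corrections described above (clipping at a few standard deviations, then a square-root color scale); the function name and parameters are hypothetical, not part of transformers-interpret:

```python
import numpy as np

def rescale_attributions(scores, clip_std=2.5, power=0.5):
    """Post-process token attribution scores so one dominant token
    doesn't wash out the color scale.

    clip_std : clip scores beyond this many standard deviations
    power    : 0.5 gives a square-root scale; a log scale (np.log1p
               on the magnitudes) would be an alternative
    """
    scores = np.asarray(scores, dtype=float)
    mu, sd = scores.mean(), scores.std()
    # clip outliers based on the score distribution (the "2-3 std" trick)
    clipped = np.clip(scores, mu - clip_std * sd, mu + clip_std * sd)
    # compress the range while keeping the sign of each score
    return np.sign(clipped) * np.abs(clipped) ** power

# One token dominates; after rescaling, the spread is much smaller
# while the relative ordering of tokens is preserved.
scores = [0.02, 0.01, 0.95, 0.03]
rescaled = rescale_attributions(scores)
```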

Maybe this could be an update with a few parameters to set these corrections?

Thanks

koren-v commented 3 years ago

@jpcorb20 I'm not sure these parameters would be consistent with the library since, basically, you can perform such operations yourself after receiving the token scores (if I understood what you meant). What I do wonder about is the correctness of summing up the embedding gradients; I think that can impact the final distribution over the sentence.
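The concern about summing embedding gradients can be illustrated with a toy example: summing over the embedding dimension lets opposite-sign components cancel, while an L2 norm does not, so the two reductions can rank tokens differently (the numbers below are made up for illustration):

```python
import numpy as np

# Toy gradients w.r.t. embeddings, shape (num_tokens, embedding_dim)
grads = np.array([
    [0.5, -0.5, 0.5, -0.5],  # large components that cancel when summed
    [0.2,  0.2, 0.2,  0.2],  # small components that all add up
])

summed = grads.sum(axis=1)          # token 0 collapses to ~0
l2 = np.linalg.norm(grads, axis=1)  # token 0 keeps a large magnitude
```

Under the sum, token 1 looks far more important; under the L2 norm, token 0 does, which is why the choice of reduction changes the final distribution over the sentence.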