NVIDIA-Merlin / Transformers4Rec

Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation and works with PyTorch.
https://nvidia-merlin.github.io/Transformers4Rec/main
Apache License 2.0

[FEA] Feature to extract attention values from transformer heads #745

Open vivpra89 opened 10 months ago

vivpra89 commented 10 months ago

🚀 Feature request

Ability to extract the attention weights from each head of the transformer.

Motivation

Plotting attention weights gives insight into the model's inner workings and into user behavior, in a form that business teams can relate to. These weights are easy to obtain in plain PyTorch or TensorFlow.

Is there a way to convert a trained Transformers4Rec model to a plain PyTorch or TensorFlow model so that the attention values can be captured?
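
To illustrate the kind of output being requested, here is a minimal sketch in plain PyTorch using `torch.nn.MultiheadAttention`. It is not Transformers4Rec code, and the layer is untrained. The tensor sizes are arbitrary assumptions chosen only to show the shape of the per-head weights.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
batch_size, seq_len, embed_dim, num_heads = 2, 5, 16, 4

# A standalone self-attention layer (not a Transformers4Rec block).
attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
attn.eval()

# Stand-in for a batch of embedded session items.
x = torch.randn(batch_size, seq_len, embed_dim)

with torch.no_grad():
    # average_attn_weights=False keeps one attention matrix per head
    # instead of averaging across heads.
    _, weights = attn(x, x, x, need_weights=True, average_attn_weights=False)

# weights has shape (batch_size, num_heads, seq_len, seq_len);
# each row is a softmax distribution over the attended positions.
print(weights.shape)
```

A tensor of this shape, one `seq_len x seq_len` matrix per head and per session, is what could be passed to a heatmap plot.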
