🚀 Feature request
Ability to extract the attention weights from the individual heads of a transformer.
Motivation
Plotting attention maps provides insight into the model's inner workings and into user behaviors that business teams can relate to. This is readily available in PyTorch / TensorFlow.
Is there a way to convert the trained model to a PyTorch / TensorFlow model so that the attention values can be captured?
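For context, a minimal sketch of what "capturing attention values" looks like once a model is available as a PyTorch module. This uses a bare `nn.MultiheadAttention` layer as a hypothetical stand-in for a converted model; the dimensions are illustrative, not taken from any particular model. Passing `need_weights=True` together with `average_attn_weights=False` (available in recent PyTorch versions) returns one attention map per head instead of an average across heads:

```python
import torch
import torch.nn as nn

# Illustrative dimensions; a converted model would define its own.
embed_dim, num_heads, seq_len, batch = 16, 4, 10, 2

# Hypothetical stand-in for one attention layer of a converted model.
attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

x = torch.randn(batch, seq_len, embed_dim)

# need_weights=True returns the attention map alongside the output;
# average_attn_weights=False keeps a separate map for each head.
out, weights = attn(x, x, x, need_weights=True, average_attn_weights=False)

# weights has shape (batch, num_heads, seq_len, seq_len); each row is a
# softmax distribution over source positions, ready to plot as a heatmap.
print(weights.shape)
```

Each `weights[b, h]` slice can then be visualized directly (e.g. with `matplotlib.pyplot.imshow`) to inspect what a given head attends to.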