Hi there,
I hope you are well.
I have a question: given a sequence as input to ESM-1b, how can I extract the 660 attention maps, one for each head in each layer?
Thanks so much
Hi Nasser, model.forward() has an argument need_head_weights, link.
The returned dictionary will then contain result["attentions"]; see L183.
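To illustrate, here is a minimal sketch of the extraction step. The model-loading calls shown in the comments mirror the fair-esm API (`esm.pretrained.esm1b_t33_650M_UR50S`, `need_head_weights=True`), but to keep the example self-contained the 650M-parameter download is replaced with a random NumPy array of the same shape the model returns; the sequence and its length are made up for illustration.

```python
import numpy as np

# Sketch only: with the real fair-esm package you would run something like
#
#   import torch, esm
#   model, alphabet = esm.pretrained.esm1b_t33_650M_UR50S()
#   batch_converter = alphabet.get_batch_converter()
#   _, _, tokens = batch_converter([("protein1", "MKTAYIAKQR")])
#   with torch.no_grad():
#       result = model(tokens, need_head_weights=True)
#   attentions = result["attentions"]  # (batch, layers, heads, seq, seq)
#
# ESM-1b has 33 layers x 20 heads = 660 attention maps per sequence.
# Here a random tensor of the same shape stands in for the model output:
seq_len = 9  # hypothetical token count for a short sequence
attentions = np.random.rand(1, 33, 20, seq_len, seq_len)

# Collect one (seq_len x seq_len) map per (layer, head) pair.
maps = {
    (layer, head): attentions[0, layer, head]
    for layer in range(attentions.shape[1])
    for head in range(attentions.shape[2])
}
print(len(maps))  # -> 660
```

Each entry of `maps` is then a single head's attention map, ready for plotting or contact-prediction analysis.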
Feel free to re-open if you have any issues!