Thank you for providing the code; it's very interesting work!
However, the description "The attention weights reflect the functional groups’ significance to the global characteristics of the molecule, extracted from the final self-attention layer and normalized." is unclear to me. There are many ways to compute attention scores and many ways to normalize them, so it is hard to reproduce the figure from this sentence alone. Could you please provide the visualization code for Fig. 4?
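To make the ambiguity concrete, here is a small sketch of a few choices I can think of. It is only my guess and not your method: the tensor shape `(num_heads, seq_len, seq_len)`, the use of a single global token at index 0 as the query, head averaging vs. max, and the helper names `fg_importance`, `minmax`, and `sum_to_one` are all hypothetical assumptions on my part. Even on the same random attention tensor, these options give different per-group values.

```python
import numpy as np

def fg_importance(attn, query_idx=0, head_reduce="mean"):
    """Hypothetical helper: attention from an assumed global token to each functional group.

    attn: (num_heads, seq_len, seq_len), each row a softmax distribution over keys.
    query_idx: index of the token ASSUMED to represent the whole molecule.
    """
    per_head = attn[:, query_idx, :]  # (num_heads, seq_len)
    if head_reduce == "mean":
        scores = per_head.mean(axis=0)  # average over heads
    elif head_reduce == "max":
        scores = per_head.max(axis=0)   # strongest head per key
    else:
        raise ValueError(f"unknown head_reduce: {head_reduce}")
    return np.delete(scores, query_idx)  # drop the global token attending to itself

def minmax(x):
    # Rescale to [0, 1]; one possible meaning of "normalized".
    return (x - x.min()) / (x.max() - x.min())

def sum_to_one(x):
    # Rescale so the group weights sum to 1; another possible meaning.
    return x / x.sum()

# Toy data: 4 heads, 1 global token + 5 functional groups (purely illustrative).
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 6, 6))
attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)  # row-wise softmax

for reduce in ("mean", "max"):
    raw = fg_importance(attn, head_reduce=reduce)
    print(reduce, "min-max:   ", np.round(minmax(raw), 3))
    print(reduce, "sum-to-one:", np.round(sum_to_one(raw), 3))
```

Knowing which head reduction, which query token, and which normalization were used for Fig. 4 would be enough for me to reproduce it.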