Closed tsjain closed 4 years ago
Hello,
All our layers and operations are permutation invariant, so changing the order of the atoms does not change the output.
We do not use positional encodings as in the NLP Transformer. Instead, information about the structure of the molecule is provided to the model through the adjacency and distance matrices. Thanks to this, our Molecule Self-Attention is permutation invariant.
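The point above can be checked numerically. Below is a minimal sketch (not the actual MAT code) of a single-head "molecule self-attention" in which, as described, the attention weights mix the usual softmax term with the adjacency matrix and a distance-derived term; the exact mixing weights `lambdas` and the distance transform `softmax(-D)` are illustrative assumptions. Permuting the atoms, together with the rows/columns of the adjacency and distance matrices, permutes the per-atom outputs accordingly (equivariance), so any permutation-invariant pooling such as a mean gives the same prediction.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def molecule_self_attention(X, A, D, Wq, Wk, Wv, lambdas=(0.5, 0.25, 0.25)):
    """Simplified single-head molecule self-attention (illustrative, not MAT):
    attention weights are a weighted sum of softmax(QK^T / sqrt(d)),
    the adjacency matrix A, and a distance-derived term softmax(-D)."""
    la, lg, ld = lambdas
    d = Wq.shape[1]
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    att = la * softmax(Q @ K.T / np.sqrt(d)) + lg * A + ld * softmax(-D)
    return att @ V

n, f, d = 5, 8, 8                       # atoms, atom features, head dim
X = rng.normal(size=(n, f))             # atom feature matrix
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.maximum(A, A.T)                  # symmetric adjacency matrix
D = rng.random((n, n)); D = (D + D.T) / 2
np.fill_diagonal(D, 0.0)                # symmetric distance matrix
Wq, Wk, Wv = (rng.normal(size=(f, d)) for _ in range(3))

P = np.eye(n)[rng.permutation(n)]       # random permutation matrix

out = molecule_self_attention(X, A, D, Wq, Wk, Wv)
out_p = molecule_self_attention(P @ X, P @ A @ P.T, P @ D @ P.T, Wq, Wk, Wv)

# Per-atom outputs are permutation equivariant: permuting the input
# atoms permutes the output rows the same way ...
assert np.allclose(P @ out, out_p)
# ... so a permutation-invariant readout (e.g. mean pooling) yields
# the same molecule-level prediction for any atom ordering.
assert np.allclose(out.mean(axis=0), out_p.mean(axis=0))
```

The key step is that row-wise softmax commutes with a simultaneous permutation of rows and columns, so the whole attention matrix transforms as `P att P^T`, and `P^T P = I` cancels inside `att @ V`.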
Hi,
Thanks for the really nice and well-explained paper.
I had a question about how the prediction output can be invariant to the order of the atoms in the molecule. One can randomly permute the order of the atoms in the adjacency matrix, the distance matrix, and the atom feature matrix.
Will the MAT give the same property prediction for the different permutations?
My understanding is that the learned attention is between positions, so it is not permutation invariant. In NLP uses of the Transformer, a positional encoding term is added that helps with learning distant context, but unlike in language tasks, the order of the atoms in a molecule can be specified quite arbitrarily.
Thanks.