Not a noob question. I am not enough of an expert on group theory/crystallography myself to have spent a lot of time deliberating about these decisions. My reasoning was that averaging before the message passing would be the same as collapsing all the equivalent Wyckoff positions that can be relabelled down to encodings of just the site symmetry, whereas averaging after allows the model to maintain the fact that sites with the same site symmetry are distinct. To give an explicit example, https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-normsets?from=wycksets&gnum=68 shows that in spg 68 there are 3 Wyckoff sets with the same site symmetry that are nonetheless distinct, and so should have distinct embeddings if we want to make use of all the information in the representation.
My reasoning was that averaging before the message passing would be the same as collapsing all the equivalent Wyckoff positions that can be relabelled down to encodings of just the site symmetry, whereas averaging after allows the model to maintain the fact that sites with the same site symmetry are distinct.
@CompRhys You're referring to the distinction made here, right?
The other thing that, on rereading, isn't so clear (but is in the figure) is that we pool the token representations to get material embeddings and then average the pooled material embeddings of the equivalent representations.
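To make the order-of-averaging point concrete, here is a minimal sketch in which a generic transformer encoder stands in for the Wrenformer stack; the names, shapes, and random data are illustrative assumptions, not the actual aviary code.

```python
import torch

torch.manual_seed(0)
d_model = 8

# Stand-in for the message-passing/transformer stack; purely illustrative.
encoder = torch.nn.TransformerEncoder(
    torch.nn.TransformerEncoderLayer(d_model, nhead=2, batch_first=True),
    num_layers=1,
)

# Token embeddings for 3 equivalent Wyckoff relabellings of the same crystal,
# each with 4 site tokens (random stand-in data, not a real featurisation).
equivalent_reps = torch.randn(3, 4, d_model)

# Average after: encode each equivalent representation, pool its tokens into a
# material embedding, then average the pooled embeddings.
pooled = encoder(equivalent_reps).mean(dim=1)        # (3, d_model) material embeddings
material_emb_late = pooled.mean(dim=0)               # average over equivalent reps

# Average before: average the input embeddings first, which collapses distinct
# Wyckoff sets with the same site symmetry before the model ever sees that
# they were distinct.
averaged_input = equivalent_reps.mean(dim=0, keepdim=True)   # (1, 4, d_model)
material_emb_early = encoder(averaged_input).mean(dim=1).squeeze(0)
```

In the first variant the encoder sees each relabelling separately before they are combined; in the second they are merged before the model can tell them apart.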
A topic that's come up in another context for me.
@sgbaird I'd be curious what the other context is? And what the results were, in case you tried directly averaging the input embeddings rather than averaging after message passing/transformer encoding. I wouldn't expect performance to be that much worse, maybe even the same.
@CompRhys thanks! I appreciate the discussion and clarification.
@janosh nothing that I've implemented directly. The two other places where this has gotten me thinking:
PGNN_DEC is parameterized by a SE(3) equivariant PGNN that inputs a multi-graph representation (section 3.1) of the noisy material structure and the latent representation. The node embedding for node $i$ is obtained by the concatenation of the element embedding of $\tilde{a}_i$ and the latent representation $z$, followed by a MLP, $h_i^0 = \mathrm{MLP}(e_a(\tilde{a}_i) \,\|\, z)$, where $\|$ denotes concatenation of two vectors and $e_a$ is a learned embedding for elements. After $K$ message-passing layers, PGNN_DEC outputs a vector per node that is equivariant to the rotation of $\tilde{M}$ (emphasis added)
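For concreteness, a minimal sketch of that node-embedding step; the layer sizes and activation are arbitrary choices for illustration, not CDVAE's actual hyperparameters or code.

```python
import torch
import torch.nn as nn

# Sketch of h_i^0 = MLP(e_a(a_i) || z): element embedding concatenated with the
# latent z, then passed through an MLP. All sizes below are assumptions.
num_elements, elem_dim, latent_dim, hidden_dim = 100, 64, 256, 128

element_embedding = nn.Embedding(num_elements, elem_dim)   # e_a, learned per element
mlp = nn.Sequential(
    nn.Linear(elem_dim + latent_dim, hidden_dim),
    nn.SiLU(),
    nn.Linear(hidden_dim, hidden_dim),
)

atom_types = torch.tensor([5, 7, 7])   # toy atomic numbers for 3 nodes
z = torch.randn(latent_dim)            # latent vector from the encoder

# Concatenate the element embedding with z (broadcast to every node), then MLP.
h0 = mlp(torch.cat([element_embedding(atom_types), z.expand(len(atom_types), -1)], dim=-1))
```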
During the lectures, there was discussion about equivariance vs. invariance and how which one you get depends on where you collapse things. For example, I think equivariance is implemented by collapsing the symmetric representations later in the stack (i.e. the algorithm is allowed to distinguish between symmetric representations up until close to the end, e.g. a final pooling layer). In contrast, invariance might be implemented towards the beginning, so that the model doesn't distinguish between symmetric representations and treats them as identical. Whether invariance or equivariance is desired depends on the application.
I wasn't clear on these topics, and I might be overgeneralizing, misremembering, or misinterpreting. I think it was James Spencer and/or David Pfau from DeepMind who talked about this, but maybe @halvarsu can correct me if I'm wrong.
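A toy way to see the "where do you collapse" distinction, using 2D rotations as the symmetry group; this is my own illustration under those assumptions, not anything from the lectures or from CDVAE.

```python
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(2, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1))

def rot(theta):
    c, s = torch.cos(theta), torch.sin(theta)
    return torch.stack([torch.stack([c, -s]), torch.stack([s, c])])

x = torch.randn(5, 2)              # 5 points in the plane
R = rot(torch.tensor(0.7))

# Collapse early: hand the model only rotation-invariant features (pairwise
# distances), so it can never distinguish rotated copies of the input.
def f_invariant_early(points):
    return torch.cdist(points, points).sum()

# Collapse late: let the model see raw coordinates (intermediate activations
# still distinguish rotated copies) and only symmetrise at the end by
# averaging the output over sampled rotations.
def f_invariant_late(points, n=256):
    thetas = torch.rand(n) * 2 * torch.pi
    return torch.stack([net(points @ rot(t).T).sum() for t in thetas]).mean()

print(f_invariant_early(x), f_invariant_early(x @ R.T))  # equal up to float error
print(f_invariant_late(x), f_invariant_late(x @ R.T))    # equal only approximately / in expectation
```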
Follow-up snippet from CDVAE manuscript, Appendix B.3:
GNN ARCHITECTURE. We use DimeNet++ adapted for periodicity (Klicpera et al., 2020a;b) as the encoder, which is SE(3) invariant to the input structure. The decoder needs to output a vector per node that is SE(3) equivariant to the input structure. We use GemNet-dQ (Klicpera et al., 2021) as the decoder. We used implementations from the Open Catalyst Project (OCP) (Chanussot et al., 2021), but we reduced the size of hidden dimensions to 128 for faster training. The encoder has 2.2 million parameters and the decoder has 2.3 million parameters.
Interestingly, the encoder is SE(3) invariant while the decoder is SE(3) equivariant. I'm curious why.
Haven't read CDVAE yet (though I'm planning to). Are you asking why the encoder is not also equivariant? Or why the decoder can't be invariant? If the 2nd, then the answer seems to be
The decoder needs to output a vector per node that is SE(3) equivariant to the input structure.
Though looking at the paper now, another paragraph seems to contradict the one you quoted:
To capture the necessary invariances and encode the interactions crossing periodic boundaries, we use SE(3) equivariant graph neural networks adapted with periodicity (PGNNs) for both the encoder and decoder of our VAE.
I'm guessing that's an oversight since they repeat elsewhere that the encoder is invariant.
@janosh good catch. @txie-93 or @kyonofx maybe you could confirm?
Thanks for pointing this out! Yes, this is indeed an oversight. The encoder is invariant because it encodes the crystal into a latent vector. The latent vector is invariant to the symmetry operations.
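A toy numeric check of that encoder-invariant/decoder-equivariant split, with purely illustrative functions standing in for the networks (not CDVAE's models):

```python
import torch

def rot(theta):
    c, s = torch.cos(theta), torch.sin(theta)
    return torch.stack([torch.stack([c, -s]), torch.stack([s, c])])

coords = torch.randn(4, 2)
R = rot(torch.tensor(1.2))

def toy_encoder(x):        # invariant: built from pairwise distances only
    return torch.cdist(x, x).flatten()

def toy_decoder(x):        # equivariant: per-node vectors pointing toward the centroid
    return x.mean(dim=0) - x

# The "latent" is the same whether or not the structure was rotated.
print(torch.allclose(toy_encoder(coords), toy_encoder(coords @ R.T), atol=1e-6))   # True

# The per-node decoder output co-rotates with the input: f(xR^T) == f(x)R^T.
print(torch.allclose(toy_decoder(coords @ R.T), toy_decoder(coords) @ R.T, atol=1e-6))  # True
```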
Thanks @txie-93!
https://www.science.org/doi/epdf/10.1126/sciadv.abn4117
Probably a noob question here. I think I understand that the averaging over equivalent Wyckoff representations needs to happen at some point, but why does it need to happen after message passing? Why not implement this at the very beginning (i.e. in the input data representation)? I'm not so much doubtful of the choice as interested in the mechanics behind it. A topic that's come up in another context for me.