gasteigerjo / dimenet

DimeNet and DimeNet++ models, as proposed in "Directional Message Passing for Molecular Graphs" (ICLR 2020) and "Fast and Uncertainty-Aware Directional Message Passing for Non-Equilibrium Molecules" (NeurIPS-W 2020)
https://www.daml.in.tum.de/dimenet

Periodic DimeNet #7

Closed: pfebrer closed this issue 4 years ago

pfebrer commented 4 years ago

Hi, very nice work on this! :)

I've been exploring the ML/deep learning landscape to find some inspiration for cool ideas that would be nice to play with during my PhD in materials science. I've seen lots of implementations of deep learning for molecules, but not so much for periodic structures such as crystals.

I would like to know if you have given any thought to how periodic boundary conditions could work in a GNN, and specifically in DimeNet. Maybe you have already implemented it and I have failed to find it (in that case, excuse me). I have some intuition about it, but I would like to hear your thoughts, if it's not too much to ask.

From what I understood from your paper, the information about the atom/node positions is only "stored" in the bonds/edges, encoded as angles and bond lengths. Is this right? If so, my intuition is that, given a periodic system like this one:

[Image: drawing of a periodic system with numbered atoms and bonds]

you can say that, in the left border, atom 1 is effectively connected to atom 4 through a connection that is in the direction of bond 8 in this drawing. Then, in my naive view, this should account fully for the periodicity of the system, because atom 4 contains the information of the rest of the structure and a kind of loop will be created there.

I'd like to know if you think this makes sense and, if not, I would appreciate it if you could share the reasons why it won't work.

Thanks in advance!

CompRhys commented 4 years ago

@pfebrer96 Both SchNet and MEGNet support periodic boundary conditions (BCs) for crystalline materials, so it should be possible to do the same here. (I highlight these two only because DimeNet benchmarks against them and both have code available on GitHub, not because of any association.) I am also interested in such applications, so I have been looking at the code to see how it might be done.

pfebrer commented 4 years ago

Wow, thank you very much! I didn't notice that. I'm going to try to understand how they do it then :)

gasteigerjo commented 4 years ago

Great to hear from you both!

Last summer we did some small experiments on materials using the periodic BCs you mentioned, but then decided to focus on small molecules so we're not spread too thin. From what I remember, some of the settings, like the cutoff (which is one of the most important hyperparameters in general), need to be set differently for periodic materials, but in general it seemed to work. I can't report anything specific, though, and we haven't worked in that direction since.

pfebrer commented 4 years ago

Ok, thanks! I'm trying to understand how the input data is generated and structured to get a sense of how this should be done.

This may be obvious but, when you were experimenting with periodic materials, did you add extra atoms with their positions R or did you modify how you calculate edges to account for the periodic conditions?

For example, in DataContainer, you calculate distances like this https://github.com/klicperajo/dimenet/blob/bf725c33755cd6fb87661fe03956b5fb30889742/dimenet/training/data_container.py#L69

and then apply the cutoff. This obviously only finds distances within the unit cell of the material, so it seems that you would need to add extra atoms to "fake" the periodicity. I still lack a deep understanding of how the model works: is it possible, and does it make sense, to also calculate edges based on the periodic images of the atoms/nodes? That is, in my drawing, you would have two distances between atoms 1 and 4: the distance inside the unit cell and the distance between periodic images. Only the second one would "survive" the cutoff.

Thanks!
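
For readers following along, the second option asked about above (building edges from periodic images instead of adding extra atoms) can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the DimeNet repository and not necessarily what the authors tried: the function name `periodic_edges`, its parameters, and the brute-force search over neighbouring cells are all hypothetical choices made for clarity.

```python
import itertools
import math


def periodic_edges(positions, cell, cutoff, max_shift=1):
    """Hypothetical helper: list every edge within `cutoff`, including
    edges that connect atom i to a periodic image of atom j.

    positions: list of (x, y, z) Cartesian coordinates inside the unit cell
    cell:      three lattice vectors, each an (x, y, z) tuple
    max_shift: number of neighbouring cells searched in each direction
               (assumed large enough to cover the cutoff)
    Returns a list of (i, j, shift, distance) tuples.
    """
    edges = []
    for shift in itertools.product(range(-max_shift, max_shift + 1), repeat=3):
        # Cartesian offset of this periodic copy of the unit cell
        offset = [sum(shift[k] * cell[k][d] for k in range(3)) for d in range(3)]
        for i, ri in enumerate(positions):
            for j, rj in enumerate(positions):
                if i == j and shift == (0, 0, 0):
                    continue  # an atom is not its own neighbour
                dist = math.sqrt(
                    sum((rj[d] + offset[d] - ri[d]) ** 2 for d in range(3))
                )
                if dist <= cutoff:
                    edges.append((i, j, shift, dist))
    return edges


# Two atoms near opposite faces of a cubic cell with side length 4
positions = [(0.5, 0.0, 0.0), (3.5, 0.0, 0.0)]
cell = [(4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 4.0)]

short = periodic_edges(positions, cell, cutoff=2.0)
longer = periodic_edges(positions, cell, cutoff=3.5)
```

With the short cutoff, only the connection through the cell boundary is found; with the longer one, the same pair of atoms is connected twice, once inside the cell and once through the periodic image. The index pair `(i, j)` then appears more than once in the edge list, distinguished only by `shift`.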

gasteigerjo commented 4 years ago

I don't think you want a cutoff that is so short it will remove the neighbor inside the unit cell. We've used a cutoff of 5 Å for small molecules, so this can even include third-hop neighbors in the molecular graph. We didn't spend too much time investigating periodicity, so I can't give you any exact hints. The fact that these two atoms would be connected in two different ways might be problematic, but you'd have to test it yourself. I think you can have duplicate indices in the index lists we use, so that should work. You just need to make sure that you calculate distances and angles consistently.

I don't think that adding fake atoms is a good idea, since every atom needs to consistently update its embeddings, which seems problematic with fake atoms.
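
One way to read the consistency requirement above: if the same pair of atoms is connected by more than one edge, every distance and angle should be derived from the displacement vector of that specific edge, not from the bare atom positions. The sketch below illustrates this under assumptions; `edge_vector` and `angle_between` are hypothetical names, and labelling each edge with an integer cell `shift` is an illustrative convention, not part of the DimeNet code.

```python
import math


def edge_vector(positions, cell, i, j, shift):
    """Vector from atom i to the periodic image of atom j selected by `shift`
    (a tuple of three integers counting unit cells along each lattice vector)."""
    return tuple(
        positions[j][d]
        + sum(shift[k] * cell[k][d] for k in range(3))
        - positions[i][d]
        for d in range(3)
    )


def angle_between(v1, v2):
    """Angle in radians between two edge vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    # Clamp to [-1, 1] to guard against rounding errors before acos
    return math.acos(max(-1.0, min(1.0, dot / norm)))


# Two atoms near opposite faces of a cubic cell with side length 4
positions = [(0.5, 0.0, 0.0), (3.5, 0.0, 0.0)]
cell = [(4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 4.0)]

# Two different edges between the same index pair (0, 1)
inside = edge_vector(positions, cell, 0, 1, (0, 0, 0))    # within the cell
through = edge_vector(positions, cell, 0, 1, (-1, 0, 0))  # via the periodic image

angle = angle_between(inside, through)
```

Here the two edges point in opposite directions, so the angle between them is π. Computing both from the unshifted positions would instead give identical vectors and an angle of 0, which is the kind of inconsistency to avoid when duplicate indices are present.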

pfebrer commented 4 years ago

Great, thanks for the comments! I will keep them in mind.

Should I close this?

gasteigerjo commented 4 years ago

You're welcome!