TUM-DAML / gemnet_pytorch

GemNet model in PyTorch, as proposed in "GemNet: Universal Directional Graph Neural Networks for Molecules" (NeurIPS 2021)
https://www.daml.in.tum.de/gemnet

Question about creating graph #14

Closed: xnuohz closed this 1 year ago

xnuohz commented 1 year ago

Dear developers,

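For datasets with chemical bonds, there are generally two ways to build a molecule graph:

1. use the chemical bonds as edges, or
2. connect all atoms within a cutoff distance (a radius graph).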

Which way do you think is better? If there is a large-scale dataset with 1M samples, using the radius graph model will become very heavy and difficult to train.

Thanks:)

gasteigerjo commented 1 year ago

Hi!

People often use radius graphs because they avoid having to know the chemical bonds, as well as edge cases where bonds are ill-defined. Unknown or ill-defined bonds are common in molecular simulation. Also, methods often simply perform better with a larger radius, i.e. when moving further away from the chemical graph.

But you can also use multiple different graphs, as done e.g. by MXMNet or GemNet-OC.
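To make the radius-graph idea concrete, here is a minimal pure-Python sketch. It is not GemNet's code; the function name `radius_graph` and the toy coordinates are hypothetical. It connects every pair of atoms closer than a cutoff, so a larger cutoff yields more edges and reaches beyond the nearest (bonded) neighbors. Building it at several cutoffs loosely mirrors the multi-graph idea, though the actual graph definitions in MXMNet and GemNet-OC differ in detail.

```python
import itertools
import math


def radius_graph(positions, cutoff):
    """Return directed edges (i, j) for every pair of atoms closer than `cutoff`.

    Brute-force O(N^2) sketch; `positions` is a list of (x, y, z) tuples.
    """
    edges = []
    for i, j in itertools.permutations(range(len(positions)), 2):
        if math.dist(positions[i], positions[j]) < cutoff:
            edges.append((i, j))
    return edges


# Hypothetical toy "molecule": five atoms on a line, 1.5 apart.
chain = [(1.5 * k, 0.0, 0.0) for k in range(5)]

# A small cutoff keeps only nearest neighbors; larger cutoffs add
# longer-range edges. Several such graphs can be kept side by side.
graphs = {cutoff: radius_graph(chain, cutoff) for cutoff in (2.0, 3.5, 6.5)}
for cutoff, edges in graphs.items():
    print(cutoff, len(edges))
# 2.0 8
# 3.5 14
# 6.5 20
```

In a dense 3D system the number of neighbors per atom grows roughly with the cube of the cutoff, which is where the cost concern about large datasets comes from. In PyTorch workflows one would typically use a spatial search such as `torch_cluster.radius_graph` instead of this O(N²) loop.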

Regarding your concern of "using the radius graph model will become very heavy and difficult to train":

jiali1025 commented 8 months ago

Dear developers,

Sorry, I have a related question about this topic. I am confused by the statement "More edges will give you better accuracy". Does it mean the human knowledge of the chemical graph is not a good prior, so the model will not learn well with this? Is "More edges will give you better accuracy" only true when there is more data? Also, I think a deeper GNN will eventually see all nodes as well; it just has a prior, structured way of exchanging information.

gasteigerjo commented 8 months ago

Does it mean the human knowledge of the chemical graph is not a good prior, so the model will not learn well with this?

That statement was referring to radius graphs: A larger radius (i.e. more edges) typically gives better accuracy.

Still, I don't think the chemical graph gives a lot of information (or a useful prior) if you have all atom positions. Inferring the graph from atom distances doesn't seem too hard to me, especially compared to how difficult e.g. energy prediction is.
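The point that the chemical graph can be recovered from atom distances can be illustrated with a common covalent-radius heuristic. This is a sketch, not something GemNet does; the name `infer_bonds`, the small radii table (approximate single-bond covalent radii from Cordero et al. 2008, carbon as sp3) and the 1.2 tolerance factor are assumptions chosen for illustration.

```python
import itertools
import math

# Approximate covalent radii in Angstrom (illustrative subset only).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def infer_bonds(symbols, positions, tolerance=1.2):
    """Guess bonds: atoms i < j are bonded if their distance is below
    `tolerance` times the sum of their covalent radii."""
    bonds = []
    for i, j in itertools.combinations(range(len(symbols)), 2):
        limit = tolerance * (COVALENT_RADII[symbols[i]] + COVALENT_RADII[symbols[j]])
        if math.dist(positions[i], positions[j]) < limit:
            bonds.append((i, j))
    return bonds


# Water: O-H bond length 0.9572 A, H-O-H angle 104.52 degrees.
angle = math.radians(104.52)
water_symbols = ["O", "H", "H"]
water_positions = [
    (0.0, 0.0, 0.0),
    (0.9572, 0.0, 0.0),
    (0.9572 * math.cos(angle), 0.9572 * math.sin(angle), 0.0),
]
print(infer_bonds(water_symbols, water_positions))  # [(0, 1), (0, 2)]
```

Heuristics like this are exactly what becomes unreliable for stretched or breaking bonds, which is one of the ill-defined cases where radius graphs are convenient.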

Is "More edges will give you better accuracy" only true when there is more data?

I don't know how many data points you need for the above statement to be true. My guess is that you don't need many, and nowadays it's easy to create a dataset with e.g. 10k data points. That would imho already be well above the point beyond which I'd expect the prior from the chemical graph to stop helping.

Also, I think a deeper GNN will eventually see all nodes as well; it just has a prior, structured way of exchanging information.

True. But seeing neighbors only via multiple steps (a) makes the task harder and (b) provides less geometric information. Think about the extreme case of every atom seeing only one neighbor: the GNN would then be unable to triangulate the position of any atom. See the many discussions around GNN expressivity, e.g. in this paper or in our DimeNet paper.
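A toy illustration of the geometric argument, assuming a model that only receives interatomic distances along its edges (the helper `edge_distances` and the coordinates are hypothetical, not GemNet code). Two three-atom configurations, one linear and one bent, share identical nearest-neighbor distances, so a sparse distance-only graph cannot tell them apart. Adding the longer A-C edge (a larger cutoff) distinguishes them; DimeNet-style angle information would too.

```python
import math


def edge_distances(positions, edges):
    """Sorted distances along the given undirected edges, rounded to hide float noise."""
    return sorted(round(math.dist(positions[i], positions[j]), 6) for i, j in edges)


# Atoms A-B-C with unit spacing: linear versus bent at 90 degrees.
linear = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
bent = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]

sparse_edges = [(0, 1), (1, 2)]          # only nearest neighbors
dense_edges = [(0, 1), (1, 2), (0, 2)]   # a larger cutoff also links A and C

print(edge_distances(linear, sparse_edges) == edge_distances(bent, sparse_edges))  # True
print(edge_distances(linear, dense_edges) == edge_distances(bent, dense_edges))    # False
```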

Caveat: Most statements in this post are based on intuition and experience, not explicit data.