gasteigerjo / dimenet

DimeNet and DimeNet++ models, as proposed in "Directional Message Passing for Molecular Graphs" (ICLR 2020) and "Fast and Uncertainty-Aware Directional Message Passing for Non-Equilibrium Molecules" (NeurIPS-W 2020)
https://www.daml.in.tum.de/dimenet

Questions w.r.t MD17 #17

Closed thu-wangz17 closed 3 years ago

thu-wangz17 commented 3 years ago

Hi, this is very nice work. However, I have some questions about the results on the MD17 dataset that I couldn't find in the papers. First, what is the cutoff radius for this dataset? Second, is the energy benchmark the MAE per molecule or the MAE per atom? And is the force benchmark the MAE per molecule, the MAE per atom, or the MAE per atom per component? If the results are MAE per molecule, then when the framework is applied to a supercell of a crystal, the MAE may become very large, since the number of atoms is very large. This confuses me. Thank you very much.

Frank-LIU-520 commented 3 years ago

> Hi, it is a very nice work. However, I have some questions w.r.t the results on MD17 dataset, which I didn't find in the papers. First, what's the cutoff radius for this dataset? Second, is the benchmark of energy the MAE per molecule or the MAE per atom? And is that of forces the MAE per molecule or MAE per atom, or MAE per atom per component? If the results are MAE per molecule, when the framework applied on supercell of a crystal, maybe the MAE will be very large, since the number of atoms is very large. These make me confused. Thank you very much.

First, I think the cutoff should be 6 Å, if I am correct. Second, it is the MAE per molecule. DimeNet was not tested on systems with periodic boundary conditions. Hope this helps.

thu-wangz17 commented 3 years ago

> Hi, it is a very nice work. However, I have some questions w.r.t the results on MD17 dataset, which I didn't find in the papers. First, what's the cutoff radius for this dataset? Second, is the benchmark of energy the MAE per molecule or the MAE per atom? And is that of forces the MAE per molecule or MAE per atom, or MAE per atom per component? If the results are MAE per molecule, when the framework applied on supercell of a crystal, maybe the MAE will be very large, since the number of atoms is very large. These make me confused. Thank you very much.

> First, I think the cutoff should be 6 if I am correct; Second, it is the MAE per molecule. DimeNet didn't test on PBC models. Hope it helps you fine.

Thank you very much. My understanding is that the energy is a graph-level label, so its MAE is the energy error per molecule, while the force is a node-level label, so its MAE is the force error per atom per component. Models such as SchNet and DeePMD have been proposed that are also suitable for crystals, where the number of atoms is large. In Table 2 of the DeepPot-SE paper, the RMSE is normalized by the number of atoms. For crystals or large molecules such as proteins, wouldn't an MAE normalized by the number of atoms be more reasonable?
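To make the two energy conventions concrete, here is a minimal sketch with made-up numbers (none of these values or names come from the DimeNet code or the papers):

```python
import numpy as np

# Illustrative energies (e.g. in kcal/mol) for three structures of
# different sizes; arrays and atom counts are invented for this example.
e_true = np.array([-100.0, -250.0, -500.0])
e_pred = np.array([-100.5, -249.0, -498.0])
n_atoms = np.array([10, 25, 50])

abs_err = np.abs(e_pred - e_true)  # [0.5, 1.0, 2.0]

# MAE per molecule: the convention used in the MD17 benchmarks.
mae_per_molecule = abs_err.mean()  # ~1.167

# MAE per atom: each error is divided by that structure's atom count,
# which keeps the metric comparable across system sizes, since energy
# is an extensive quantity.
mae_per_atom = (abs_err / n_atoms).mean()  # ~0.0433
```

The per-molecule MAE grows with system size for a fixed per-atom accuracy, which is exactly the concern raised above for crystal supercells.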

gasteigerjo commented 3 years ago

I agree. I think an MAE per atom is more reasonable in general, since both the energy and the forces are extensive quantities. This is actually a common measure in many theoretical chemistry papers. However, for our MD17 results we report the overall MAE (summed over force components; this doesn't even account for the 3D nature of the force vectors), since this is the usual metric for this dataset.

We use a cutoff of 5 Å, but you might get better results with 6 Å.

thu-wangz17 commented 3 years ago

@klicperajo Thank you very much. I still have one comment, since you say the force MAE is summed over the force components. I actually asked the SchNet authors the same question, and they told me it is MAE per atom per component (i.e. the mean over all force components). So I think the MD17 results in their recent PaiNN paper are also means over the components, which would be unfair to DimeNet. Your work is really very nice.

gasteigerjo commented 3 years ago

My statement was a little imprecise. We also take the global mean over all force components, just like SchNet.
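The distinction discussed in this thread can be sketched numerically. Below is a minimal example (with random arrays; shapes and names are invented, not taken from the DimeNet code) contrasting the global per-component mean, which is what the thread concludes both DimeNet and SchNet report, with a per-atom sum over components:

```python
import numpy as np

# Hypothetical forces for a batch of molecules, each with n_atoms atoms
# and 3 force components per atom (e.g. aspirin in MD17 has 21 atoms).
n_molecules, n_atoms = 10, 21
rng = np.random.default_rng(0)
f_true = rng.normal(size=(n_molecules, n_atoms, 3))
f_pred = f_true + rng.normal(scale=0.1, size=(n_molecules, n_atoms, 3))

abs_err = np.abs(f_pred - f_true)

# Convention 1: global mean over every force component
# (MAE per atom per component).
mae_per_component = abs_err.mean()

# Convention 2: sum the 3 components per atom, then average over atoms.
mae_summed_components = abs_err.sum(axis=-1).mean()

# The two conventions differ by exactly a factor of 3, so mixing them
# up in a comparison table would misstate errors threefold.
assert np.isclose(mae_summed_components, 3 * mae_per_component)
```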