GhermanProteic opened 2 years ago
I have essentially the same problem.
I followed the training method suggested here: https://espaloma.wangyq.net/experiments/qm_fitting.html. But the loss is in the millions from what I can see, and it looks like some kind of normalization is missing from the procedure?
Sorry about this. It is caused by an outdated API that I needed to remove. I'll update the example in a day or two.
The updated Colab notebook should work! http://data.wangyq.net/esp_notesbooks/qm_fitting.ipynb
Essentially the energies need to be centered before calculating the error.
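The centering idea can be illustrated without any espaloma-specific code. Below is a minimal NumPy sketch (an illustration of the concept, not the actual `esp.metrics.center()` implementation): subtract each molecule's mean energy over its conformers before computing the squared error, so only relative conformational energies are compared.

```python
import numpy as np

def centered_mse(u_pred, u_ref):
    """MSE after removing each profile's mean over conformers.

    Absolute QM energies carry a huge molecule-dependent offset;
    centering removes it so only relative energies are compared.
    """
    u_pred = u_pred - u_pred.mean()
    u_ref = u_ref - u_ref.mean()
    return float(np.mean((u_pred - u_ref) ** 2))

# Two profiles that differ only by a constant offset give ~zero loss:
u_pred = np.array([7.88, 7.90, 7.85])          # e.g. model output scale
u_ref = np.array([-752.65, -752.63, -752.68])  # e.g. QM reference scale
print(centered_mse(u_pred, u_ref))  # ~0.0
```

The same pair of arrays would give an MSE of roughly 760² ≈ 5.8e5 without the centering step.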
@yuanqing-wang Hi, I want to ask several questions based on the discussion above in this issue.
I also followed the code in the Colab notebook (training on the gen2 dataset's u_ref, but only for 200 epochs).
After training, I used each of the 200 checkpoint models to predict the energy of a molecule in the gen2 dataset and plotted the results here. As you can see, the predicted energy keeps oscillating (this may simply be because I only trained for 200 epochs, but I expected the model to be converging).
I also compared the predicted energies with the energies given in the gen2 dataset (below) and found that they are not comparable in absolute value. During training, the total energy is fit against .nodes["g"].data["u_ref"] for each molecule, which I understand to be related to a QM calculation, while the model produces energies in the n2+n3+n4_proper form. But the loss function is defined through a normalization step, esp.metrics.center(), which means the model can only learn relative energies from the fitted data. Since parameters in traditional force fields are fitted with the aim of reproducing experimental energies, I am confused.
In summary, here are 2 main questions:
1. The dataset provides several energy fields: "u_ref", "u_qm", "u_gaff-1.81", and "u_openff-1.2.0". Is there any relationship between "u_ref" and "u_qm"? Why was the training fitted on "u_ref" instead of "u_qm"?
2. What does the predicted energy "u" represent? Is it compatible in absolute value with any existing force field?

Here are the predicted and reference energies for one molecule:
>>> dataset_name = "gen2"
>>> ds = esp.data.dataset.GraphDataset.load(dataset_name)
>>> espaloma_model.load_state_dict(torch.load("100.th", map_location=torch.device("cpu")))
>>> espaloma_model(ds[5].heterograph)
>>> print(ds[5].heterograph.nodes["g"].data["u"][:,0:3])
# tensor([[7.8801, 7.8812, 7.8807]], grad_fn=<SliceBackward0>)
>>> espaloma_model.load_state_dict(torch.load("199.th", map_location=torch.device("cpu")))
>>> espaloma_model(ds[5].heterograph)
>>> print(ds[5].heterograph.nodes["g"].data["u"][:,0:3])
# tensor([[8.2513, 8.2540, 8.2544]], grad_fn=<SliceBackward0>)
>>> print(ds[5].heterograph.nodes["g"].data["u_ref"][:,0:3])
# tensor([[-752.6477, -752.6463, -752.6459]])
>>> print(ds[5].heterograph.nodes["g"].data['u_qm'][:,0:3])
# tensor([[-752.6166, -752.6210, -752.6230]])
>>> print(ds[5].heterograph.nodes["g"].data["u_gaff-1.81"][:,0:3])
# tensor([[0.0672, 0.0610, 0.0583]])
>>> print(ds[5].heterograph.nodes["g"].data['u_openff-1.2.0'][:,0:3])
# tensor([[0.1153, 0.1101, 0.1089]])
Reopening this issue to make sure we address the most recent comment!
Hi,
I am trying to overfit espaloma to a small batch from the gen2 dataset. I noticed that the reference energy u_ref is on a large negative scale:
(side note, when I try to increase the batch size I get the following error)
The model that I am using is initialized the following way:
Now I am trying to overfit and I am training the following model:
I am using the following loss function:
After training the train loss plot looks the following (epochs on the x-axis):
The loss gets stuck at ~1.4M, when you would expect it to be close to 0 (since I am training on only 5 examples). The energy for individual examples converges to some small positive value:
If I do the same on the pepconf dataset (peptides), I get similar results: the output of espaloma is on a different scale.
My question is, what am I doing wrong? Is it the model architecture? The normalizer? Or something else? I would appreciate any help.
Thanks!
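One observation consistent with the earlier comments in this thread (this is an assumption about your setup, since your loss function isn't shown): if the loss compares absolute energies, the constant offset between model-scale output (around +8) and u_ref (around -752) contributes roughly 760² ≈ 5.8e5 per conformer to a squared error, no matter how well the relative energies fit, which would keep the loss pinned far from zero. A minimal sketch:

```python
import numpy as np

u_pred = np.array([8.0, 8.2, 8.1])          # model-scale energies
u_ref = np.array([-752.0, -751.8, -751.9])  # reference-scale energies

# Raw MSE is dominated by the constant ~760 offset...
mse_raw = float(np.mean((u_pred - u_ref) ** 2))

# ...while centering each profile removes the offset entirely.
mse_centered = float(np.mean(
    ((u_pred - u_pred.mean()) - (u_ref - u_ref.mean())) ** 2))

print(mse_raw)       # ~5.8e5
print(mse_centered)  # ~0 (up to float round-off)
```

If your loss plateaus at a value close to (offset)² times a constant, centering the energies per molecule before the comparison is the first thing to check.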