choderalab / modelforge

Infrastructure to implement and train NNPs
https://modelforge.readthedocs.io/en/latest/
MIT License

Overall improvements to the training pipeline #130

Closed wiederm closed 4 months ago

wiederm commented 4 months ago

Description

This PR solves the following inconsistencies:

and introduces:

Minor issues that this PR resolves:

Details on the E_i scaling: The total energy $E$ is calculated as $E = \sum_i E_i$. Each per-atom energy $E_i$ is now rescaled as $E_i \leftarrow E_i \cdot \sigma(E_i) + \mu(E_i)$, where $\mu(E_i)$ is the average per-atom energy of the QM energies (self-energies already removed).
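A minimal sketch of this rescaling in plain Python, to make the arithmetic concrete. It is not the modelforge implementation: the function names `per_atom_energy_stats` and `total_energy` are hypothetical, the energies are made-up toy values, and it assumes that $\sigma(E_i)$ is the standard deviation of the per-atom energies, which the description above does not state explicitly.

```python
from statistics import mean, pstdev


def per_atom_energy_stats(total_energies, atom_counts):
    """Mean and standard deviation of E / N_atoms over a dataset.

    Assumption: sigma is taken to be the population standard deviation
    of the per-atom energies.
    """
    per_atom = [e / n for e, n in zip(total_energies, atom_counts)]
    return mean(per_atom), pstdev(per_atom)


def total_energy(raw_atomic_outputs, mu, sigma):
    """Rescale each raw per-atom output, then sum to the molecular energy."""
    scaled = [e_i * sigma + mu for e_i in raw_atomic_outputs]
    return sum(scaled)


# Toy reference energies in kJ/mol, self-energies already removed (made-up values).
energies = [-1200.0, -2100.0, -1500.0]
n_atoms = [4, 6, 5]
mu, sigma = per_atom_energy_stats(energies, n_atoms)

# A network that outputs zero for every atom predicts mu per atom,
# so a 4-atom molecule gets a total energy of 4 * mu.
e_zero = total_energy([0.0, 0.0, 0.0, 0.0], mu, sigma)
```

With this shift and scale, an untrained network whose raw outputs are close to zero already predicts roughly the dataset-average energy per atom, so the raw outputs only need to capture deviations from that mean.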

Notes: With these changes, the validation set RMSE is around 40 kJ/mol in the first epoch. Across multiple training runs with SchNet, it takes about 100 epochs to reach a validation RMSE of 8 kJ/mol and another 100 epochs to drop below 4 kJ/mol. Training on QM9 on a node with 4 x RTX 3090 takes around 30 minutes per 100 epochs.

Status