We had a few discussions about the best way to train a neural network potential on QM energies without loss of precision or numeric instabilities.
I am proposing the following approach (I have already implemented this in the train PR, but summarizing the discussion in a separate PR to make it clear for everyone involved seems appropriate):
In the first preprocessing step, we calculate (using regression) or obtain (either from the user, or from the dataset if provided there) the atomic self-energy of each element, E_element_ase. This is then passed to the neural network and used to calculate the atomic energy E_i as E_i = E_i_pred + E_element_ase. That makes E_element_ase a parameter of each trained neural network that will be stored with the model.
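The regression step can be sketched as a least-squares fit of per-element self-energies to the dataset's total energies (a sketch only; the function and variable names are assumptions, not the actual PR code):

```python
import numpy as np

def fit_atomic_self_energies(element_counts, total_energies):
    """Estimate per-element self-energies E_element_ase by linear regression.

    element_counts: (n_molecules, n_elements) array; how many atoms of each
                    element every molecule contains.
    total_energies: (n_molecules,) array of QM total energies (float64).
    Returns a (n_elements,) array with one self-energy per element.
    """
    # Solve element_counts @ e_ase ≈ total_energies in the least-squares sense.
    e_ase, *_ = np.linalg.lstsq(element_counts, total_energies, rcond=None)
    return e_ase

# Toy example: two "elements", three molecules of known composition.
counts = np.array([[2.0, 1.0], [1.0, 1.0], [3.0, 2.0]])
true_ase = np.array([-10.0, -50.0])
energies = counts @ true_ase  # exactly additive, so the fit recovers true_ase
print(fit_atomic_self_energies(counts, energies))
```

For a real dataset the residual of this fit is what remains to be learned by the network, which is why the self-energies dominate the raw labels.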
We have two scenarios:
training scenario: the loss is calculated between the total energy E_total_predict (the sum of the atomic energies E_i) and the E_label_without_ase provided by the dataset. In this scenario, E_element_ase is not added to E_i.
inference scenario: E_element_ase is added to each E_i, so the sum recovers the full QM total energy.
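The two scenarios can be sketched as a single energy-assembly function with a `training` flag (names and values are illustrative, not the PR's API):

```python
import numpy as np

def total_energy(per_atom_pred, atomic_numbers, e_element_ase, training):
    """Sum per-atom energies into a molecular energy.

    per_atom_pred:  E_i_pred values from the network, one per atom.
    atomic_numbers: element of each atom (keys into e_element_ase).
    e_element_ase:  per-element self-energies stored with the model.
    training:       if True, self-energies are NOT added, so the sum is
                    compared against E_label_without_ase; at inference they
                    are added to recover the full QM total energy.
    """
    e_i = np.asarray(per_atom_pred, dtype=np.float64)
    if not training:
        e_i = e_i + np.array([e_element_ase[z] for z in atomic_numbers])
    return e_i.sum()

# Water-like toy example (the self-energy values are made up):
ase = {1: -0.5, 8: -75.0}
pred = [0.1, 0.2, 0.3]
print(total_energy(pred, [1, 1, 8], ase, training=True))   # ≈ 0.6
print(total_energy(pred, [1, 1, 8], ase, training=False))  # ≈ -75.4
```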
Normalization
We can also normalize E to help with training. Currently, we calculate the mean and the standard deviation of E_label (with the self-energies removed for this calculation) and then scale to a unit interval.
In practice, this means that for a given QM dataset we obtain E_scaling_mean and E_scaling_stddev, and the total energy we predict is E = E_total_predict * E_scaling_stddev + E_scaling_mean. This makes sense especially if the value of E_i is restricted by a hyperbolic tangent or sigmoid activation function.
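A minimal sketch of the normalization and its inverse (the dataset values are made up; variable names mirror the text):

```python
import numpy as np

# E_label with self-energies already removed, in float64 (toy values):
e_label_without_ase = np.array([-1.2, -0.8, -1.0, -0.6])

e_scaling_mean = e_label_without_ase.mean()
e_scaling_stddev = e_label_without_ase.std()

# Labels the network is actually trained against:
normalized = (e_label_without_ase - e_scaling_mean) / e_scaling_stddev

def denormalize(e_total_predict):
    # Inverse transform applied to the network output.
    return e_total_predict * e_scaling_stddev + e_scaling_mean

print(denormalize(normalized))  # recovers the original labels
```

Because the normalized targets have zero mean and unit spread, a tanh- or sigmoid-bounded per-atom output has a realistic chance of covering the label range.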
Note: there is an argument that we could train directly on E_label (including the atomic self-energies) using such an energy expression. That is true, but when we remove the atomic self-energies from the QM dataset, we operate in float64, while during training we are in float32. This loss of precision is relevant for the larger molecules in the training set.
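The precision argument can be demonstrated numerically: subtracting a large self-energy offset in float32 corrupts the small residual we actually want to learn (the energies below are made up but of realistic magnitude):

```python
import numpy as np

# The total QM energy of a larger molecule is dominated by self-energies.
e_label = np.float64(-12345.678901234)  # QM total energy
e_ase_sum = np.float64(-12345.0)        # sum of atomic self-energies

# Removing the offset in float64 during preprocessing keeps the residual:
residual_f64 = e_label - e_ase_sum      # ≈ -0.678901234

# Doing the same at training precision (float32) does not:
residual_f32 = np.float32(e_label) - np.float32(e_ase_sum)

print(residual_f64)
print(residual_f32)                             # off by roughly 2e-4
print(abs(float(residual_f32) - residual_f64))  # the precision we lost
```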
Parameter initialization
Difference between training and inference stage
The neural network potential behaves differently in these two stages. During inference, we want to predict the total energy (which corresponds to the QM energy); during training, we want to match the E_label provided by the dataset (which might represent the QM energy after some transformation).
Currently, we will match the QM energy if we provide the values that were used for the transformation.
If, e.g., self-energies are not provided, they won't be added; if 'scaling_mean' and 'scaling_stddev' are not provided, they default to 0 and 1, respectively (i.e., the identity transform). There might be a cleaner way to control this behavior, but I think it is fine for the moment.
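The default behavior could be sketched like this (the function name is hypothetical; the point is that the absent parameters fall back to neutral values):

```python
def resolve_transform_parameters(provided):
    """Fill in neutral defaults for the label-transformation parameters.

    With these defaults nothing is added and nothing is rescaled, so the
    model output is used as-is.
    """
    return {
        # No self-energies provided -> none are added later.
        "self_energies": provided.get("self_energies", None),
        # Neutral values for E = E_total_predict * stddev + mean:
        "scaling_mean": provided.get("scaling_mean", 0.0),
        "scaling_stddev": provided.get("scaling_stddev", 1.0),
    }

print(resolve_transform_parameters({}))
print(resolve_transform_parameters({"scaling_mean": -1.0}))
```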
Todos
Notable points that this PR has either accomplished or will accomplish.