Impact of data set normalization on energy RMSE

In general, data set normalization can affect the RMSE, especially if you train with the Kalman filter and keep the same set of parameters for both the normalized and the unnormalized data set:

Data Set Normalization. Although in principle not a requirement for successful HDNNP training, it is beneficial to normalize data from reference calculations in such way that the fitting procedure becomes independent of a physical unit system. This is in particular relevant for Kalman filter training because a number of free parameters influencing the fit quality are dependent on the magnitude of numeric values in the data set. Recommendations found in the literature for optimal parameter settings are valid only for normalized data sets.[63] Since we aim at training both energies and forces we must ensure that a procedure normalizing both quantities is chosen.

from Singraber, A.; Morawietz, T.; Behler, J.; Dellago, C. Parallel Multistream Training of High-Dimensional Neural Network Potentials. J. Chem. Theory Comput. 2019, 15 (5), 3075–3092. https://doi.org/10.1021/acs.jctc.8b01092
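
To make the quoted idea a bit more concrete, here is a minimal sketch of one way to normalize energies and forces. It assumes a common convention (mean energy per atom shifted to zero, energy per atom and force components scaled to unit standard deviation); this is not necessarily identical to what n2p2 does internally. All function names and the toy numbers are hypothetical and only serve as illustration.

```python
import math
import statistics


def normalization_factors(energies, n_atoms, forces):
    """Return (mean_e, conv_energy, conv_force) for a toy data set.

    energies: total energy of each structure
    n_atoms:  number of atoms in each structure
    forces:   flat list of all force components in the data set
    """
    e_per_atom = [e / n for e, n in zip(energies, n_atoms)]
    mean_e = statistics.fmean(e_per_atom)
    # Assumption: scale so that the standard deviation becomes 1.
    conv_energy = 1.0 / statistics.pstdev(e_per_atom)
    conv_force = 1.0 / statistics.pstdev(forces)
    return mean_e, conv_energy, conv_force


def normalize_energy(e, n, mean_e, conv_energy):
    """Shift by the mean energy per atom, then rescale."""
    return (e - n * mean_e) * conv_energy


def rmse(pred, ref):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(statistics.fmean([(p - r) ** 2 for p, r in zip(pred, ref)]))


# Hypothetical toy data (arbitrary physical units).
energies_ref = [-10.2, -20.9, -15.1, -30.6]
energies_pred = [-10.1, -21.0, -15.3, -30.5]
n_atoms = [2, 4, 3, 6]
forces_ref = [0.5, -0.3, 0.1, -0.8, 0.4, 0.2]

mean_e, conv_energy, conv_force = normalization_factors(
    energies_ref, n_atoms, forces_ref
)

# Normalized reference and predicted energies.
norm_ref = [
    normalize_energy(e, n, mean_e, conv_energy)
    for e, n in zip(energies_ref, n_atoms)
]
norm_pred = [
    normalize_energy(e, n, mean_e, conv_energy)
    for e, n in zip(energies_pred, n_atoms)
]
forces_norm = [f * conv_force for f in forces_ref]

rmse_physical = rmse(energies_pred, energies_ref)
rmse_normalized = rmse(norm_pred, norm_ref)

# The shift cancels in the difference, so only the scale factor remains.
rmse_back_converted = rmse_normalized / conv_energy
print(rmse_physical, rmse_normalized, rmse_back_converted)
```

Under this convention an energy RMSE reported in normalized units differs from the one in physical units by exactly the factor `conv_energy`, so the two numbers should only be compared after converting back to the same unit system.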

Whether the discrepancy is reasonable is hard to tell (at least for me) without knowing the actual data set. In general, I would recommend sticking to a normalized data set. If you need further information, I think @singraber would be the better contact.

CompPhysVienna/n2p2, issue #147: Impact of data set normalization on energy RMSE