Gap between training set and test set RMSE in energy

Hello,

I have a question about the training errors when using the Kalman filter. When we used the Kalman filter (with the recommended parameters from the example here on GitHub) together with data normalization via nnp-norm, we observed a quick drop in the training error and the formation of a gap between the training and test set RMSE in energy, for example 0.01 meV/atom for the training set and 0.3 meV/atom for the test set. The test set RMSE did not increase during the following epochs but stayed constant. The RMSE in forces, however, was comparable for the test and training sets and kept decreasing during training.

Our first idea was that we were overfitting the data, but the test set RMSE in energy never started to increase and, as I said, the force training kept improving. At the same time, simulations with the trained NNP show only minor extrapolation and are very stable. I saw similar behavior in Fig. 5 of the publication https://pubs.acs.org/doi/abs/10.1021/acs.jctc.8b01092.

Is this standard behavior for this setup, or is there something wrong with my training? The gap disappears when we don't use nnp-norm, but the training errors are then larger. I can provide more information on the training if needed.
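To make clear which quantity is being compared, here is a minimal Python sketch of the energy RMSE, assuming it is taken over per-atom energy errors (total energy error divided by the number of atoms in each structure). The function name `energy_rmse_per_atom` and its arguments are hypothetical and not part of n2p2; this is not how the code itself computes the value, only an illustration of the definition assumed here.

```python
import math


def energy_rmse_per_atom(e_ref, e_pred, n_atoms):
    """RMSE of per-atom energy errors over a set of structures.

    e_ref, e_pred: total energies per structure (same unit, e.g. meV).
    n_atoms: number of atoms in each structure.
    Assumption: the error of each structure is normalized by its atom count.
    """
    if not (len(e_ref) == len(e_pred) == len(n_atoms)):
        raise ValueError("inputs must have the same length")
    if len(e_ref) == 0:
        raise ValueError("need at least one structure")
    squared_errors = [
        ((pred - ref) / n) ** 2
        for ref, pred, n in zip(e_ref, e_pred, n_atoms)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Evaluating this once on the training structures and once on the test structures, with energies converted back to physical units if they were normalized, gives the two numbers whose gap is described above.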

Thanks a lot!

CompPhysVienna / n2p2, issue #126