Closed Yi-FanLi closed 2 years ago
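The radial distribution functions g_OO generated by these two NVE MD simulations are as follows: [image] https://user-images.githubusercontent.com/39401945/164985542-498fbdc1-2d7e-4ad2-a482-bbc9e8c27744.png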
We also tested the SPC/E water model and the Lennard-Jones potential with both versions. Both versions generate exactly the same trajectories for 100,000 steps.
One quick question before more details. What if you run them on different machines? Will the same version of LAMMPS always produce the same results?
This looks like energy drift …
I found that this problem is not caused by the different versions of LAMMPS. Running the same version twice can produce different results; see figure (e).
Below are the dumped trajectories of two NVE runs on the same machine with the same version.
We observe that the forces have only 15 significant digits (or even fewer), whereas a double-precision floating-point number carries about 16. The truncation error in the forces may cause trajectories from different runs to differ, and the accumulated error can eventually make the trajectories differ significantly.
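One common way for last-digit differences to appear between otherwise identical runs is that floating-point addition is not associative, so a sum evaluated in a different order (for example in a parallel reduction) can round differently. Whether this is the mechanism at work in this issue is an assumption, not something established in the thread. A minimal Python sketch, using synthetic numbers rather than forces from the runs above:

```python
import random

def sum_in_order(values, order):
    """Sum `values` in the given index order using plain double-precision addition."""
    total = 0.0
    for i in order:
        total += values[i]
    return total

# Floating-point addition is not associative: regrouping changes the last bit.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right, left, right)

# Hypothetical per-atom contributions (synthetic, seeded for reproducibility).
rng = random.Random(0)
contributions = [rng.uniform(-1.0, 1.0) for _ in range(10000)]
n = len(contributions)
forward = sum_in_order(contributions, range(n))
backward = sum_in_order(contributions, reversed(range(n)))

# Any discrepancy here comes purely from the order of the additions.
print(abs(forward - backward))
```

The two sums are mathematically identical; any nonzero difference printed on the last line is rounding alone.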
The following figure shows the difference between the Hamiltonians of different runs, both for two runs with the same settings (same version and machine) and for runs with different software versions. The logarithm of the absolute value of the difference is plotted.
Before about 4,000 steps, where the trajectories diverge significantly, the difference grows exponentially (its logarithm grows linearly). This is characteristic of accumulated truncation error. After 4,000 steps, the difference between the Hamiltonians oscillates on the order of 1e-2, the same order of magnitude as the fluctuations of the Hamiltonian itself. This observation supports the idea that the difference between the trajectories stems from truncation error.
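The "logarithm grows linearly" statement can be checked numerically by fitting a straight line to log10 of the absolute difference against the step number. The sketch below is one way to do that in plain Python. The function name `log_growth_rate` is hypothetical, and the input series is synthetic, chosen only to mimic the shape described above (about 1e-15 rising to about 1e-2 over roughly 4,000 steps); it is not measured data.

```python
import math

def log_growth_rate(steps, diffs):
    """Least-squares slope of log10|diff| versus step, skipping exact zeros."""
    pts = [(s, math.log10(abs(d))) for s, d in zip(steps, diffs) if d != 0.0]
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two nonzero differences")
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx

# Synthetic series: one decade of growth every 300 steps (an assumption).
steps = list(range(0, 4000, 100))
diffs = [1e-15 * 10 ** (s / 300.0) for s in steps]

rate = log_growth_rate(steps, diffs)
print(rate)  # slope in decades per step
```

With real data, `diffs` would be the per-step difference between the Hamiltonians of the two runs, and the fit should be restricted to the steps before the difference saturates.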
I am also showing the difference in the x coordinate of one atom. It exhibits exponential behavior similar to that of the Hamiltonian, which shows that the difference in coordinates also has the character of accumulated truncation error. [image] https://user-images.githubusercontent.com/39401945/165014834-38273926-e67d-4018-bb36-d68e34f7434c.png
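Exponential growth of a last-digit perturbation is what one would expect from any chaotic dynamical system, and it can be reproduced with a toy model. The sketch below uses the logistic map as a stand-in. It is not an MD simulation and is not taken from the runs in this issue; the map, the starting value and the perturbation size are all assumptions chosen for illustration.

```python
def logistic_orbit(x0, n_steps):
    """Iterate the chaotic logistic map x -> 4 x (1 - x) and return the orbit."""
    orbit = [x0]
    x = x0
    for _ in range(n_steps):
        x = 4.0 * x * (1.0 - x)
        orbit.append(x)
    return orbit

# Two starting points that differ only around the 15th significant digit.
orbit_a = logistic_orbit(0.3, 100)
orbit_b = logistic_orbit(0.3 + 1e-15, 100)

# Absolute difference per step: tiny at first, then growing to order one.
abs_diffs = [abs(a - b) for a, b in zip(orbit_a, orbit_b)]
for step in (0, 10, 20, 30, 40, 50, 100):
    print(step, abs_diffs[step])
```

The printed differences stay negligible for the first steps and then reach the size of the signal itself, which is qualitatively the same picture as in the coordinate plot.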
Did you use CPUs or GPUs?
We observe that the forces have only 15 significant digits (or even fewer), whereas a double-precision floating-point number carries about 16.
A double has a 53-bit significand, so its precision is at most log10(2**53) ≈ 15.95 decimal digits. The cumulative error may be even larger after passing through the whole graph, so seeing only about 15 significant digits is not unexpected.
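The digit count quoted above can be checked directly from Python's standard library, assuming the interpreter's `float` is an IEEE 754 binary64 double (true on essentially all current platforms):

```python
import math
import sys

# Number of bits in the significand of a Python float (53 for binary64).
significand_bits = sys.float_info.mant_dig

# Equivalent number of decimal digits: log10(2**53).
decimal_digits = math.log10(2 ** significand_bits)
print(significand_bits, round(decimal_digits, 2))

# Decimal digits guaranteed to survive a text -> float -> text round trip.
print(sys.float_info.dig)

# Gap between 1.0 and the next representable double (2**-52).
print(sys.float_info.epsilon)
```

Here `sys.float_info.dig` reports the number of decimal digits that are always preserved, which is lower than the 15.95 upper bound.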
I agree with you.
Note: see also the NVIDIA/framework-determinism repository for more details about determinism.
Bug summary
We ran an NVE simulation of a water system with the DP force field using two different versions of LAMMPS: 20201029 (the old version) and 20210921 (the new version). The two versions do not produce the same trajectory for the NVE MD run.
DeePMD-kit Version
v2.1.0-33-gf7aec87
TensorFlow Version
2.7.0
How did you download the software?
Built from source
Input Files, Running Commands, Error Log, etc.
Unzip this folder to obtain the input script and the initial configuration: dpsr_nve.zip. The model can be downloaded from this link: https://drive.google.com/file/d/1u_A26jEEGMl16nReh-oHNXM1XMCMg1CP/view?usp=sharing
Steps to Reproduce
Install LAMMPS+DeePMD-kit with lammps 20201029 and 20210921. The two LAMMPS versions can be obtained by checking out the following commits of the LAMMPS official repo:
lmp -in in.lammps
with the two different versions of LAMMPS.
Further Information, Files, and Links
See the following figures for the different trajectories produced: