deepmodeling / deepmd-kit

A deep learning package for many-body potential energy representation and molecular dynamics
https://docs.deepmodeling.com/projects/deepmd/
GNU Lesser General Public License v3.0
1.49k stars 510 forks

[BUG] Different versions of LAMMPS produce different NVE trajectories with DP #1656

Closed Yi-FanLi closed 2 years ago

Yi-FanLi commented 2 years ago

Bug summary

We ran an NVE simulation of a water system with the DP force field, using two different versions of LAMMPS: 20201029 (the old version) and 20210921 (the new version). The two versions do not produce the same NVE MD trajectory.

DeePMD-kit Version

v2.1.0-33-gf7aec87

TensorFlow Version

2.7.0

How did you download the software?

Built from source

Input Files, Running Commands, Error Log, etc.

Unzip this folder to obtain the input script and the initial configuration: dpsr_nve.zip. The model can be downloaded from this link: https://drive.google.com/file/d/1u_A26jEEGMl16nReh-oHNXM1XMCMg1CP/view?usp=sharing

Steps to Reproduce

Install LAMMPS+DeePMD-kit with lammps 20201029 and 20210921. The two LAMMPS versions can be obtained by checking out the following commits of the LAMMPS official repo:

Further Information, Files, and Links

See the following figures for the different trajectories produced: image image

Yi-FanLi commented 2 years ago

The radial distribution functions g_OO generated with these 2 NVE MD simulations are as follows: image

Yi-FanLi commented 2 years ago

We also tested the SPC/E water model and the Lennard-Jones potential with both versions. With these potentials, both versions generate exactly the same trajectories for 100,000 steps.

jameswind commented 2 years ago

One quick question before more details. What if you run them on different machines? Will the same version of LAMMPS always produce the same results?

On Mon, Apr 25, 2022 at 12:10 AM Yifan Li李一帆 @.***> wrote:

The radial distribution functions g_OO generated with these 2 NVE MD simulations are as follows: [image: image] https://user-images.githubusercontent.com/39401945/164985542-498fbdc1-2d7e-4ad2-a482-bbc9e8c27744.png


jameswind commented 2 years ago

This looks like energy drift …

On Mon, Apr 25, 2022 at 12:14 AM Yifan Li李一帆 @.***> wrote:

We also tested the SPC/E water model and the Lennard-Jones potential with both versions. They will generate exactly the same trajectories for 100,000 steps.


njzjz commented 2 years ago

Can you dump all coordinates, velocities, and forces?

ChunyiZh commented 2 years ago

I found that this problem is not caused by the different LAMMPS versions. Running the same version twice can also produce different results; see figure (e). image

Yi-FanLi commented 2 years ago

Below are the dumped trajectories of two NVE runs on the same machine with the same version. image
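A comparison like the one above can be scripted. The following is a minimal sketch, assuming plain-text LAMMPS dumps in the standard `ITEM: ...` layout whose per-atom columns include `id`, `type`, and numeric fields such as `fx fy fz`. The function names `parse_dump` and `max_abs_diff` are hypothetical helpers, not part of LAMMPS or DeePMD-kit.

```python
def parse_dump(text):
    """Parse a LAMMPS text dump into {timestep: {atom_id: {column: value}}}.

    Assumes every per-atom column other than 'id' and 'type' is numeric.
    """
    frames = {}
    lines = text.strip().splitlines()
    step, natoms = None, 0
    i = 0
    while i < len(lines):
        line = lines[i].strip()
        if line == "ITEM: TIMESTEP":
            step = int(lines[i + 1])
            i += 2
        elif line == "ITEM: NUMBER OF ATOMS":
            natoms = int(lines[i + 1])
            i += 2
        elif line.startswith("ITEM: ATOMS"):
            cols = line.split()[2:]  # column names follow "ITEM: ATOMS"
            atoms = {}
            for row in lines[i + 1 : i + 1 + natoms]:
                vals = dict(zip(cols, row.split()))
                atoms[int(vals["id"])] = {
                    c: float(v) for c, v in vals.items() if c not in ("id", "type")
                }
            frames[step] = atoms
            i += 1 + natoms
        else:
            i += 1  # skip box bounds and anything else
    return frames


def max_abs_diff(frames_a, frames_b, columns=("fx", "fy", "fz")):
    """Largest absolute per-atom difference in `columns` for each shared timestep."""
    result = {}
    for step in sorted(set(frames_a) & set(frames_b)):
        a, b = frames_a[step], frames_b[step]
        # Atoms are matched by id, so the row order in the dump does not matter.
        result[step] = max(abs(a[i][c] - b[i][c]) for i in a for c in columns)
    return result
```

Calling `max_abs_diff(parse_dump(text_a), parse_dump(text_b))` on the contents of two dump files returns one number per timestep, which makes it easy to see at which step the two runs first disagree. Writing the dump with a high-precision format (for example via `dump_modify ... format`) is needed for last-digit differences to be visible at all.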

We observe that the forces carry only 15 significant digits (or even fewer), whereas double-precision floating-point numbers carry about 16. The truncation error in the forces might cause the trajectories of different runs to differ, and the accumulated truncation error can then lead to significant differences between the trajectories.
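The thread does not identify where the last-digit differences in the forces come from. One common mechanism, offered here only as an assumption, is that floating-point addition is not associative, so summing the same contributions in a different order (as parallel reductions may do) can change the last bits of the result. A minimal sketch with made-up numbers:

```python
def sum_in_order(values, order):
    """Add `values` one by one, visiting them in the given index order."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


# Three illustrative "force contributions" (arbitrary numbers, not from the issue).
contributions = [0.1, 0.2, 0.3]

forward = sum_in_order(contributions, [0, 1, 2])   # (0.1 + 0.2) + 0.3
backward = sum_in_order(contributions, [2, 1, 0])  # (0.3 + 0.2) + 0.1

# The two sums agree to about 16 digits but are not bit-identical.
print(repr(forward), repr(backward), forward == backward)
```

A discrepancy of this size in a force is harmless for a single step, but in a chaotic many-body system it is enough to seed the divergence discussed in this thread.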

Yi-FanLi commented 2 years ago

The following figure shows the difference between the Hamiltonians of different runs, both for runs with identical settings (same version and machine) and for runs with different software versions. The logarithm of the absolute value of the difference is plotted.

image

We can observe that before 4,000 steps, where the trajectories diverge significantly, the difference grows exponentially (its logarithm grows linearly). This is characteristic of accumulated truncation error. After 4,000 steps, the difference between the Hamiltonians oscillates on the order of 1e-2, which is the same order of magnitude as the oscillation of the Hamiltonian itself. This observation supports the idea that the difference between the trajectories stems from truncation error.
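The qualitative behavior described here (linear growth of the logarithm, followed by saturation) can be reproduced with a toy chaotic system. The sketch below uses the logistic map purely as a stand-in; it is not the water system from this issue, and the starting value and the 1e-15 perturbation are arbitrary choices meant to mimic a last-digit difference.

```python
import math


def logistic_trajectory(x0, n_steps, r=4.0):
    """Iterate the logistic map x -> r * x * (1 - x), a simple chaotic system."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def log10_abs_difference(traj_a, traj_b, floor=1e-300):
    """log10 of |a - b| per step; `floor` avoids log10(0) when values coincide."""
    return [math.log10(max(abs(a - b), floor)) for a, b in zip(traj_a, traj_b)]


# Two "runs" that differ only by a perturbation near double-precision resolution.
run_a = logistic_trajectory(0.3, 200)
run_b = logistic_trajectory(0.3 + 1e-15, 200)
log_diff = log10_abs_difference(run_a, run_b)

# Early on the log-difference climbs roughly linearly; later it saturates
# near the size of the signal itself.
for step in (0, 10, 20, 30, 40, 60, 100):
    print(step, round(log_diff[step], 2))
```

In this toy the growth rate is set by the map itself; in an MD run the analogous rate depends on the system and the time step, so only the shape of the curve carries over.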

Yi-FanLi commented 2 years ago

I am also showing the difference in the x coordinate of one atom here. It exhibits exponential behavior similar to that of the Hamiltonian. This suggests that the difference between the coordinates also has the character of accumulated truncation error.

image

njzjz commented 2 years ago

Did you use CPUs or GPUs?

njzjz commented 2 years ago

We observe that the forces carry only 15 significant digits (or even fewer), whereas double-precision floating-point numbers carry about 16.

A double has a 53-bit significand, so its precision is at most log10(2**53) = 15.95 decimal digits. The cumulative error may be even larger after the whole graph has been evaluated. So this is not unexpected.
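The figure quoted above can be checked directly. This short sketch uses only the Python standard library and reads the properties of the platform's double type from `sys.float_info`:

```python
import math
import sys

# Number of significand bits in a Python float (an IEEE 754 double on common platforms).
bits = sys.float_info.mant_dig

# Equivalent number of decimal digits: log10(2**bits).
decimal_digits = math.log10(2 ** bits)

# Decimal digits guaranteed to survive a round trip through a double.
guaranteed_digits = sys.float_info.dig

# Spacing between 1.0 and the next representable double.
epsilon = sys.float_info.epsilon

print(bits, decimal_digits, guaranteed_digits, epsilon)
```

The gap between the roughly 15.95 digits of precision and the 15 digits that are always reliable is consistent with the observation that the printed forces agree to about 15 significant digits.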

jameswind commented 2 years ago

I agree with you.

On Mon, Apr 25, 2022 at 11:17 AM Yifan Li李一帆 @.***> wrote:

I am also showing the difference in the x coordinate of one atom here. It exhibits exponential behavior similar to that of the Hamiltonian. This suggests that the difference between the coordinates also has the character of accumulated truncation error.

[image: image] https://user-images.githubusercontent.com/39401945/165014834-38273926-e67d-4018-bb36-d68e34f7434c.png


njzjz commented 1 year ago

Adding a note: see also the NVIDIA/framework-determinism repository for more details about determinism.