For the same input file, running it 20 times yields different results each time

deepmodeling / deepmd-kit

A deep learning package for many-body potential energy representation and molecular dynamics

https://docs.deepmodeling.com/projects/deepmd/

GNU Lesser General Public License v3.0


For the same input file, running it 20 times yields different results each time #2603

Closed XuFanffei closed 1 year ago

XuFanffei commented 1 year ago

Summary

Running the same input file 20 times yields different results each time. By contrast, when I test with the Tersoff potential on Bohrium, identical input files give identical results. Could you please explain why this happens?

DeePMD-kit Version

2.1.5

TensorFlow Version

null

Python Version, CUDA Version, GCC Version, LAMMPS Version, etc

No response

Details

https://dp-devops.oss-cn-beijing.aliyuncs.com/temp/log1.lammps https://dp-devops.oss-cn-beijing.aliyuncs.com/temp/log2.lammps These are the logs from two runs; the input file is the same for both.

njzjz commented 1 year ago

Both links are inaccessible.

Please see the discussion in #2270.

brucefan1983 commented 1 year ago

If you use the GPU version, the main reason is the atomicAdd() calls in the CUDA code. The order in which different CUDA threads execute atomicAdd() is not fixed, and because floating-point addition is not associative, different runs can produce slightly different results. MD simulation is chaotic, so two phase-space trajectories that start with such tiny differences diverge quickly over time and eventually become completely different.
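
To make the two effects above concrete, here is a minimal sketch in plain Python (not DeePMD-kit or CUDA code). The `accumulate` helper and the sample values are made up for illustration, and the logistic map is only a stand-in for a chaotic MD system. The first part shows that adding the same numbers in a different order can change the floating-point sum; the second shows how a chaotic map amplifies a difference of 1e-15.

```python
# Illustration only: plain Python, not DeePMD-kit or CUDA code.

def accumulate(values):
    """Add values one at a time in the given order, like a chain of atomic adds."""
    total = 0.0
    for v in values:
        total += v
    return total

# The same four contributions, arriving in two different orders.
order_a = [1e16, 1.0, -1e16, 1.0]
order_b = [1.0, 1.0, 1e16, -1e16]

sum_a = accumulate(order_a)  # the first 1.0 is rounded away when added to 1e16
sum_b = accumulate(order_b)  # the small terms are combined before meeting 1e16

def logistic_gap(x0, delta, steps):
    """Largest gap between two trajectories of x -> 4x(1-x) started delta apart.

    The logistic map is a hypothetical stand-in for a chaotic MD trajectory.
    """
    a, b = x0, x0 + delta
    largest = abs(a - b)
    for _ in range(steps):
        a = 4.0 * a * (1.0 - a)
        b = 4.0 * b * (1.0 - b)
        largest = max(largest, abs(a - b))
    return largest

gap = logistic_gap(0.3, 1e-15, 100)

print("sum in order A:", sum_a)
print("sum in order B:", sum_b)
print("largest trajectory gap:", gap)
```

In a real force kernel the per-run differences are far smaller than in this contrived sum, typically in the last few bits, but the chaotic dynamics amplify them in the same way.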

taol1 commented 1 year ago

Thank you for your reply!

I wonder whether there is a way around this, or whether this deviation of the trajectory is physically acceptable.

brucefan1983 commented 1 year ago

This run-to-run randomness is indeed inconvenient for debugging (for developers), but it does not matter for practical applications, where one usually introduces randomness into the initial conditions on purpose (different seeds for initializing velocities) to sample the phase space more diversely.

Deterministic calculations on GPUs can only be obtained by changing the algorithms to avoid atomic summation of floating-point numbers.
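
As a rough sketch of what "avoiding atomic summation of floating-point numbers" can look like, one option is to accumulate in fixed-point integers, because integer addition gives the same result in any order. This is plain Python for illustration, not what DeePMD-kit does; `SCALE`, `to_fixed` and `sum_fixed` are hypothetical names, and the scale factor is an arbitrary assumption.

```python
import random

# Hypothetical fixed-point scale: each contribution is rounded to a multiple of 2**-32.
SCALE = 2 ** 32

def to_fixed(x):
    """Convert one float contribution to a scaled integer."""
    return round(x * SCALE)

def sum_float(values):
    """Ordinary floating-point accumulation; the result can depend on the order."""
    total = 0.0
    for v in values:
        total += v
    return total

def sum_fixed(values):
    """Integer accumulation; integer addition is associative, so order does not matter."""
    total = 0
    for v in values:
        total += to_fixed(v)
    return total / SCALE

rng = random.Random(0)
contributions = [rng.uniform(-1.0, 1.0) for _ in range(1000)]

float_sums = set()
fixed_sums = set()
for trial in range(20):
    shuffled = contributions[:]
    random.Random(trial).shuffle(shuffled)  # mimic a different arrival order per run
    float_sums.add(sum_float(shuffled))
    fixed_sums.add(sum_fixed(shuffled))

print("distinct float sums:", len(float_sums))
print("distinct fixed-point sums:", len(fixed_sums))
```

The trade-off is that every contribution is rounded to the fixed-point resolution before it is added, so the scale has to be chosen with the expected magnitude of the forces in mind. Another option is a reduction with a fixed summation order, which avoids atomics altogether.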

njzjz commented 1 year ago

Feel free to reopen the issue if you have more questions.