deepmodeling / deepmd-kit

A deep learning package for many-body potential energy representation and molecular dynamics
https://docs.deepmodeling.com/projects/deepmd/
GNU Lesser General Public License v3.0
1.45k stars, 499 forks

[BUG] _Process killed when running lammps by using deepmd-kit-2.0.0.beta0 model on GPU and DCU_ #824

Closed: Carrotkingdom closed this issue 3 years ago

Carrotkingdom commented 3 years ago

I am running a DP model with deepmd-kit 2.0.0.beta0, in both original and compressed form, on GPU and DCU. In all cases the process is killed after several steps; the exact step count depends on the number of threads and the environment.

Here are the model type, machine type, number of threads, and the step number reached before the process died in each case.

[image: table of model type, machine type, number of threads, and step number before the process died]

The LAMMPS input file and the job submission files for GPU and DCU are attached below.

MgZn.zip

galeselee commented 3 years ago

Will this problem occur when running on the CUDA platform?

denghuilu commented 3 years ago

Have you tested these models on a single GPU or DCU environment?

galeselee commented 3 years ago

> Have you tested these models on a single GPU or DCU environment?

about 12300 on DCU

amcadmus commented 3 years ago

I would suggest trying the latest commit on devel

denghuilu commented 3 years ago

> I would suggest trying the latest commit on devel

I will try it.

Carrotkingdom commented 3 years ago

> Have you tested these models on a single GPU or DCU environment?

Same results on a single GPU. On CPU the process runs normally and is not killed.

denghuilu commented 3 years ago

The latest GPU code has been tested on ehpc and shows no problem @amcadmus.

Carrotkingdom commented 3 years ago

The latest DCU code has been tested successfully on NSCC-ZZ @amcadmus.