deepmodeling / deepmd-kit

A deep learning package for many-body potential energy representation and molecular dynamics
https://docs.deepmodeling.com/projects/deepmd/
GNU Lesser General Public License v3.0

running error on gpu_cuda.h #533

Closed zhangyongsdu closed 3 years ago

zhangyongsdu commented 3 years ago

I got an error when running an MD simulation with the API branch of deepmd-kit. The error looks like: cuda assert: invalid argument /scratch/qf9/yxz565/softwares/deepmd-kit-api-20210417/source/lib/include/gpu_cuda.h 48.

I used 4 V100 GPUs (mpirun -np 4) with cuda/10.1, cudnn/7.6.5-cuda10.1, nccl/2.6.4-1+cuda10.1 and openmpi/4.0.1. The error also occurs with CUDA 11 and cuDNN 8. It does not occur with the API branch before 20th March 2021.

amcadmus commented 3 years ago

Thanks a lot for reporting the bug. We have encountered the same issue and are trying to fix it.

yhliu918 commented 3 years ago

Got an error when running the unit tests in /source/tests on the latest devel branch: cuda assert: an illegal memory access was encountered /tmp/pip-req-build-1dcl1ksu/source/lib/include/gpu_cuda.h 108
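For anyone collecting similar reports, the two messages in this thread appear to share one shape: the prefix "cuda assert:", then the CUDA error text, then the source path, then a line number. The Python sketch below splits a log line into those three parts so reports can be compared by error text and line. The pattern is an assumption inferred only from the two messages quoted here, and the names CudaAssert and parse_cuda_assert are hypothetical helpers, not part of deepmd-kit.

```python
import re
from typing import NamedTuple, Optional


class CudaAssert(NamedTuple):
    """One parsed 'cuda assert' report (hypothetical helper type)."""
    message: str  # CUDA error text, e.g. "invalid argument"
    path: str     # source file that raised the assert
    line: int     # line number inside that file


# Assumed shape, inferred only from the two reports in this thread:
#   cuda assert: <error text> <absolute path> <line number>
_PATTERN = re.compile(
    r"cuda assert: (?P<message>.+?) (?P<path>/\S+) (?P<line>\d+)"
)


def parse_cuda_assert(text: str) -> Optional[CudaAssert]:
    """Return the first 'cuda assert' report found in text, or None."""
    match = _PATTERN.search(text)
    if match is None:
        return None
    return CudaAssert(
        message=match.group("message"),
        path=match.group("path"),
        line=int(match.group("line")),
    )


if __name__ == "__main__":
    report = parse_cuda_assert(
        "cuda assert: an illegal memory access was encountered "
        "/tmp/pip-req-build-1dcl1ksu/source/lib/include/gpu_cuda.h 108"
    )
    print(report)
```

Grouping on message and line shows that the two reports here differ in both (invalid argument at line 48, illegal memory access at line 108), so they need not come from the same failing call.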

amcadmus commented 3 years ago

@zhangyongsdu We have fixed the bug in PR #545.