brucefan1983 / GPUMD

Graphics Processing Units Molecular Dynamics
https://gpumd.org/dev
GNU General Public License v3.0

CUDA Error:700 Error text: an illegal memory access was encountered #667

Closed Bowen-677 closed 3 months ago

Bowen-677 commented 3 months ago

I got "CUDA Error: File: ./utilities/gpu_vector.cuh Line: 140 Error code: 700 Error text: an illegal memory access was encountered" when I tried to calculate the thermal conductivity of Ga2O3-Au heterostructures. I had previously tested the same training dataset and model.xyz with a moment tensor potential; it ran stable MD simulations, but the speed was too slow. So I chose GPUMD to speed up the simulation, but it does not run. I have also read the closed issue "CUDA Error code: 700" (#532). Could you please help me check what's wrong with my code?

https://github.com/Bowen-677/Ga2O3-Au-thermal-conductivitity

brucefan1983 commented 3 months ago

There is at least one problem with your model.xyz:

The atom positions are defined as fractional coordinates, which are not consistent with the GPUMD conventions.
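For readers hitting the same problem, the conversion can be sketched as follows. This is a minimal illustration, not part of GPUMD: the function names `frac_to_cart` and `looks_fractional` and the example box are hypothetical, and it assumes the three box vectors a, b, c are known in Å (for example from the lattice entry of the xyz file). A Cartesian position is the sum of the fractional components times the corresponding box vectors.

```python
def frac_to_cart(frac, lattice):
    """Convert one fractional position to Cartesian coordinates.

    frac:    (fa, fb, fc) fractional components
    lattice: three box vectors (a, b, c), each given as (x, y, z) in Angstrom
    """
    a, b, c = lattice
    # r = fa * a + fb * b + fc * c, evaluated component by component
    return tuple(frac[0] * a[k] + frac[1] * b[k] + frac[2] * c[k] for k in range(3))


def looks_fractional(positions, tol=1e-6):
    """Heuristic check: True if every coordinate lies in [0, 1].

    This is only a hint; a very small Cartesian cell could also pass.
    """
    return all(-tol <= x <= 1.0 + tol for p in positions for x in p)


# Hypothetical orthorhombic box, used only for illustration
box = ((10.0, 0.0, 0.0), (0.0, 12.0, 0.0), (0.0, 0.0, 30.0))
print(frac_to_cart((0.5, 0.25, 0.5), box))
```

If `looks_fractional` returns True for all positions in a cell that is tens of Å wide, the file most likely still contains fractional coordinates.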

Bowen-677 commented 3 months ago

Thanks for your reply, Prof. Fan. After changing to Cartesian coordinates, it can run for thousands of steps, with neighbor.out as follows:

Neighbor info at step 0: radial(max=197,actual=209), angular(max=35,actual=27).
Neighbor info at step 0: radial(max=197,actual=209), angular(max=35,actual=27).
Neighbor info at step 0: radial(max=197,actual=209), angular(max=35,actual=27).
Neighbor info at step 0: radial(max=197,actual=209), angular(max=35,actual=27).
Neighbor info at step 1000: radial(max=197,actual=214), angular(max=35,actual=32).
Neighbor info at step 2000: radial(max=197,actual=220), angular(max=35,actual=40).

It then breaks down with CUDA Error: 700. How can I increase the radial max, since the actual radial neighbor count seems to be larger than the max? Looking back at the questions you have answered in the past, it looks like there is still a problem with the training of the potential, so I'm going to try adding more training data.

brucefan1983 commented 3 months ago

Neighbor info at step 0: radial(max=197,actual=209), angular(max=35,actual=27).

This means your initial model is still problematic: at step 0 (the initial structure), the actual number of radial neighbors (209) already exceeds the max value (197), and that max value is itself the maximum recorded in nep.txt multiplied by 1.25.

You need to check your model.xyz carefully.
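As a rough aid for spotting this condition, the check described above can be sketched in a few lines. This is not a GPUMD tool: the function name `find_overflows` is hypothetical, and the regular expression assumes the line format exactly as pasted in this thread.

```python
import re

# Matches lines in the format shown in this thread (an assumption, not a spec)
PATTERN = re.compile(
    r"step (\d+): radial\(max=(\d+),actual=(\d+)\), "
    r"angular\(max=(\d+),actual=(\d+)\)"
)


def find_overflows(text):
    """Return (step, kind, max, actual) for every entry where actual > max."""
    hits = []
    for m in PATTERN.finditer(text):
        step, rmax, ract, amax, aact = map(int, m.groups())
        if ract > rmax:
            hits.append((step, "radial", rmax, ract))
        if aact > amax:
            hits.append((step, "angular", amax, aact))
    return hits


# Lines copied from the neighbor.out output quoted in this thread
log = """Neighbor info at step 0: radial(max=197,actual=209), angular(max=35,actual=27).
Neighbor info at step 1000: radial(max=197,actual=214), angular(max=35,actual=32).
Neighbor info at step 2000: radial(max=197,actual=220), angular(max=35,actual=40).
"""

for step, kind, limit, actual in find_overflows(log):
    print(f"step {step}: {kind} neighbors {actual} exceed max {limit}")
```

Any hit at step 0 points at the initial structure rather than at the dynamics, which is the case discussed here.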

Bowen-677 commented 3 months ago

OK, thanks for the tip, Prof. Fan. I'll go and double-check my structure.