brucefan1983 / GPUMD

Graphics Processing Units Molecular Dynamics
https://gpumd.org/dev
GNU General Public License v3.0

cudaErrorIllegalAddress: an illegal memory access was encountered #583

Closed xinyu1905 closed 7 months ago

xinyu1905 commented 7 months ago

I encountered the following error while computing the thermal conductivity of PbTe with GPUMD. Could you please advise on the cause?

……
Initialized velocities with T = 300 K.
Use NVE ensemble for this run.
Time step for this run is 1 fs.
Dump thermo every 10 steps.
Run 10000 steps.
    1000 steps completed.
    2000 steps completed.
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  exclusive_scan failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
Aborted (core dumped)

PbTe_input.zip

brucefan1983 commented 7 months ago

This usually means the simulation is not stable. There are two probable reasons for this:

xinyu1905 commented 7 months ago

I ran the thermal conductivity calculation on a supercell built by expanding the structures used to train the NEP potential, and hit the error above. If I use the structure from the GPUMD package instead, the error disappears.

brucefan1983 commented 7 months ago

So you used the NEP model in the example? This one: https://github.com/brucefan1983/GPUMD/tree/master/examples/11_NEP_potential_PbTe

xinyu1905 commented 7 months ago

No, the NEP model was trained on data from AIMD simulations that I ran myself, with training parameters based on the GPUMD examples. The relevant files are attached to my question. When I expand the cell from my POSCAR file and run the thermal conductivity calculation, I get the error. However, using the model.xyz file provided in the GPUMD examples does not produce any errors.

brucefan1983 commented 7 months ago

OK, I have tested with the NEP model and inputs you provided. I think it is because your training data are too simple. You might have only a short AIMD trajectory sampled at 300 K.

After switching to the NEP model from the PbTe example, I can run stable MD with your model.xyz.
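An unstable run like this can often be spotted before the CUDA crash by watching the temperature in the thermo output. Below is a minimal sketch, assuming the first column of each thermo line is the instantaneous temperature in K (check the GPUMD documentation for your version). The function name `find_blowup`, the threshold `factor`, and the sample values are all hypothetical and only for illustration.

```python
import math

def find_blowup(lines, target_T, factor=3.0):
    """Return the index of the first line whose temperature is non-finite
    or larger than factor * target_T; return None if every line looks sane."""
    for i, line in enumerate(lines):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        try:
            temperature = float(fields[0])  # ASSUMPTION: column 1 is T in K
        except ValueError:
            continue  # skip non-numeric lines such as headers
        if not math.isfinite(temperature) or temperature > factor * target_T:
            return i
    return None

# Made-up sample lines (not real GPUMD output) showing a run that heats up.
sample = [
    "300.1 1.0 -5.0",
    "305.7 1.0 -5.0",
    "2950.4 9.8 -1.2",
    "nan nan nan",
]
first_bad = find_blowup(sample, target_T=300.0)
print(first_bad)
```

For a real run, the same function can be fed an open file handle, for example `find_blowup(open("thermo.out"), target_T=300.0)`. A temperature that climbs far above the target in an NVE run is a strong hint that the model is being evaluated outside the region covered by its training data.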

brucefan1983 commented 7 months ago

Even the training data in the PbTe example are very simple. They just contain some fixed-cell structures from a few AIMD runs up to 900 K.

Training data are very important. A "good" training data set should consist of structures produced under various conditions.
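One common way to add variety beyond a single fixed-cell AIMD run is to strain the cell and displace the atoms at random before labeling the new structures with DFT. The sketch below is not the workflow used for the PbTe example; it is one possible approach under stated assumptions. The function `perturb`, its parameters `max_strain` and `max_disp`, and the two-atom cell are hypothetical placeholders, and the cell values are not real PbTe data.

```python
import random

def perturb(cell, positions, max_strain=0.03, max_disp=0.1, rng=None):
    """Apply a random symmetric strain to the cell and atoms, then add a
    random displacement to each atom.

    cell: three lattice vectors (rows), in Angstrom.
    positions: Cartesian coordinates, in Angstrom.
    Returns (new_cell, new_positions).
    """
    rng = rng or random.Random()
    # Build a symmetric strain tensor with entries in [-max_strain, max_strain].
    eps = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(i, 3):
            eps[i][j] = eps[j][i] = rng.uniform(-max_strain, max_strain)
    # Deformation matrix F = I + eps.
    F = [[(1.0 if i == j else 0.0) + eps[i][j] for j in range(3)]
         for i in range(3)]

    def deform(v):
        # Row vector times F; the same map is used for cell and atoms,
        # so fractional coordinates are preserved by the strain step.
        return [sum(v[k] * F[k][j] for k in range(3)) for j in range(3)]

    new_cell = [deform(a) for a in cell]
    new_positions = [
        [x + rng.uniform(-max_disp, max_disp) for x in deform(r)]
        for r in positions
    ]
    return new_cell, new_positions

# Placeholder cubic cell with two atoms (illustrative values only).
cell = [[6.5, 0.0, 0.0], [0.0, 6.5, 0.0], [0.0, 0.0, 6.5]]
positions = [[0.0, 0.0, 0.0], [3.25, 3.25, 3.25]]
new_cell, new_positions = perturb(cell, positions, rng=random.Random(0))
print(new_cell)
print(new_positions)
```

Each perturbed structure still needs energies, forces, and virials from a DFT calculation before it can be added to the training set. Structures from AIMD at several temperatures and pressures serve the same purpose of widening the sampled conditions.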

xinyu1905 commented 7 months ago

Thank you very much for your response. I will test it following your suggestions.

brucefan1983 commented 7 months ago

OK, I will close this issue. You can reopen it or create another one if you have more questions.