Closed Ericwang6 closed 3 years ago
Summary
An error in /source/lib/src/cuda/prod_env_mat.cu
DeePMD-kit v2.0.0b0
When training on data for small organic molecules with the se_e2_a descriptor, the following error occurs: cuda assert: DeePMD-kit: illegal nbor list sorting /home/yingze/deepmd-kit/source/lib/src/cuda/prod_env_mat.cu 509
My input.json:
{
  "model": {
    "type_map": ["C", "H", "N", "O"],
    "descriptor": {
      "type": "se_e2_a",
      "sel": [48, 40, 48, 48],
      "rcut_smth": 0.5,
      "rcut": 6.0,
      "neuron": [20, 40, 80],
      "resnet_dt": false,
      "axis_neuron": 8,
      "type_one_side": true,
      "seed": 1,
      "activation_function": "gelu"
    },
    "fitting_net": {
      "neuron": [240, 240, 240],
      "resnet_dt": true,
      "seed": 1,
      "activation_function": "gelu"
    }
  },
  "learning_rate": {
    "type": "exp",
    "start_lr": 0.0001,
    "stop_lr": 5e-8,
    "decay_steps": 500
  },
  "loss": {
    "type": "ener",
    "start_pref_e": 0.02,
    "limit_pref_e": 10,
    "start_pref_f": 1000,
    "limit_pref_f": 1,
    "start_pref_v": 0,
    "limit_pref_v": 0
  },
  "training": {
    "numb_steps": 100000,
    "disp_file": "lcurve.out",
    "disp_freq": 1000,
    "numb_test": 1,
    "save_freq": 1000,
    "save_ckpt": "model.ckpt",
    "disp_training": true,
    "time_training": true,
    "training_data": {
      "batch_size": "auto",
      "systems": ["./C0H0N0O2", "./C1H3N1O1"]
    }
  }
}
The program is run on a PBS system.
Steps to Reproduce
An example of relevant data is attached here: issue.zip
This bug occurs when:
which causes the empty input in gelu.cu and thus breaks.
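Since the report mentions an empty input reaching gelu.cu, one thing that may be worth checking is whether any training system contains no atoms of some type listed in type_map. The sketch below is only a rough diagnostic. It assumes that the system directory names encode per-element atom counts (for example, C0H0N0O2 meaning two O atoms and no C, H, or N atoms), which is a guess from the names in input.json. The helper names element_counts and missing_types are hypothetical and are not part of DeePMD-kit.

```python
import re

def element_counts(system_name):
    """Parse a name such as 'C1H3N1O1' into {'C': 1, 'H': 3, 'N': 1, 'O': 1}.

    Assumption: the directory name encodes the composition as
    element symbol followed by an integer count.
    """
    label = system_name.rstrip("/").split("/")[-1]
    return {el: int(n) for el, n in re.findall(r"([A-Z][a-z]?)(\d+)", label)}

def missing_types(system_name, type_map):
    """Return the entries of type_map that have zero atoms in this system."""
    counts = element_counts(system_name)
    return [el for el in type_map if counts.get(el, 0) == 0]

# Values copied from the input.json in this report.
type_map = ["C", "H", "N", "O"]
systems = ["./C0H0N0O2", "./C1H3N1O1"]

report = {s: missing_types(s, type_map) for s in systems}
for s, missing in report.items():
    print(s, "types with no atoms:", missing)
```

If the naming assumption holds, the first system has no C, H, or N atoms, while the second contains every type in type_map. Whether this is what triggers the assertion is not established here.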