deepmodeling / dpgen

The deep potential generator to generate a deep-learning based model of interatomic potential energy and force field
https://docs.deepmodeling.com/projects/dpgen/
GNU Lesser General Public License v3.0

RuntimeError "md traj iter.000002/01.model_devi/task.000.000000 frame 1 with f devi nan does not belong to either accurate, candidiate and failed, it should not happen" occurs during dpgen run #1460

Open 12jscvb opened 5 months ago

12jscvb commented 5 months ago

Summary

In the initial phase of 03.fp in the second loop of dpgen, the following error message appears:

INFO:dpgen:-------------------------iter.000002 task 05--------------------------
INFO:dpgen:-------------------------iter.000002 task 06--------------------------
Traceback (most recent call last):
  File "/home/combustion/.local/bin/dpgen", line 8, in <module>
    sys.exit(main())
  File "/home/combustion/.local/lib/python3.10/site-packages/dpgen/main.py", line 255, in main
    args.func(args)
  File "/home/combustion/.local/lib/python3.10/site-packages/dpgen/generator/run.py", line 5411, in gen_run
    run_iter(args.PARAM, args.MACHINE)
  File "/home/combustion/.local/lib/python3.10/site-packages/dpgen/generator/run.py", line 4760, in run_iter
    make_fp(ii, jdata, mdata)
  File "/home/combustion/.local/lib/python3.10/site-packages/dpgen/generator/run.py", line 3757, in make_fp
    fp_tasks = _make_fp_vasp_configs(iter_index, jdata)
  File "/home/combustion/.local/lib/python3.10/site-packages/dpgen/generator/run.py", line 3341, in _make_fp_vasp_configs
    fp_tasks = _make_fp_vasp_inner(
  File "/home/combustion/.local/lib/python3.10/site-packages/dpgen/generator/run.py", line 2566, in _make_fp_vasp_inner
    ) = _select_by_model_devi_standard(
  File "/home/combustion/.local/lib/python3.10/site-packages/dpgen/generator/run.py", line 2330, in _select_by_model_devi_standard
    raise RuntimeError(
RuntimeError: md traj iter.000002/01.model_devi/task.000.000000 frame 1 with f devi nan does not belong to either accurate, candidiate and failed, it should not happen

In addition, I added the initial data set before the start of the loop, and then found that NaN appeared in the lcurve.out file generated at 00.train and in the model_devi.out file generated at 02.model. I don't know how to handle this situation. Thank you for your help. Best wishes.

lcurve.txt

model_devi.txt
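A minimal sketch of why a NaN force deviation ends up in none of the three categories named in the error message. The function `classify_frame` and the threshold values are hypothetical stand-ins for illustration, not dpgen's actual selection code. The point is only that every ordered comparison against NaN evaluates to False, so a three-way split on thresholds cannot place such a frame anywhere.

```python
import math


def classify_frame(f_devi, trust_lo, trust_hi):
    """Hypothetical three-way split mirroring the categories in the error message."""
    if f_devi < trust_lo:
        return "accurate"
    if trust_lo <= f_devi < trust_hi:
        return "candidate"
    if f_devi >= trust_hi:
        return "failed"
    # Only reachable when every comparison above is False, i.e. f_devi is NaN.
    raise RuntimeError(f"f devi {f_devi} does not belong to any category")


# Illustrative thresholds only; real values come from the user's param file.
print(classify_frame(0.03, 0.05, 0.15))
print(classify_frame(0.08, 0.05, 0.15))
print(classify_frame(0.40, 0.05, 0.15))
try:
    classify_frame(math.nan, 0.05, 0.15)
except RuntimeError as exc:
    print("NaN frame:", exc)
```

So the RuntimeError is a symptom: the real question is where the NaN in model_devi.out (and possibly in lcurve.out) comes from.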

DP-GEN Version

dpgen v0.12.0, DeePMD-kit v2.2.7

Platform, Python Version, etc

The OS is Ubuntu 22.04. Before running this loop, I changed the system kernel to use the NVIDIA driver.

Details

At the same time, I found another problem. After installing dpgen, a warning appears when I check its version:

/usr/lib/python3/dist-packages/requests/__init__.py:87: RequestsDependencyWarning: urllib3 (2.1.0) or chardet (4.0.0) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "

njzjz commented 5 months ago

Please check whether your training data contains NaN.
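One way to run this check, as a rough sketch: it assumes the training data are stored in the deepmd/npy layout (for example `set.000/energy.npy`, `set.000/force.npy`), and the helper name `find_nonfinite_arrays` is made up for illustration. It walks a directory, loads every `.npy` file, and reports those containing NaN or Inf.

```python
from pathlib import Path

import numpy as np


def find_nonfinite_arrays(data_root):
    """Return (path, count) for every .npy file under data_root holding NaN or Inf."""
    bad = []
    for npy_path in sorted(Path(data_root).rglob("*.npy")):
        arr = np.load(npy_path)
        # np.isfinite is only meaningful for numeric dtypes; skip anything else.
        if not np.issubdtype(arr.dtype, np.number):
            continue
        n_bad = int(np.count_nonzero(~np.isfinite(arr)))
        if n_bad:
            bad.append((str(npy_path), n_bad))
    return bad


# Example call (the path is a placeholder for your own data directory):
# for path, n_bad in find_nonfinite_arrays("path/to/init_data"):
#     print(f"{path}: {n_bad} non-finite values")
```

An empty result only rules out literal NaN/Inf entries; it says nothing about physically unreasonable energies or forces.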

12jscvb commented 4 months ago

> Please check whether your training data contains NaN.

Sorry for my late reply. I have checked the training data and there is no NaN. Could you give me some more suggestions?

robinzyb commented 4 months ago

I encountered this problem before. v2.2.7 sometimes writes nan in model deviation. Could you try deepmd with newest version like v2.2.9? https://github.com/deepmodeling/deepmd-kit/issues/3242

12jscvb commented 4 months ago

> I encountered this problem before. v2.2.7 sometimes writes nan in model deviation. Could you try deepmd with newest version like v2.2.9? deepmodeling/deepmd-kit#3242

Thanks for your advice. I occasionally ran into this situation with DeePMD-kit v2.2.8 as well, so I will try DeePMD-kit v2.2.9. What is the reason for this problem? Thanks.

njzjz commented 4 months ago

> I encountered this problem before. v2.2.7 sometimes writes nan in model deviation. Could you try deepmd with newest version like v2.2.9? https://github.com/deepmodeling/deepmd-kit/issues/3242

NaN appeared in lcurve.out, so it's a totally different issue. I noticed that before NaN, the energy loss unexpectedly increased. The data should contain outliers.
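To look at what the loss did just before it turned into NaN, something like the following sketch can help. It assumes lcurve.out is a whitespace-separated numeric table whose header line starts with `#`; the exact columns depend on the training settings, so the hypothetical helper `first_nan_row` treats all columns generically and just returns the rows preceding the first NaN.

```python
import numpy as np


def first_nan_row(lcurve_path, context=5):
    """Return (index of the first row containing NaN, the `context` rows before it).

    If no row contains NaN, the index is None and the last `context` rows are returned.
    """
    data = np.loadtxt(lcurve_path, comments="#", ndmin=2)
    nan_rows = np.flatnonzero(np.isnan(data).any(axis=1))
    if nan_rows.size == 0:
        return None, data[-context:]
    first = int(nan_rows[0])
    return first, data[max(0, first - context):first]


# Example call (the path is a placeholder):
# idx, before = first_nan_row("iter.000002/00.train/000/lcurve.out")
# print(idx)
# print(before)
```

A sudden jump in an energy column in the returned rows is consistent with a few outlier frames dominating the loss.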

12jscvb commented 4 months ago

> > I encountered this problem before. v2.2.7 sometimes writes nan in model deviation. Could you try deepmd with newest version like v2.2.9? deepmodeling/deepmd-kit#3242
>
> NaN appeared in lcurve.out, so it's a totally different issue. I noticed that before NaN, the energy loss unexpectedly increased. The data should contain outliers.

Thank you. I noticed that the data set contains script files. Apart from this case, I also ran into a situation where the potential trained normally in stage 01.train, but a large number of NaN values appeared in the model_devi.out file in stage 02.model_devi. I also found that the folder corresponding to the 'remote_root' parameter in the machine JSON file was empty (the folder settings are correct). What is the reason for this? Could you give me some advice? Thanks.
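To see which frames of a model_devi.out file carry NaN, a small sketch along these lines may be useful. It assumes the usual column order with the step in column 0 and the maximum force deviation in column 4; if your file's header shows a different layout, pass another `f_devi_col`. The helper name `nan_frames` is hypothetical.

```python
import numpy as np


def nan_frames(model_devi_path, f_devi_col=4):
    """Return the step numbers (column 0) whose force-deviation column is NaN."""
    data = np.loadtxt(model_devi_path, comments="#", ndmin=2)
    mask = np.isnan(data[:, f_devi_col])
    return data[mask, 0].astype(int).tolist()


# Example call (the path is taken from the error message above):
# print(nan_frames("iter.000002/01.model_devi/task.000.000000/model_devi.out"))
```

If NaN shows up from the very first frames, the models or the input structure are the likely suspects; if it appears only late in a trajectory, the MD run itself may have become unstable.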