12jscvb opened this issue 5 months ago
Please check whether your training data contains NaN.
Sorry for my late reply. I have checked the training data, and it contains no NaN values. Could you give me some more suggestions?
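One way to double-check a dataset for NaN (and Inf) is to scan every stored array directly. The sketch below is a minimal version that assumes the data are in the DeePMD-kit npy format (`set.*/energy.npy`, `force.npy`, `coord.npy`, `box.npy`, ...). The function name `find_nonfinite_npy` is hypothetical, and `.raw` text files are not checked.

```python
import os

import numpy as np


def find_nonfinite_npy(root):
    """Walk `root` and report every .npy file that contains NaN or Inf.

    Hypothetical helper: it assumes the DeePMD-kit npy layout but simply
    checks every .npy file it finds below `root`.
    Returns a list of (path, number_of_nonfinite_values).
    """
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".npy"):
                continue
            path = os.path.join(dirpath, name)
            data = np.load(path)
            # Only numeric arrays can hold NaN/Inf; skip anything else.
            if not np.issubdtype(data.dtype, np.number):
                continue
            n_bad = int(np.count_nonzero(~np.isfinite(data)))
            if n_bad:
                bad.append((path, n_bad))
    return bad
```

For example, calling `find_nonfinite_npy` on the top-level data directory returns an empty list when every array is finite.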
I have encountered this problem before: DeePMD-kit v2.2.7 sometimes writes NaN in the model deviation. Could you try the newest version of DeePMD-kit, such as v2.2.9? See https://github.com/deepmodeling/deepmd-kit/issues/3242
Thanks for your advice. I occasionally run into this situation with DeePMD-kit v2.2.8 as well, so I will try v2.2.9. What is the cause of this problem? Thanks.
NaN appeared in lcurve.out, so this is a completely different issue. I noticed that the energy loss increased unexpectedly before the NaN appeared, so the data probably contain outliers.
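A rough way to look for such outliers is to compare each frame's per-atom energy with the rest of the same system using a robust (median/MAD) z-score, and to flag frames with unusually large forces. This is a generic screening heuristic, not something DeePMD-kit or DP-GEN does. The function names and thresholds are illustrative assumptions. The energy check is only meaningful within one system of fixed composition.

```python
import numpy as np


def flag_energy_outliers(energies, natoms, threshold=5.0):
    """Indices of frames whose per-atom energy is a robust-z-score outlier.

    Hypothetical helper. robust z = (x - median) / (1.4826 * MAD); the
    threshold of 5 is an arbitrary illustrative choice, not a tool default.
    """
    e_per_atom = np.asarray(energies, dtype=float) / float(natoms)
    median = np.median(e_per_atom)
    mad = np.median(np.abs(e_per_atom - median))
    if mad == 0.0:
        # (Nearly) all frames identical: flag anything that differs at all.
        return np.flatnonzero(e_per_atom != median)
    robust_z = (e_per_atom - median) / (1.4826 * mad)
    return np.flatnonzero(np.abs(robust_z) > threshold)


def flag_force_outliers(forces, natoms, max_force):
    """Indices of frames whose largest |force component| exceeds max_force.

    Hypothetical helper; `forces` is reshaped to (n_frames, natoms * 3),
    matching the per-frame flattening used by force.npy.
    """
    f = np.asarray(forces, dtype=float).reshape(-1, natoms * 3)
    return np.flatnonzero(np.max(np.abs(f), axis=1) > max_force)
```

Flagged frames are only candidates for inspection; whether they are genuinely wrong labels has to be judged from the underlying DFT calculations.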
Thank you. I noticed that the dataset contains script files. Apart from this case, I also encountered a situation where training of the potential was normal in stage 01.train, but a large number of NaN values appeared in model_devi.out in stage 02.model_devi, and the folder corresponding to the 'remote_root' parameter in the machine JSON file was empty (the folder settings are correct). What could be the reason for this? Could you give some advice? Thanks.
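To see how many frames are affected in each task, the NaN rows of model_devi.out can be listed directly. This minimal sketch assumes the file starts with a `#` header line naming the columns (`step`, `max_devi_v`, ..., `max_devi_f`, ...), as written by DeePMD-kit v2 through LAMMPS. The helper name `nan_frames_in_model_devi` is hypothetical.

```python
import math


def nan_frames_in_model_devi(path, column="max_devi_f"):
    """Return the steps whose `column` value is NaN in a model_devi.out file.

    Hypothetical helper; assumes a '#' header line that names the columns.
    Both "nan" and "-nan" (as printed by C code) parse to NaN via float().
    """
    col_index = None
    nan_steps = []
    with open(path) as fh:
        for line in fh:
            stripped = line.strip()
            if not stripped:
                continue
            if stripped.startswith("#"):
                names = stripped.lstrip("#").split()
                if column in names:
                    col_index = names.index(column)
                continue
            if col_index is None:
                raise ValueError(f"column {column!r} not found in header of {path}")
            fields = stripped.split()
            if math.isnan(float(fields[col_index])):
                nan_steps.append(int(float(fields[0])))
    return nan_steps
```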
Summary
In the initial phase of 03.fp in the second loop of dpgen, the following error appears:

INFO:dpgen:-------------------------iter.000002 task 05--------------------------
INFO:dpgen:-------------------------iter.000002 task 06--------------------------
Traceback (most recent call last):
  File "/home/combustion/.local/bin/dpgen", line 8, in <module>
    sys.exit(main())
  File "/home/combustion/.local/lib/python3.10/site-packages/dpgen/main.py", line 255, in main
    args.func(args)
  File "/home/combustion/.local/lib/python3.10/site-packages/dpgen/generator/run.py", line 5411, in gen_run
    run_iter(args.PARAM, args.MACHINE)
  File "/home/combustion/.local/lib/python3.10/site-packages/dpgen/generator/run.py", line 4760, in run_iter
    make_fp(ii, jdata, mdata)
  File "/home/combustion/.local/lib/python3.10/site-packages/dpgen/generator/run.py", line 3757, in make_fp
    fp_tasks = _make_fp_vasp_configs(iter_index, jdata)
  File "/home/combustion/.local/lib/python3.10/site-packages/dpgen/generator/run.py", line 3341, in _make_fp_vasp_configs
    fp_tasks = _make_fp_vasp_inner(
  File "/home/combustion/.local/lib/python3.10/site-packages/dpgen/generator/run.py", line 2566, in _make_fp_vasp_inner
    ) = _select_by_model_devi_standard(
  File "/home/combustion/.local/lib/python3.10/site-packages/dpgen/generator/run.py", line 2330, in _select_by_model_devi_standard
    raise RuntimeError(
RuntimeError: md traj iter.000002/01.model_devi/task.000.000000 frame 1 with f devi nan does not belong to either accurate, candidiate and failed, it should not happen
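The error follows from how NaN behaves in comparisons. Every comparison involving NaN evaluates to False, so a NaN force deviation satisfies none of the accurate/candidate/failed conditions and falls through to the RuntimeError. The simplified sketch below illustrates this with thresholds named after DP-GEN's `model_devi_f_trust_lo` and `model_devi_f_trust_hi` parameters. It is not DP-GEN's actual `_select_by_model_devi_standard` code, and its boundary handling may differ.

```python
def classify_frame(max_devi_f, f_trust_lo, f_trust_hi):
    """Classify one frame by its max force deviation (simplified sketch).

    Hypothetical helper mirroring the idea of DP-GEN's trust-level selection;
    the exact boundary handling in DP-GEN may differ.
    """
    if max_devi_f < f_trust_lo:
        return "accurate"
    if f_trust_lo <= max_devi_f < f_trust_hi:
        return "candidate"
    if max_devi_f >= f_trust_hi:
        return "failed"
    # Every comparison with NaN is False, so a NaN deviation reaches this point.
    raise RuntimeError(f"f devi {max_devi_f} does not belong to any category")
```

So the RuntimeError is a symptom: the underlying problem is the NaN written to model_devi.out during the MD step.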
In addition, after I added the initial dataset before the start of the loop, NaN appeared in the lcurve.out file generated in 00.train and in the model_devi.out file generated in 02.model. I don't know how to deal with this situation. Thank you for your help.
Best wishes
lcurve.txt
model_devi.txt
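To see what happened just before training diverged, one can find the first NaN row of lcurve.out and look at the energy RMSE logged right before it. This sketch assumes a `#` header line naming the columns and a training energy column called `rmse_e_trn`, the name DeePMD-kit v2 uses. Adjust `energy_column` if your header differs. The function name is hypothetical.

```python
import math


def first_nan_in_lcurve(path, energy_column="rmse_e_trn", history=5):
    """Find the first row of lcurve.out that contains NaN.

    Hypothetical helper. Returns (nan_step, recent), where `recent` holds up
    to `history` (step, energy_rmse) pairs logged just before the NaN, or
    (None, recent) if no NaN is found.
    """
    names = None
    recent = []
    with open(path) as fh:
        for line in fh:
            stripped = line.strip()
            if not stripped:
                continue
            if stripped.startswith("#"):
                names = stripped.lstrip("#").split()
                continue
            if names is None or energy_column not in names:
                raise ValueError(f"column {energy_column!r} not found in header of {path}")
            values = [float(x) for x in stripped.split()]
            step = int(values[0])
            if any(math.isnan(v) for v in values):
                return step, recent
            recent.append((step, values[names.index(energy_column)]))
            recent = recent[-history:]
    return None, recent
```

A steadily rising energy RMSE in `recent` is consistent with the outlier explanation given earlier in the thread.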
DP-GEN Version
dpgen v0.12.0, DeePMD-kit v2.2.7
Platform, Python Version, etc
The OS is Ubuntu 22.04. Before running this loop, I changed the system kernel to use the NVIDIA driver.
Details
At the same time, I found another problem. After installing dpgen, the following warning appeared when I checked its version:
/usr/lib/python3/dist-packages/requests/__init__.py:87: RequestsDependencyWarning: urllib3 (2.1.0) or chardet (4.0.0) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
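This message is a warning that `requests` emits at import time when it finds dependency versions it does not support. It is not an error from dpgen itself. The path `/usr/lib/python3/dist-packages` suggests that `requests` comes from the Ubuntu system package, while urllib3 2.1.0 was probably installed separately with pip. That is an assumption based only on the paths in the message. A quick way to see which versions this Python interpreter actually picks up is the sketch below. The helper name is hypothetical.

```python
from importlib import metadata


def installed_versions(packages=("requests", "urllib3", "chardet", "charset-normalizer")):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Print what the current interpreter sees for each distribution.
for pkg, ver in installed_versions().items():
    print(f"{pkg}: {ver if ver else 'not installed'}")
```

Running it with the same interpreter that runs dpgen shows whether the system `requests` is being paired with a newer pip-installed urllib3.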