deepmodeling / dpgen

The deep potential generator to generate a deep-learning based model of interatomic potential energy and force field
https://docs.deepmodeling.com/projects/dpgen/
GNU Lesser General Public License v3.0

find too many unsuccessfully terminated jobs #311

Closed wzb198910ab closed 4 years ago

wzb198910ab commented 4 years ago

When I run 'dpgen run para.json machine.json', I get the following error:

OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-11
OMP: Info #214: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #156: KMP_AFFINITY: 12 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #285: KMP_AFFINITY: topology layer "LL cache" is equivalent to "socket".
OMP: Info #285: KMP_AFFINITY: topology layer "L3 cache" is equivalent to "socket".
OMP: Info #285: KMP_AFFINITY: topology layer "L2 cache" is equivalent to "core".
OMP: Info #285: KMP_AFFINITY: topology layer "L1 cache" is equivalent to "core".
OMP: Info #191: KMP_AFFINITY: 1 socket x 6 cores/socket x 2 threads/core (6 total cores)
OMP: Info #216: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to socket 0 core 0 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to socket 0 core 0 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to socket 0 core 1 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to socket 0 core 1 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to socket 0 core 2 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to socket 0 core 2 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to socket 0 core 3 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to socket 0 core 3 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 8 maps to socket 0 core 4 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 9 maps to socket 0 core 4 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 10 maps to socket 0 core 5 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 11 maps to socket 0 core 5 thread 1
OMP: Info #252: KMP_AFFINITY: pid 7001 tid 7001 thread 0 bound to OS proc set 0

DeepModeling


Version: 0.8.1.dev2+g413d945
Date: Aug-09-2020
Path: /home/wzb198910/Software/anaconda3/lib/python3.8/site-packages/dpgen

Dependency

numpy      1.18.5     /home/wzb198910/Software/anaconda3/lib/python3.8/site-packages/numpy
dpdata     0.1.17     /home/wzb198910/Software/anaconda3/lib/python3.8/site-packages/dpdata
pymatgen   2020.8.3   /home/wzb198910/Software/anaconda3/lib/python3.8/site-packages/pymatgen
monty      3.0.4      /home/wzb198910/Software/anaconda3/lib/python3.8/site-packages/monty
ase        3.20.0     /home/wzb198910/Software/anaconda3/lib/python3.8/site-packages/ase
paramiko   2.7.1      /home/wzb198910/Software/anaconda3/lib/python3.8/site-packages/paramiko
custodian  2020.4.27  /home/wzb198910/Software/anaconda3/lib/python3.8/site-packages/custodian

Reference

Please cite: Yuzhi Zhang, Haidi Wang, Weijie Chen, Jinzhe Zeng, Linfeng Zhang, Han Wang, and Weinan E, DP-GEN: A concurrent learning platform for the generation of reliable deep learning based potential energy models, Computer Physics Communications, 2020, 107206.

Description

INFO : start running
INFO : =============================iter.000000==============================
INFO : -------------------------iter.000000 task 00--------------------------
INFO : -------------------------iter.000000 task 01--------------------------
INFO : new submission of 462315b9-50c0-48d6-98b0-67bae1db53ce for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
INFO : new submission of 43b39538-8222-49a7-b9a8-b399d93d1127 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
INFO : new submission of f4dc3041-215b-46a4-ae3f-67c64e272f66 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
INFO : new submission of f49ef967-0c47-41c9-b960-5ca26efc65c6 for chunk 221407c03ae5c73109cce71d27e24637824f3333
INFO : job f49ef967-0c47-41c9-b960-5ca26efc65c6 finished
INFO : job 462315b9-50c0-48d6-98b0-67bae1db53ce finished
INFO : job 43b39538-8222-49a7-b9a8-b399d93d1127 finished
INFO : job f4dc3041-215b-46a4-ae3f-67c64e272f66 finished
INFO : -------------------------iter.000000 task 02--------------------------
INFO : -------------------------iter.000000 task 03--------------------------
INFO : -------------------------iter.000000 task 04--------------------------
INFO : new submission of 782869a5-6cfa-47c0-ac09-9ab2775ce2d8 for chunk 5934a71cd58aa3719d6b4ab5368b50ce8bf3a54c
INFO : new submission of 5dc9c8d4-f832-4d11-9428-d0dcbbad79b1 for chunk 9b04ed091d6b748e0740bfb91043e477e45dad09
INFO : job 782869a5-6cfa-47c0-ac09-9ab2775ce2d8 finished
INFO : job 5dc9c8d4-f832-4d11-9428-d0dcbbad79b1 finished
INFO : -------------------------iter.000000 task 05--------------------------
INFO : -------------------------iter.000000 task 06--------------------------
INFO : system 000 candidate : 818 in 5050 16.20 %
INFO : system 000 failed : 4177 in 5050 82.71 %
INFO : system 000 accurate : 55 in 5050 1.09 %
INFO : system 000 accurate_ratio: 0.0109 thresholds: 1.0000 and 1.0000 eff. task min and max -1 20 number of fp tasks: 20
INFO : system 002 candidate : 282 in 5050 5.58 %
INFO : system 002 failed : 4707 in 5050 93.21 %
INFO : system 002 accurate : 61 in 5050 1.21 %
INFO : system 002 accurate_ratio: 0.0121 thresholds: 1.0000 and 1.0000 eff. task min and max -1 20 number of fp tasks: 20
INFO : system 004 candidate : 594 in 5050 11.76 %
INFO : system 004 failed : 4386 in 5050 86.85 %
INFO : system 004 accurate : 70 in 5050 1.39 %
INFO : system 004 accurate_ratio: 0.0139 thresholds: 1.0000 and 1.0000 eff. task min and max -1 20 number of fp tasks: 20
INFO : -------------------------iter.000000 task 07--------------------------
INFO : new submission of 2baf1807-5ba8-4362-830f-88fe0b216943 for chunk 16cc50909d40073045aaf94a03cbf9f28586cfcf
INFO : job 2baf1807-5ba8-4362-830f-88fe0b216943 finished
INFO : -------------------------iter.000000 task 08--------------------------
INFO : failed tasks: 0 in 60 0.00 %
INFO : failed frame: 60 in 60 100.00 %

Traceback (most recent call last):
  File "/home/wzb198910/Software/anaconda3/bin/dpgen", line 8, in <module>
    sys.exit(main())
  File "/home/wzb198910/Software/anaconda3/lib/python3.8/site-packages/dpgen/main.py", line 182, in main
    args.func(args)
  File "/home/wzb198910/Software/anaconda3/lib/python3.8/site-packages/dpgen/generator/run.py", line 2311, in gen_run
    run_iter (args.PARAM, args.MACHINE)
  File "/home/wzb198910/Software/anaconda3/lib/python3.8/site-packages/dpgen/generator/run.py", line 2300, in run_iter
    post_fp (ii, jdata)
  File "/home/wzb198910/Software/anaconda3/lib/python3.8/site-packages/dpgen/generator/run.py", line 2168, in post_fp
    post_fp_vasp(iter_index, jdata)
  File "/home/wzb198910/Software/anaconda3/lib/python3.8/site-packages/dpgen/generator/run.py", line 1934, in post_fp_vasp
    raise RuntimeError("find too many unsuccessfully terminated jobs")
RuntimeError: find too many unsuccessfully terminated jobs

How can I solve this problem? Thanks.

PS: Ubuntu 20.04 + python 3.8 + deepmd-kit 1.2.0 + dpdata 0.1.17 + dpgen 0.8.1.dev2+g413d945

njzjz commented 4 years ago

The OMP output is informational, not an error. The actual error is that all of your VASP jobs terminated unsuccessfully. You need to find the reason in the VASP logs.
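
One way to start looking through the logs is to check which task directories have an OUTCAR that never reached its closing summary. The sketch below is only an illustration: the helper name `find_unfinished` is hypothetical, the `task.*` directory naming is an assumption about how the fp step lays out its work directory, and the marker string is assumed to be the line VASP prints near the end of OUTCAR on a normal exit.

```python
from pathlib import Path

# Assumption: VASP writes this line near the end of OUTCAR when it exits normally.
DONE_MARKER = "General timing and accounting informations for this job"


def find_unfinished(fp_dir):
    """Return names of task.* directories whose OUTCAR is missing or incomplete.

    Hypothetical helper; the task.* naming is an assumed layout.
    """
    unfinished = []
    for task in sorted(Path(fp_dir).glob("task.*")):
        outcar = task / "OUTCAR"
        # A task counts as unfinished if OUTCAR is absent or lacks the marker.
        if not outcar.is_file():
            unfinished.append(task.name)
        elif DONE_MARKER not in outcar.read_text(errors="replace"):
            unfinished.append(task.name)
    return unfinished
```

Calling `find_unfinished("iter.000000/02.fp")` (path assumed) returns a list of task names to inspect by hand. An empty list would suggest that the runs themselves completed and that the failure lies in how the results are read back.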

wzb198910ab commented 4 years ago

The problem was solved by changing the value of NSW to zero in the INCAR file.
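
To confirm which NSW value an INCAR actually carries, a small reader such as the one below can help. It is a minimal sketch, not part of dpgen or VASP: `read_incar_tag` is a hypothetical helper, and it assumes plain `TAG = value` statements, optionally separated by `;` and followed by `#` or `!` comments.

```python
import re


def read_incar_tag(text, tag):
    """Return the raw string value of `tag` in INCAR-style text, or None if absent.

    Hypothetical helper; assumes simple `TAG = value` statements.
    """
    for line in text.splitlines():
        # Drop anything after a comment character.
        line = re.split(r"[#!]", line, maxsplit=1)[0]
        # Several statements may share one line, separated by ';'.
        for stmt in line.split(";"):
            if "=" not in stmt:
                continue
            key, value = stmt.split("=", 1)
            if key.strip().upper() == tag.upper():
                return value.strip()
    return None


if __name__ == "__main__":
    sample = "ENCUT = 500\nNSW = 0  ! single-point calculation\n"
    print(read_incar_tag(sample, "NSW"))
```

With NSW set to 0, VASP performs no ionic steps, so each task is a single-point calculation on the submitted structure.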