deepmodeling / dpgen

The deep potential generator to generate a deep-learning based model of interatomic potential energy and force field
https://docs.deepmodeling.com/projects/dpgen/
GNU Lesser General Public License v3.0

[BUG] _dpgen autotest run error_<work_path> #1139

Open ZHANG-JINYU-1994 opened 1 year ago

ZHANG-JINYU-1994 commented 1 year ago

Summary

When I run "dpgen autotest run relax.json machine.json", the error below occurs unless INCAR and POTCAR are in the same directory as the confs folder. A similar error occurs when the DP potential or in.lammps is not in the same directory as the confs folder. The error seems to be related to the job submission system (Slurm) and the working directory. The json files are attached.

/home/jzhang/ZrO2/dpgen/autotest/dft --> Runing...
2023-02-12 20:01:01,643 - INFO : info:check_all_finished: False
2023-02-12 20:01:01,643 - INFO : remote path: /home/jzhang/ZrO2/dpgen/autotest/dft/2ec5f7ec0bed6a5e5297c6d0a924c779337ba5a2
Traceback (most recent call last):
  File "/home/jzhang/anaconda3/envs/deepmd/bin/dpgen", line 8, in <module>
    sys.exit(main())
  File "/home/jzhang/anaconda3/envs/deepmd/lib/python3.10/site-packages/dpgen/main.py", line 185, in main
    args.func(args)
  File "/home/jzhang/anaconda3/envs/deepmd/lib/python3.10/site-packages/dpgen/auto_test/run.py", line 57, in gen_test
    run_task(args.TASK, args.PARAM, args.MACHINE)
  File "/home/jzhang/anaconda3/envs/deepmd/lib/python3.10/site-packages/dpgen/auto_test/run.py", line 34, in run_task
    run_equi(confs, inter_parameter, mdata)
  File "/home/jzhang/anaconda3/envs/deepmd/lib/python3.10/site-packages/dpgen/auto_test/common_equi.py", line 209, in run_equi
    submission.run_submission()
  File "/home/jzhang/anaconda3/envs/deepmd/lib/python3.10/site-packages/dpdispatcher/submission.py", line 185, in run_submission
    self.upload_jobs()
  File "/home/jzhang/anaconda3/envs/deepmd/lib/python3.10/site-packages/dpdispatcher/submission.py", line 348, in upload_jobs
    self.machine.context.upload(self)
  File "/home/jzhang/anaconda3/envs/deepmd/lib/python3.10/site-packages/dpdispatcher/ssh_context.py", line 505, in upload
    self._walk_directory(submission.forward_common_files, self.local_root, file_list, directory_list)
  File "/home/jzhang/anaconda3/envs/deepmd/lib/python3.10/site-packages/dpdispatcher/ssh_context.py", line 474, in _walk_directory
    raise RuntimeError(f'cannot find upload file {work_path} {jj}')
RuntimeError: cannot find upload file /home/jzhang/ZrO2/dpgen/autotest/dft INCAR

Part of the directory structure is:

├── confs
│   ├── ZrO2-m
│   │   └── relaxation
│   │       ├── INCAR
│   │       ├── POTCAR
│   │       └── relax_task
│   │           ├── CONTCAR
│   │           ├── INCAR -> ../INCAR
│   │           ├── inter.json
│   │           ├── KPOINTS
│   │           ├── OSZICAR
│   │           ├── OUTCAR
│   │           ├── outlog
│   │           ├── POSCAR -> ../../POSCAR
│   │           ├── POTCAR -> ../POTCAR
│   │           ├── result.json
│   │           ├── task.json
│   │           └── XDATCAR
├── infiles
│   ├── INCAR_md
│   ├── INCAR_rlx
│   ├── INCAR_static
│   ├── POTCAR_Ce
│   ├── POTCAR_O
│   ├── POTCAR_Zr
│   ├── ZrO2_a_96.POSCAR
├── machine_raptor.json
├── nohup.out
├── property.json
├── relax_dft.json
└── submit.sh

DPGEN Version and Platform

DPGEN 0.11.0, LAMMPS 23 Jun 2022 - Update 1, DeePMD-kit v2.1.5

Job submission and computing cluster configuration

slurm-wlm 20.11.4

Linux version 5.10.0-11-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.92-1 (2022-01-18)

Expected Behavior

Actual Behavior

Steps to Reproduce

Further Information, Files, and Links setups.zip

kevinwenminion commented 1 year ago

"model": "ZrO2-compressed.pb", "in_lammps": "lammps_input/in.lammps",

This means the model is at "./ZrO2-compressed.pb", in the same directory as the confs folder. The "in.lammps" file is in the "lammps_input" directory, which is also in the same directory as the confs folder.

Does the error still occur if you put the files in the locations above?

ZHANG-JINYU-1994 commented 1 year ago

Here is the json file for dft calculation:

{
    "structures": ["./confs/ZrO2-P21m"],
    "interaction": {
        "type": "vasp",
        "incar": "./infiles/INCAR_rlx",
        "potcar_prefix": "./infiles/",
        "potcars": {"O": "POTCAR_O", "Zr": "POTCAR_Zr"}
    },
    "relaxation": {}
}
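One quick check is whether the paths named in this file resolve from the directory where dpgen is launched. The sketch below is only a diagnostic, assuming relative paths are taken relative to the directory that contains confs (as suggested earlier in the thread). `missing_vasp_inputs` is a hypothetical helper written for this issue, not part of dpgen.

```python
import os


def missing_vasp_inputs(param, work_dir="."):
    """Return the VASP input paths named in `param` that do not exist.

    Assumption: relative paths are resolved against `work_dir`, the
    directory that contains confs/. This helper is hypothetical and is
    not part of dpgen.
    """
    inter = param["interaction"]
    # The INCAR path is given directly in the "interaction" block.
    candidates = [inter["incar"]]
    # Each POTCAR is named relative to "potcar_prefix".
    prefix = inter.get("potcar_prefix", "")
    for potcar in inter.get("potcars", {}).values():
        candidates.append(os.path.join(prefix, potcar))
    # Keep only the paths that cannot be found under work_dir.
    return [p for p in candidates
            if not os.path.exists(os.path.join(work_dir, p))]
```

Loading the json file with `json.load` and passing the resulting dict to `missing_vasp_inputs` from the directory next to confs should give an empty list if every input is where the file says it is.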

The structure of the confs directory is:

tree ./confs/ZrO2-m/relaxation/
├── INCAR
├── POTCAR
└── relax_task
    ├── CONTCAR
    ├── INCAR -> ../INCAR
    ├── inter.json
    ├── KPOINTS
    ├── OSZICAR
    ├── OUTCAR
    ├── outlog
    ├── POSCAR -> ../../POSCAR
    ├── POTCAR -> ../POTCAR
    ├── result.json
    ├── task.json
    └── XDATCAR

Even when POTCAR and INCAR are in ./confs/ZrO2-m/relaxation, the error still happens unless POTCAR and INCAR are also in the same directory as the confs folder.

kevinwenminion commented 1 year ago

I went through the relevant code in common_equi.py and didn't find anything wrong yet.

Could you please try setting "api_version": "0.9"? A different version of dpdispatcher may then be used.

ZHANG-JINYU-1994 commented 1 year ago

Maybe it is related to the submission system. It happens with both MD and VASP calculations. Also, I do not encounter this error when using Bohrium.

njzjz commented 1 year ago

It looks like INCAR is listed as a common forward file. I don't understand why, but it seems wrong.
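The traceback is consistent with this: the failing call passes `submission.forward_common_files` together with `self.local_root`, and the error message pairs the work path with the file name INCAR, so the file is being looked up directly under the work path rather than inside a task directory. The following is a simplified illustration of that kind of lookup, not dpdispatcher's actual code; `missing_common_files` and its arguments are hypothetical names.

```python
import os


def missing_common_files(local_root, forward_common_files):
    """List common forward files that are absent directly under local_root.

    Hypothetical, simplified stand-in for the lookup that raised
    "cannot find upload file"; it is not dpdispatcher's implementation.
    """
    return [name for name in forward_common_files
            if not os.path.exists(os.path.join(local_root, name))]
```

With the layout from the report, calling this with the work path and `["INCAR"]` would flag INCAR as missing for as long as INCAR exists only under confs/ZrO2-m/relaxation, which matches the observation that the error goes away once INCAR sits next to the confs folder.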