deepmodeling / dpgen

The deep potential generator to generate a deep-learning based model of interatomic potential energy and force field
https://docs.deepmodeling.com/projects/dpgen/
GNU Lesser General Public License v3.0

add one sentence "#SBATCH -A TG-DMR160007" #451

Closed tfcao888666 closed 3 years ago

tfcao888666 commented 3 years ago

Hi all, I want to add the account line "#SBATCH -A TG-DMR160007" to the generated .sub file so that I can submit jobs. Could you tell me how to add it in the machine file? Here is my machine file. Thank you!

{
"deepmd_path":      "~/miniconda3/bin/dp",
"train_machine":    {
    "batch": "slurm",
    "work_path" :   "/expanse/lustre/scratch/tfcao/temp_project/batis3-dp/ini",
    "_comment" :    "that's all"
},
"train_resources":  {
    "numb_node":    1,
    "task_per_node":64,
    "partition" :   "compute",
    "exclude_list" : [],
    "source_list":  [ "~/miniconda3/bin/activate" ],
    "module_list":  [ ],
    "time_limit":   "2:00:0",
    "mem_limit":    32,
    "_comment":     "that's all"
},

"lmp_command":      "~/miniconda3/bin/lmp",
"model_devi_group_size":    1,
"_comment":         "model_devi on localhost",
"model_devi_machine":       {
    "batch": "slurm",
    "work_path" :   "/expanse/lustre/scratch/tfcao/temp_project/batis3-dp/ini",
    "_comment" :    "that's all"
},
"_comment": " if use GPU, numb_nodes(nn) should always be 1 ",
"_comment": " if numb_nodes(nn) = 1 multi-threading rather than mpi is assumed",
"model_devi_resources":     {
    "numb_node":    1,
    "task_per_node":64,
    "source_list":  ["~/miniconda3/bin/activate" ],
    "module_list":  [ ],
    "time_limit":   "2:00:0",
    "mem_limit":    32,
    "partition" : "compute",
    "_comment":     "that's all"
},

"_comment":         "fp on localhost ",
"fp_command":       "mpirun -np 64  /home/tfcao/vasp_bin/regular/vasp",
"fp_group_size":    1,
"fp_machine":       {
    "batch": "slurm",
    "work_path" :   "/expanse/lustre/scratch/tfcao/temp_project/batis3-dp/ini",
    "_comment" :    "that's all"
},
"fp_resources":     {
    "numb_node":    1,
    "task_per_node":64,
    "numb_gpu":     0,
    "exclude_list" : [],
    "source_list":  [],
    "module_list":  [],
    "with_mpi" : false,
    "time_limit":   "2:00:0",
    "partition" : "compute",
    "_comment":     "that's all"
},
"_comment":         " that's all "

}


njzjz commented 3 years ago

See #367, and custom_flags is provided in #368.
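As a sketch of how the option from #368 is used (a fragment, not a complete machine file; the key goes inside a resources section — here `fp_resources`, since this thread concerns an fp job — and each list entry should become one `#SBATCH` line in the generated script):

```json
"fp_resources": {
    "custom_flags": ["-A TG-DMR160007"]
}
```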

tfcao888666 commented 3 years ago

Hi Jinze, thank you for the response. I have changed it:

"deepmd_path":      "~/miniconda3/bin/dp",
"train_machine":    {
    "batch": "slurm",
    "work_path" :   "/expanse/lustre/scratch/tfcao/temp_project/batis3-dp/ini",
    "_comment" :    "that's all"
},
"train_resources":  {
    "numb_node":    1,
    "task_per_node":64,
    "partition" :   "compute",
    "exclude_list" : [],
    "source_list":  [ "~/miniconda3/bin/activate" ],
    "module_list":  [ ],


njzjz commented 3 years ago

The correct one should be "-A TG-DMR160007" instead of "TG-DMR160007".
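For reference, with the full "-A TG-DMR160007" entry, the generated header should contain a directive of the form below (`-A` is the short form of Slurm's `--account` option, which selects the allocation to charge):

```shell
#SBATCH -A TG-DMR160007
```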

tfcao888666 commented 3 years ago

Hi Jinze, it still does not work.

This is the final generated file:

#!/bin/bash -l
#SBATCH -N 1
#SBATCH --ntasks-per-node=64
#SBATCH -t 2:00:0
#SBATCH --partition=compute

cd sys-0002-0002-0006
test $? -ne 0 && exit 1
if [ ! -f tag_0_finished ] ;then
    mpirun -np 64 /home/tfcao/vasp_bin/regular/vasp 1>> log 2>> err
    if test $? -ne 0; then exit 1; else touch tag_0_finished; fi
fi
cd /expanse/lustre/scratch/tfcao/temp_project/batis3-dp/ini/8078abd8-d0c7-4e26-9cec-f5e6ea0f4420
test $? -ne 0 && exit 1
wait
touch 8078abd8-d0c7-4e26-9cec-f5e6ea0f4420_tag_finished

Here is the machine file:

{
"deepmd_path":      "~/miniconda3/bin/dp",
"train_machine":    {
    "batch": "slurm",
    "work_path" :   "/expanse/lustre/scratch/tfcao/temp_project/batis3-dp/ini",
    "_comment" :    "that's all"
},
"train_resources":  {
    "numb_node":    1,
    "task_per_node":64,
    "partition" :   "compute",
    "exclude_list" : [],
    "source_list":  [ "~/miniconda3/bin/activate" ],
    "module_list":  [ ],
    "custom_flags": ["-A TG-DMR160007"],
    "time_limit":   "2:00:0",
    "mem_limit":    32,
    "_comment":     "that's all"
},

"lmp_command":      "~/miniconda3/bin/lmp",
"model_devi_group_size":    1,
"_comment":         "model_devi on localhost",
"model_devi_machine":       {
    "batch": "slurm",
    "work_path" :   "/expanse/lustre/scratch/tfcao/temp_project/batis3-dp/ini",
    "_comment" :    "that's all"
},
"_comment": " if use GPU, numb_nodes(nn) should always be 1 ",
"_comment": " if numb_nodes(nn) = 1 multi-threading rather than mpi is assumed",
"model_devi_resources":     {
    "numb_node":    1,
    "task_per_node":64,
    "source_list":  ["~/miniconda3/bin/activate" ],
    "module_list":  [ ],
    "time_limit":   "2:00:0",
    "mem_limit":    32,
    "partition" :   "compute",
    "custom_flags": ["-A TG-DMR160007"],
    "_comment":     "that's all"
},

"_comment":         "fp on localhost ",
"fp_command":       "mpirun -np 64  /home/tfcao/vasp_bin/regular/vasp",
"fp_group_size":    1,
"fp_machine":       {
    "batch": "slurm",
    "work_path" :   "/expanse/lustre/scratch/tfcao/temp_project/batis3-dp/ini",
    "_comment" :    "that's all"
},
"fp_resources":     {
    "numb_node":    1,
    "task_per_node":64,
    "numb_gpu":     0,
    "exclude_list" : [],
    "source_list":  [],
    "module_list":  [],
    "with_mpi" : false,
    "time_limit":   "2:00:0",
    "partition" :   "compute",
    "_comment":     "that's all"
},
"_comment":         " that's all "
}


njzjz commented 3 years ago

You added it to model_devi_resources, but you are running an fp task?

tfcao888666 commented 3 years ago

Hi Jinze, I changed it:

{
"deepmd_path":      "~/miniconda3/bin/dp",
"train_machine":    {
    "batch": "slurm",
    "work_path" :   "/expanse/lustre/scratch/tfcao/temp_project/batis3-dp/ini",
    "_comment" :    "that's all"
},
"train_resources":  {
    "numb_node":    1,
    "task_per_node":64,
    "partition" :   "compute",
    "custom_flags": "-A TG-DMR160007",
    "exclude_list" : [],
    "source_list":  [ "~/miniconda3/bin/activate" ],
    "module_list":  [ ],
    "time_limit":   "2:00:0",
    "mem_limit":    32,
    "_comment":     "that's all"
},

"lmp_command":      "~/miniconda3/bin/lmp",
"model_devi_group_size":    1,
"_comment":         "model_devi on localhost",
"model_devi_machine":       {
    "batch": "slurm",
    "custom_flags": "-A TG-DMR160007",
    "work_path" :   "/expanse/lustre/scratch/tfcao/temp_project/batis3-dp/ini",
    "_comment" :    "that's all"
},
"_comment": " if use GPU, numb_nodes(nn) should always be 1 ",
"_comment": " if numb_nodes(nn) = 1 multi-threading rather than mpi is assumed",
"model_devi_resources":     {
    "numb_node":    1,
    "task_per_node":64,
    "source_list":  ["~/miniconda3/bin/activate" ],
    "module_list":  [ ],
    "time_limit":   "2:00:0",
    "mem_limit":    32,
    "partition" :   "compute",
    "custom_flags": "-A TG-DMR160007",
    "_comment":     "that's all"
},

"_comment":         "fp on localhost ",
"fp_command":       "mpirun -np 64  /home/tfcao/vasp_bin/regular/vasp",
"fp_group_size":    1,
"fp_machine":       {
    "batch": "slurm",
     "custom_flags": "-A TG-DMR160007",
     "work_path" :   "/expanse/lustre/scratch/tfcao/temp_project/batis3-dp/ini",
     "_comment" :    "that's all"
},
"fp_resources":     {
    "numb_node":    1,
    "task_per_node":64,
    "numb_gpu":     0,
    "exclude_list" : [],
    "source_list":  [],
    "module_list":  [],
    "with_mpi" : false,
    "time_limit":   "2:00:0",
    "partition" :   "compute",
    "_comment":     "that's all"
},
"_comment":         " that's all "
}

And the code:

temp_exclude = ""
for ii in res['exclude_list'] :
    temp_exclude += ii
    temp_exclude += ","
temp_exclude = temp_exclude[:-1]
ret += '#SBATCH --exclude=%s \n' % temp_exclude
for flag in res.get('custom_flags', []):
    ret += '#SBATCH %s \n' % flag
ret += "\n"

Could you help me check it again? Thank you! Best regards, Tengfei
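The loop quoted above iterates over custom_flags, so the value's type matters: a list yields one `#SBATCH` line per flag, while a bare string (as in the config above) is iterated character by character. A minimal sketch (the `render_custom_flags` helper is hypothetical, written only to isolate the quoted loop):

```python
def render_custom_flags(res):
    """Mimic the quoted dpgen loop: one '#SBATCH <flag>' line per entry."""
    ret = ''
    for flag in res.get('custom_flags', []):
        ret += '#SBATCH %s \n' % flag
    return ret

# List form: one well-formed directive.
print(render_custom_flags({'custom_flags': ['-A TG-DMR160007']}))
# String form: Python iterates the string per character, so each
# character becomes its own broken '#SBATCH' line.
print(render_custom_flags({'custom_flags': '-A TG-DMR160007'}))
```

This is why the list form `["-A TG-DMR160007"]` from #368 is required rather than a plain string.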


njzjz commented 3 years ago

I don't understand your change... In https://github.com/deepmodeling/dpgen/issues/451#issuecomment-872688798, it seems that you only added it to model_devi_resources (but not fp_resources). However, you are running an fp task, right? You should add custom_flags: ["-A TG-DMR160007"] to fp_resources.
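Putting the advice together, a sketch of the corrected fp_resources block (values taken from the config quoted earlier in this thread; note that custom_flags is a list of strings, not a bare string):

```json
"fp_resources": {
    "numb_node":    1,
    "task_per_node": 64,
    "numb_gpu":     0,
    "exclude_list": [],
    "source_list":  [],
    "module_list":  [],
    "with_mpi":     false,
    "time_limit":   "2:00:0",
    "partition":    "compute",
    "custom_flags": ["-A TG-DMR160007"],
    "_comment":     "that's all"
}
```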

tfcao888666 commented 3 years ago

Hi Jinzhe, I have figured it out. Thank you!
