Closed tfcao888666 closed 3 years ago
See #367, and custom_flags is provided in #368.
Hi Jinze, thank you for the response. I have changed it:

```json
{
  "deepmd_path": "~/miniconda3/bin/dp",
  "train_machine": {
    "batch": "slurm",
    "work_path": "/expanse/lustre/scratch/tfcao/temp_project/batis3-dp/ini",
    "_comment": "that's all"
  },
  "train_resources": {
    "numb_node": 1,
    "task_per_node": 64,
    "partition": "compute",
    "exclude_list": [],
    "source_list": ["~/miniconda3/bin/activate"],
    "module_list": [],
    "custom_flags": ["TG-DMR160007"],
    "time_limit": "2:00:0",
    "mem_limit": 32,
    "_comment": "that's all"
  },
  "lmp_command": "~/miniconda3/bin/lmp",
  "model_devi_group_size": 1,
  "_comment": "model_devi on localhost",
  "model_devi_machine": {
    "batch": "slurm",
    "work_path": "/expanse/lustre/scratch/tfcao/temp_project/batis3-dp/ini",
    "_comment": "that's all"
  },
  "_comment": " if use GPU, numb_nodes(nn) should always be 1 ",
  "_comment": " if numb_nodes(nn) = 1 multi-threading rather than mpi is assumed",
  "model_devi_resources": {
    "numb_node": 1,
    "task_per_node": 64,
    "source_list": ["~/miniconda3/bin/activate"],
    "module_list": [],
    "time_limit": "2:00:0",
    "mem_limit": 32,
    "partition": "compute"
  },
  "_comment": "fp on localhost ",
  "fp_command": "mpirun -np 64 /home/tfcao/vasp_bin/regular/vasp",
  "fp_group_size": 1,
  "fp_machine": {
    "batch": "slurm",
    "work_path": "/expanse/lustre/scratch/tfcao/temp_project/batis3-dp/ini",
    "_comment": "that's all"
  },
  "fp_resources": {
    "numb_node": 1,
    "task_per_node": 64,
    "numb_gpu": 0,
    "exclude_list": [],
    "source_list": [],
    "module_list": [],
    "with_mpi": false,
    "time_limit": "2:00:0",
    "partition": "compute",
    "_comment": "that's all"
  },
  "_comment": " that's all "
}
```

It seems that it does not work. Could you have a look? Thank you!
The correct one should be "-A TG-DMR160007" instead of "TG-DMR160007".
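For reference, once the flag is set correctly, the generated Slurm submission script should begin with a header along these lines. This is only a sketch: the exact set of `#SBATCH` options that dpgen emits depends on the resources keys, but the point is that the account line must read `-A TG-DMR160007`, not the bare account name.

```shell
#!/bin/bash -l
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=64
#SBATCH --partition=compute
#SBATCH --time=2:00:0
#SBATCH -A TG-DMR160007
```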
Hi Jinze, it still does not work. This is the final submission file:

```shell
#!/bin/bash -l
cd sys-0002-0002-0006
test $? -ne 0 && exit 1
if [ ! -f tag_0_finished ] ;then
  mpirun -np 64 /home/tfcao/vasp_bin/regular/vasp 1>> log 2>> err
  if test $? -ne 0; then exit 1; else touch tag_0_finished; fi
fi
cd /expanse/lustre/scratch/tfcao/temp_project/batis3-dp/ini/8078abd8-d0c7-4e26-9cec-f5e6ea0f4420
test $? -ne 0 && exit 1
wait
touch 8078abd8-d0c7-4e26-9cec-f5e6ea0f4420_tag_finished
```
Here is the machine file:

```json
{
  "deepmd_path": "~/miniconda3/bin/dp",
  "train_machine": {
    "batch": "slurm",
    "work_path": "/expanse/lustre/scratch/tfcao/temp_project/batis3-dp/ini",
    "_comment": "that's all"
  },
  "train_resources": {
    "numb_node": 1,
    "task_per_node": 64,
    "partition": "compute",
    "exclude_list": [],
    "source_list": ["~/miniconda3/bin/activate"],
    "module_list": [],
    "custom_flags": ["-A TG-DMR160007"],
    "time_limit": "2:00:0",
    "mem_limit": 32,
    "_comment": "that's all"
  },
  "lmp_command": "~/miniconda3/bin/lmp",
  "model_devi_group_size": 1,
  "_comment": "model_devi on localhost",
  "model_devi_machine": {
    "batch": "slurm",
    "work_path": "/expanse/lustre/scratch/tfcao/temp_project/batis3-dp/ini",
    "_comment": "that's all"
  },
  "_comment": " if use GPU, numb_nodes(nn) should always be 1 ",
  "_comment": " if numb_nodes(nn) = 1 multi-threading rather than mpi is assumed",
  "model_devi_resources": {
    "numb_node": 1,
    "task_per_node": 64,
    "source_list": ["~/miniconda3/bin/activate"],
    "module_list": [],
    "time_limit": "2:00:0",
    "mem_limit": 32,
    "partition": "compute",
    "custom_flags": ["-A TG-DMR160007"],
    "_comment": "that's all"
  },
  "_comment": "fp on localhost ",
  "fp_command": "mpirun -np 64 /home/tfcao/vasp_bin/regular/vasp",
  "fp_group_size": 1,
  "fp_machine": {
    "batch": "slurm",
    "work_path": "/expanse/lustre/scratch/tfcao/temp_project/batis3-dp/ini",
    "_comment": "that's all"
  },
  "fp_resources": {
    "numb_node": 1,
    "task_per_node": 64,
    "numb_gpu": 0,
    "exclude_list": [],
    "source_list": [],
    "module_list": [],
    "with_mpi": false,
    "time_limit": "2:00:0",
    "partition": "compute",
    "_comment": "that's all"
  },
  "_comment": " that's all "
}
```
You added it to model_devi_resources, but you are running an fp task?
Hi Jinze, I changed it:

```json
{
  "deepmd_path": "~/miniconda3/bin/dp",
  "train_machine": {
    "batch": "slurm",
    "work_path": "/expanse/lustre/scratch/tfcao/temp_project/batis3-dp/ini",
    "_comment": "that's all"
  },
  "train_resources": {
    "numb_node": 1,
    "task_per_node": 64,
    "partition": "compute",
    "custom_flags": "-A TG-DMR160007",
    "exclude_list": [],
    "source_list": ["~/miniconda3/bin/activate"],
    "module_list": [],
    "time_limit": "2:00:0",
    "mem_limit": 32,
    "_comment": "that's all"
  },
  "lmp_command": "~/miniconda3/bin/lmp",
  "model_devi_group_size": 1,
  "_comment": "model_devi on localhost",
  "model_devi_machine": {
    "batch": "slurm",
    "custom_flags": "-A TG-DMR160007",
    "work_path": "/expanse/lustre/scratch/tfcao/temp_project/batis3-dp/ini",
    "_comment": "that's all"
  },
  "_comment": " if use GPU, numb_nodes(nn) should always be 1 ",
  "_comment": " if numb_nodes(nn) = 1 multi-threading rather than mpi is assumed",
  "model_devi_resources": {
    "numb_node": 1,
    "task_per_node": 64,
    "source_list": ["~/miniconda3/bin/activate"],
    "module_list": [],
    "time_limit": "2:00:0",
    "mem_limit": 32,
    "partition": "compute",
    "custom_flags": "-A TG-DMR160007",
    "_comment": "that's all"
  },
  "_comment": "fp on localhost ",
  "fp_command": "mpirun -np 64 /home/tfcao/vasp_bin/regular/vasp",
  "fp_group_size": 1,
  "fp_machine": {
    "batch": "slurm",
    "custom_flags": "-A TG-DMR160007",
    "work_path": "/expanse/lustre/scratch/tfcao/temp_project/batis3-dp/ini",
    "_comment": "that's all"
  },
  "fp_resources": {
    "numb_node": 1,
    "task_per_node": 64,
    "numb_gpu": 0,
    "exclude_list": [],
    "source_list": [],
    "module_list": [],
    "with_mpi": false,
    "time_limit": "2:00:0",
    "partition": "compute",
    "_comment": "that's all"
  },
  "_comment": " that's all "
}
```

And the code:

```python
temp_exclude = ""
for ii in res['exclude_list']:
    temp_exclude += ii
    temp_exclude += ","
temp_exclude = temp_exclude[:-1]
ret += '#SBATCH --exclude=%s \n' % temp_exclude
for flag in res.get('custom_flags', []):
    ret += '#SBATCH %s \n' % flag
ret += "\n"
```

Could you help me check again? Thank you! Best regards, Tengfei
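A side note on the snippet above: because the writer iterates over `res.get('custom_flags', [])`, the value must be a JSON list (`["-A TG-DMR160007"]`), not a plain string. Iterating over a string yields one character at a time, so a string value would produce one bogus `#SBATCH <char>` line per character. A minimal reproduction (the helper name `render_custom_flags` is made up for illustration; the names `res`/`ret` follow the snippet above, and the real dpgen internals may differ):

```python
def render_custom_flags(res):
    """Mimic the custom_flags loop from the snippet above."""
    ret = ""
    for flag in res.get('custom_flags', []):
        ret += '#SBATCH %s \n' % flag
    return ret

# With a list, each entry becomes one full #SBATCH line, as intended.
list_form = render_custom_flags({'custom_flags': ['-A TG-DMR160007']})

# With a plain string, Python iterates character by character:
# '-A TG-DMR160007' has 15 characters, so 15 one-character lines result.
string_form = render_custom_flags({'custom_flags': '-A TG-DMR160007'})

print(list_form)
print(string_form.count('#SBATCH'))
```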
I don't understand your change... In https://github.com/deepmodeling/dpgen/issues/451#issuecomment-872688798, it seems that you only added it to model_devi_resources (but not fp_resources). However, you are running an fp task, right? You should add `"custom_flags": ["-A TG-DMR160007"]` to fp_resources.
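Putting that advice together, a sketch of what the fp_resources block would look like with the account flag added (the other fields are taken from the configs posted earlier in the thread; adjust them to your own setup):

```json
"fp_resources": {
  "numb_node": 1,
  "task_per_node": 64,
  "numb_gpu": 0,
  "exclude_list": [],
  "source_list": [],
  "module_list": [],
  "with_mpi": false,
  "custom_flags": ["-A TG-DMR160007"],
  "time_limit": "2:00:0",
  "partition": "compute",
  "_comment": "that's all"
}
```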
Hi Jinzhe, I have figured out. Thank you!
Hi All, I want to add the account line "#SBATCH -A TG-DMR160007" to the .sub file so that I can submit jobs. Could you tell me how to add it in the machine file? Here is my machine file. Thank you!

```json
{
  "deepmd_path": "~/miniconda3/bin/dp",
  "train_machine": {
    "batch": "slurm",
    "work_path": "/expanse/lustre/scratch/tfcao/temp_project/batis3-dp/ini",
    "_comment": "that's all"
  },
  "train_resources": {
    "numb_node": 1,
    "task_per_node": 64,
    "partition": "compute",
    "exclude_list": [],
    "source_list": ["~/miniconda3/bin/activate"],
    "module_list": [],
    "time_limit": "2:00:0",
    "mem_limit": 32,
    "_comment": "that's all"
  }
}
```