b'11.213.17.148: input_args = {k: args_dict[k] for k in [a[2:] for a in sys.argv if a[:2] == "--"]}\n'
b'11.213.17.148: File "/data/ceph_11015/ssd/ramseyhuang/UER/uer/utils/config.py", line 21, in <dictcomp>\n'
b'11.213.17.148: input_args = {k: args_dict[k] for k in [a[2:] for a in sys.argv if a[:2] == "--"]}\n'
b'11.213.17.148: File "/data/ceph_11015/ssd/ramseyhuang/UER/uer/utils/config.py", line 21, in <dictcomp>\n'
b'11.213.17.148: input_args = {k: args_dict[k] for k in [a[2:] for a in sys.argv if a[:2] == "--"]}\n'
b'11.213.17.148: File "/data/ceph_11015/ssd/ramseyhuang/UER/uer/utils/config.py", line 21, in <dictcomp>\n'
b'11.213.17.148: input_args = {k: args_dict[k] for k in [a[2:] for a in sys.argv if a[:2] == "--"]}\n'
b'11.213.17.148: input_args = {k: args_dict[k] for k in [a[2:] for a in sys.argv if a[:2] == "--"]}\n'
b'11.213.17.148: File "/data/ceph_11015/ssd/ramseyhuang/UER/uer/utils/config.py", line 21, in <dictcomp>\n'
b'11.213.17.148: File "/data/ceph_11015/ssd/ramseyhuang/UER/uer/utils/config.py", line 21, in <dictcomp>\n'
b'11.213.17.148: input_args = {k: args_dict[k] for k in [a[2:] for a in sys.argv if a[:2] == "--"]}\n'
b'11.213.17.148: File "/data/ceph_11015/ssd/ramseyhuang/UER/uer/utils/config.py", line 21, in <dictcomp>\n'
b'11.213.17.148: input_args = {k: args_dict[k] for k in [a[2:] for a in sys.argv if a[:2] == "--"]}\n'
b'11.213.17.148: input_args = {k: args_dict[k] for k in [a[2:] for a in sys.argv if a[:2] == "--"]}\n'
b'11.213.17.148: File "/data/ceph_11015/ssd/ramseyhuang/UER/uer/utils/config.py", line 21, in <dictcomp>\n'
b'11.213.17.148: File "/data/ceph_11015/ssd/ramseyhuang/UER/uer/utils/config.py", line 21, in <dictcomp>\n'
b'11.213.17.148: input_args = {k: args_dict[k] for k in [a[2:] for a in sys.argv if a[:2] == "--"]}\n'
b"11.213.17.148: KeyError: 'local_rank=4'\n"
b'11.213.17.148: input_args = {k: args_dict[k] for k in [a[2:] for a in sys.argv if a[:2] == "--"]}\n'
b"11.213.17.148: KeyError: 'local_rank=1'\n"
b'11.213.17.148: input_args = {k: args_dict[k] for k in [a[2:] for a in sys.argv if a[:2] == "--"]}\n'
b"11.213.17.148: KeyError: 'local_rank=7'\n"
b'11.213.17.148: input_args = {k: args_dict[k] for k in [a[2:] for a in sys.argv if a[:2] == "--"]}\n'
b"11.213.17.148: KeyError: 'local_rank=6'\n"
b'11.213.17.148: input_args = {k: args_dict[k] for k in [a[2:] for a in sys.argv if a[:2] == "--"]}\n'
b"11.213.17.148: KeyError: 'local_rank=2'\n"
b'11.213.17.148: input_args = {k: args_dict[k] for k in [a[2:] for a in sys.argv if a[:2] == "--"]}\n'
b'11.213.17.148: input_args = {k: args_dict[k] for k in [a[2:] for a in sys.argv if a[:2] == "--"]}\n'
b"11.213.17.148: KeyError: 'local_rank=3'\n"
b"11.213.17.148: KeyError: 'local_rank=0'\n"
b'11.213.17.148: input_args = {k: args_dict[k] for k in [a[2:] for a in sys.argv if a[:2] == "--"]}\n'
b"11.213.17.148: KeyError: 'local_rank=5'\n"
b'11.213.17.148: Killing subprocess 2355\n'
b'11.213.17.148: Killing subprocess 2356\n'
b'11.213.17.148: Killing subprocess 2357\n'
b'11.213.17.148: Killing subprocess 2358\n'
b'11.213.17.148: Killing subprocess 2359\n'
b'11.213.17.148: Killing subprocess 2360\n'
b'11.213.17.148: Killing subprocess 2361\n'
b'11.213.17.148: Killing subprocess 2362\n'
b'11.213.17.148: Traceback (most recent call last):\n'
b'11.213.17.148: File "/data/miniconda3/envs/env-3.6.8/lib/python3.6/runpy.py", line 193, in _run_module_as_main\n'
b'11.213.17.148: "__main__", mod_spec)\n'
b'11.213.17.148: File "/data/miniconda3/envs/env-3.6.8/lib/python3.6/runpy.py", line 85, in _run_code\n'
b'11.213.17.148: exec(code, run_globals)\n'
b'11.213.17.148: File "/data/miniconda3/envs/env-3.6.8/lib/python3.6/site-packages/deepspeed/launcher/launch.py", line 171, in <module>\n'
b'11.213.17.148: main()\n'
b'11.213.17.148: File "/data/miniconda3/envs/env-3.6.8/lib/python3.6/site-packages/deepspeed/launcher/launch.py", line 161, in main\n'
b'11.213.17.148: sigkill_handler(signal.SIGTERM, None) # not coming back\n'
b'11.213.17.148: File "/data/miniconda3/envs/env-3.6.8/lib/python3.6/site-packages/deepspeed/launcher/launch.py", line 139, in sigkill_handler\n'
b'11.213.17.148: raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)\n'
This is my shell script
The deepspeed config file
The following error occurs
The main problem is
KeyError: 'local_rank=4'
What causes this?
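The traceback points at the dict comprehension on line 21 of uer/utils/config.py, which strips the leading "--" from each entry in sys.argv and uses the remainder as a key into args_dict. If the launcher passes the rank as a single token such as --local_rank=4 (an assumption, but it matches the key shown in the KeyError), the remainder is 'local_rank=4' rather than 'local_rank', so the lookup fails. A minimal sketch of one possible workaround is below; the helper name collect_cli_args is hypothetical and is not part of UER or DeepSpeed.

```python
def collect_cli_args(argv, args_dict):
    """Return the entries of args_dict whose names were passed as --flags in argv.

    Hypothetical helper: handles both "--name value" and "--name=value" forms.
    """
    keys = []
    for arg in argv:
        if arg.startswith("--"):
            # "--local_rank=4" -> "local_rank"; "--batch_size" -> "batch_size"
            keys.append(arg[2:].split("=", 1)[0])
    # Skip names that the parser does not know instead of raising KeyError.
    return {k: args_dict[k] for k in keys if k in args_dict}


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the original script.
    argv = ["train.py", "--local_rank=4", "--batch_size", "32"]
    args_dict = {"local_rank": 4, "batch_size": 32, "seed": 7}
    print(collect_cli_args(argv, args_dict))
    # {'local_rank': 4, 'batch_size': 32}
```

Splitting on the first "=" keeps the original behaviour for space-separated flags while also accepting the --name=value form. Whether silently skipping unknown names is appropriate depends on how the rest of config.py uses input_args.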