Open Smallwhite999 opened 4 years ago
Please describe your environment (Python version, nvcc, gcc, etc.) and tell us which command caused the error. Thanks!
I installed everything by following these instructions:
I ran training by following these commands:
With that setup in place, I get this error:
python -m torch.distributed.launch \
    --nproc_per_node=2 \
    tools/train_net.py \
    --master_port=$((RANDOM+10000)) \
    --config-file configs/MEGA/vid_R_101_c4_MEGA_1x.yaml \
    --motion-specific
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
/home/wang/anaconda3/envs/cvbaseline/bin/python: can't open file 'tools/train_net.py': [Errno 2] No such file or directory
Traceback (most recent call last):
  File "/home/wang/anaconda3/envs/cvbaseline/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/wang/anaconda3/envs/cvbaseline/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/wang/anaconda3/envs/cvbaseline/lib/python3.8/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/home/wang/anaconda3/envs/cvbaseline/lib/python3.8/site-packages/torch/distributed/launch.py", line 258, in main
    raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['/home/****/anaconda3/envs/cvbaseline/bin/python', '-u', 'tools/train_net.py', '--local_rank=1', '--master_port=16269', '--config-file', 'configs/MEGA/vid_R_101_c4_MEGA_1x.yaml', '--motion-specific']' returned non-zero exit status 2.
/home/wang/anaconda3/envs/cvbaseline/bin/python: can't open file 'tools/train_net.py': [Errno 2] No such file or directory

And when I leave out the --motion-specific option, the error looks like this:
python -m torch.distributed.launch \
    --nproc_per_node=2 \
    tools/train_net.py \
    --master_port=$((RANDOM + 10000)) \
    --config-file configs/MEGA/vid_R_101_C4_MEGA_1x.yaml \
    OUTPUT_DIR training_dir/MEGA_R_101_1x
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
/home/wang/anaconda3/envs/cvbaseline/bin/python: can't open file 'tools/train_net.py': [Errno 2] No such file or directory
Traceback (most recent call last):
  File "/home/wang/anaconda3/envs/cvbaseline/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
/home/wang/anaconda3/envs/cvbaseline/bin/python: can't open file 'tools/train_net.py': [Errno 2] No such file or directory
  File "/home/wang/anaconda3/envs/cvbaseline/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/wang/anaconda3/envs/cvbaseline/lib/python3.8/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/home/wang/anaconda3/envs/cvbaseline/lib/python3.8/site-packages/torch/distributed/launch.py", line 258, in main
    raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['/home/wangdicong19b/anaconda3/envs/cvbaseline/bin/python', '-u', 'tools/train_net.py', '--local_rank=1', '--master_port=37600', '--config-file', 'configs/MEGA/vid_R_101_C4_MEGA_1x.yaml', 'OUTPUT_DIR', 'training_dir/MEGA_R_101_1x']' returned non-zero exit status 2.
On July 23, 2020, at 17:21, Yihong Chen (notifications@github.com) wrote: "Please describe your environment (Python version, nvcc, gcc, etc.) and tell us which command caused the error. Thanks!"
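For anyone reading the traceback above: the `CalledProcessError` with exit status 2 is what the launcher raises when the child interpreter itself cannot open the script it was handed. A minimal sketch that reproduces the same symptom without PyTorch is below. The helper name `run_relative_script` is made up for illustration, and the empty temporary directory is only a stand-in for "some directory that does not contain `tools/train_net.py`".

```python
import subprocess
import sys
import tempfile


def run_relative_script(script, cwd):
    """Run `python <script>` with `cwd` as the working directory.

    Returns (exit code, stderr text) of the child interpreter.
    """
    result = subprocess.run(
        [sys.executable, script],
        cwd=cwd,
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr


if __name__ == "__main__":
    # An empty directory stands in for any directory other than the repo root.
    with tempfile.TemporaryDirectory() as empty_dir:
        code, err = run_relative_script("tools/train_net.py", empty_dir)
        print("exit code:", code)
        print(err.strip())
```

The relative path is resolved against the working directory of the child process, so the same command can succeed in one directory and fail in another.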
Did you change into the mega.pytorch directory before running the command?
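A quick way to check this before launching is to confirm that every relative path in the command exists under the directory you are in. The sketch below is only an illustration: `missing_paths` is a hypothetical helper, not part of mega.pytorch, and the demo builds a fake directory tree instead of touching a real checkout.

```python
import tempfile
from pathlib import Path


def missing_paths(relative_paths, base_dir=None):
    """Return the entries of `relative_paths` that do not exist under `base_dir`.

    `base_dir` defaults to the current working directory, because that is
    what relative paths on the command line are resolved against.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    return [p for p in relative_paths if not (base / p).exists()]


if __name__ == "__main__":
    needed = ["tools/train_net.py", "configs/MEGA/vid_R_101_C4_MEGA_1x.yaml"]
    with tempfile.TemporaryDirectory() as tmp:
        # Build a fake repo layout so the demo is self-contained.
        fake_repo = Path(tmp) / "mega.pytorch"
        for rel in needed:
            target = fake_repo / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            target.touch()
        print(missing_paths(needed, fake_repo))  # checked from the repo root
        print(missing_paths(needed, tmp))        # checked from the parent directory
```

Calling `missing_paths(needed)` with no second argument checks the directory the shell is currently in. On a case-sensitive file system the same check would also flag a config name that differs only in letter case, such as the `c4` and `C4` spellings used in the two commands above.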
I'll try again. Thank you for your answer!
On July 24, 2020, at 10:33, Yihong Chen (notifications@github.com) wrote: "Did you change into the mega.pytorch directory before running the command?"
Did you manage to reproduce the results?
No, I did not.
On August 26, 2020, at 17:20, LauncH (notifications@github.com) wrote: "Did you manage to reproduce the results?"
Did you manage to reproduce the results?
In collate_batch.py, transposed_batch[0][0] is a tensor, so why is it accessed like a dictionary?
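This thread does not show what the dataset actually returns, so the following is only a sketch of the general pattern a collate function might follow, not the repository's code. It assumes each sample is a tuple whose first field is either a single image or a dict of named frames. The function names, the key names `cur` and `ref`, and the use of plain strings in place of tensors are all placeholders for illustration.

```python
def transpose_batch(batch):
    """Turn a list of per-sample tuples into a list of per-field tuples."""
    return list(zip(*batch))


def group_images(transposed_batch):
    """Group the first field of every sample in the batch.

    If the first field is a dict (for example several named frames per
    sample), return a dict mapping each key to the values across the batch.
    Otherwise return the first fields as a plain list.
    """
    first_fields = transposed_batch[0]
    if isinstance(first_fields[0], dict):
        keys = first_fields[0].keys()
        return {key: [sample[key] for sample in first_fields] for key in keys}
    return list(first_fields)


if __name__ == "__main__":
    # Single-image samples: (image, target, index)
    single = transpose_batch([("img0", "t0", 0), ("img1", "t1", 1)])
    print(group_images(single))

    # Multi-frame samples: the first field is a dict of named frames.
    multi = transpose_batch([
        ({"cur": "a0", "ref": "b0"}, "t0", 0),
        ({"cur": "a1", "ref": "b1"}, "t1", 1),
    ])
    print(group_images(multi))
```

Under that assumption, dictionary-style access on `transposed_batch[0][0]` only makes sense on the second path. If it really is a tensor in your run, printing `type(transposed_batch[0][0])` would show which path the data is taking.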
I ran into the same problem. Have you solved it?
Still not solved, haha.
(Reply by email to a message sent Thursday, April 8, 2021, 6:18 PM; subject: Re: [Scalsol/mega.pytorch] About reproducing your paper (#41))
I switched to a different implementation and have already finished my work. I'd suggest you not spend more effort on this one; it probably has some problems.
(In reply to a message sent Thursday, April 8, 2021, 9:04 PM.)
It is 3.7.
(In reply to a message sent Thursday, April 8, 2021, 9:08 PM.)
Actually, even when I followed the author's configuration exactly, problems still came up. I have given up.
(In reply to a message sent Thursday, April 8, 2021, 9:08 PM.)
It was a framework written by a senior labmate, you know how it is. I have already graduated and don't remember the details clearly. Sorry, I probably can't help you.
(In reply to a message sent Thursday, April 8, 2021, 9:14 PM.)
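Hello, author. I set everything up according to README.md, and I ran training and inference with the commands given there, but I still get errors. Is there any configuration I need to follow besides the one in README.md? I would appreciate an answer. Thank you!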