ckkelvinchan / RealBasicVSR

Official repository of "Investigating Tradeoffs in Real-World Video Super-Resolution"
Apache License 2.0

GPU num for training #92

Open Xiao-R-Y opened 10 months ago

Xiao-R-Y commented 10 months ago

Thanks for your excellent work. I ran into a problem when training with only one GPU. Could you please give me some guidance on the commands for non-distributed training? Thank you.

The logs are as follows:

Training command is /home/zhangyang/envs/anaconda3/envs/realVSR/bin/python -m torch.distributed.launch --nproc_per_node=1 --master_port=21932 /home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/mmedit/.mim/tools/train.py configs/realbasicvsr_wogan_c64b20_2x30x8_lr1e-4_300k_reds.py --launcher pytorch.
/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/mmcv/__init__.py:21: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
  'On January 1, 2023, MMCV will release v2.0.0, in which it will remove '
/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/mmedit/utils/setup_env.py:33: UserWarning: Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
  f'Setting OMP_NUM_THREADS environment variable for each process '
/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/mmedit/utils/setup_env.py:43: UserWarning: Setting MKL_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
  f'Setting MKL_NUM_THREADS environment variable for each process '
Traceback (most recent call last):
  File "/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/mmedit/.mim/tools/train.py", line 171, in <module>
    main()
  File "/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/mmedit/.mim/tools/train.py", line 108, in main
    cfg.dump(osp.join(cfg.work_dir, osp.basename(args.config)))
  File "/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/mmcv/utils/config.py", line 596, in dump
    f.write(self.pretty_text)
  File "/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/mmcv/utils/config.py", line 508, in pretty_text
    text, _ = FormatCode(text, style_config=yapf_style, verify=True)
TypeError: FormatCode() got an unexpected keyword argument 'verify'
Traceback (most recent call last):
  File "/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/torch/distributed/launch.py", line 256, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/zhangyang/envs/anaconda3/envs/realVSR/bin/python', '-u', '/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/mmedit/.mim/tools/train.py', '--local_rank=0', 'configs/realbasicvsr_wogan_c64b20_2x30x8_lr1e-4_300k_reds.py', '--launcher', 'pytorch']' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/home/zhangyang/envs/anaconda3/envs/realVSR/bin/mim", line 8, in <module>
    sys.exit(cli())
  File "/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/mim/commands/train.py", line 111, in cli
    other_args=other_args)
  File "/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/mim/commands/train.py", line 262, in train
    cmd, env=dict(os.environ, MASTER_PORT=str(port)))
  File "/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/subprocess.py", line 328, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/home/zhangyang/envs/anaconda3/envs/realVSR/bin/python', '-m', 'torch.distributed.launch', '--nproc_per_node=1', '--master_port=21932', '/home/zhangyang/envs/anaconda3/envs/realVSR/lib/python3.7/site-packages/mmedit/.mim/tools/train.py', 'configs/realbasicvsr_wogan_c64b20_2x30x8_lr1e-4_300k_reds.py', '--launcher', 'pytorch']' returned non-zero exit status 1.
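Reading the tracebacks from the inside out, the first failure is the TypeError in train.py at line 108, raised while mmcv's Config.dump formats the config with yapf's FormatCode(..., verify=True). This happens before any model or GPU setup. The later CalledProcessError tracebacks are torch.distributed.launch and mim reporting the same non-zero exit status of their child process, so the crash is presumably unrelated to the number of GPUs. The snippet below is a minimal sketch, not part of RealBasicVSR, mmcv, or mim, for checking whether the FormatCode installed in this environment accepts a `verify` keyword. The helper name `accepts_keyword` is hypothetical, and the import path `yapf.yapflib.yapf_api` is assumed to be the one mmcv uses.

```python
import inspect
from typing import Callable


def accepts_keyword(func: Callable, name: str) -> bool:
    """Hypothetical helper: True if `func` can be called with keyword `name`."""
    try:
        params = inspect.signature(func).parameters
    except (TypeError, ValueError):
        # Some callables expose no signature; assume the call may work.
        return True
    if name in params and params[name].kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs catch-all would also accept the keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


if __name__ == "__main__":
    try:
        # Assumption: this is the FormatCode that mmcv's Config imports.
        from yapf.yapflib.yapf_api import FormatCode
    except ImportError:
        print("yapf is not importable in this environment")
    else:
        if accepts_keyword(FormatCode, "verify"):
            print("FormatCode accepts verify=...; the TypeError likely has another cause")
        else:
            print("FormatCode rejects verify=...; this matches the TypeError in the log")
```

If the check reports that `verify` is rejected, a frequently cited workaround for this exact TypeError is pinning yapf to 0.40.1 (`pip install yapf==0.40.1`), reported as the last release that still accepts the keyword; treat that version number as an assumption to confirm for this environment.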