Dear author, I encountered this problem when training with two GPUs. How can I solve it?
(zq) omnisky@node01:/data01/zq/CaDDN/tools$ python -m torch.distributed.launch --nproc_per_node=2 train.py --launcher pytorch --batch_size 2 --cfg_file cfgs/kitti_models/CaDDN.yaml
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
Traceback (most recent call last):
File "train.py", line 197, in <module>
main()
File "train.py", line 72, in main
assert args.batch_size % total_gpus == 0, 'Batch size should match the number of gpus'
AssertionError: Batch size should match the number of gpus
Traceback (most recent call last):
File "train.py", line 197, in <module>
main()
File "train.py", line 72, in main
assert args.batch_size % total_gpus == 0, 'Batch size should match the number of gpus'
AssertionError: Batch size should match the number of gpus
Traceback (most recent call last):
File "/usr/local/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/omnisky/zq/lib/python3.6/site-packages/torch/distributed/launch.py", line 263, in <module>
main()
File "/home/omnisky/zq/lib/python3.6/site-packages/torch/distributed/launch.py", line 259, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/home/omnisky/zq/bin/python3', '-u', 'train.py', '--local_rank=1', '--launcher', 'pytorch', '--batch_size', '2', '--cfg_file', 'cfgs/kitti_models/CaDDN.yaml']' returned non-zero exit status 1.
(zq) omnisky@node01:/data01/zq/CaDDN/tools$ python -m torch.distributed.launch --nproc_per_node=2 train.py --launcher pytorch --batch_size 2 --cfg_file cfgs/kitti_models/CaDDN.yaml^C
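For context, here is a minimal sketch of the check that raises the error, paraphrased from the assertion shown in the traceback. `total_gpus` here is a stand-in for however train.py actually counts processes (likely the distributed world size, though I have not verified that):

```python
def per_gpu_batch_size(batch_size: int, total_gpus: int) -> int:
    """Mirror of the failing check: the launch-wide batch size must be
    evenly divisible by the number of GPUs so each process gets an
    equal share."""
    assert batch_size % total_gpus == 0, 'Batch size should match the number of gpus'
    return batch_size // total_gpus

# With --nproc_per_node=2 and --batch_size 2 the check itself passes,
# since 2 % 2 == 0:
print(per_gpu_batch_size(2, 2))  # each of the 2 processes gets 1 sample

# It only fails when batch_size is not a multiple of total_gpus:
try:
    per_gpu_batch_size(3, 2)
except AssertionError as e:
    print(e)
```

Since 2 % 2 == 0, the assertion should not trip with these exact arguments, so it may be worth checking whether the script's GPU count matches what you passed, e.g. via `torch.cuda.device_count()` or the `CUDA_VISIBLE_DEVICES` environment variable.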