Problems encountered when using multiple GPUs for training

Dear author, I encountered this problem when training on two GPUs. How can I solve it?

(zq) omnisky@node01:/data01/zq/CaDDN/tools$ python -m torch.distributed.launch --nproc_per_node=2 train.py --launcher pytorch --batch_size 2 --cfg_file cfgs/kitti_models/CaDDN.yaml

Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.

Traceback (most recent call last):
  File "train.py", line 197, in <module>
    main()
  File "train.py", line 72, in main
    assert args.batch_size % total_gpus == 0, 'Batch size should match the number of gpus'
AssertionError: Batch size should match the number of gpus
Traceback (most recent call last):
  File "train.py", line 197, in <module>
    main()
  File "train.py", line 72, in main
    assert args.batch_size % total_gpus == 0, 'Batch size should match the number of gpus'
AssertionError: Batch size should match the number of gpus
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/omnisky/zq/lib/python3.6/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/home/omnisky/zq/lib/python3.6/site-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/omnisky/zq/bin/python3', '-u', 'train.py', '--local_rank=1', '--launcher', 'pytorch', '--batch_size', '2', '--cfg_file', 'cfgs/kitti_models/CaDDN.yaml']' returned non-zero exit status 1.
(zq) omnisky@node01:/data01/zq/CaDDN/tools$ python -m torch.distributed.launch --nproc_per_node=2 train.py --launcher pytorch --batch_size 2 --cfg_file cfgs/kitti_models/CaDDN.yaml^C
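For reference, the failing check can be reproduced outside of training. The sketch below mirrors only the `assert` quoted in the traceback. The helper names `per_gpu_batch_size` and `gpu_counts_that_fail` are hypothetical, and the per-GPU floor division is an assumption about what `train.py` does after the check. It may help narrow down which value `total_gpus` had when the assertion fired.

```python
def per_gpu_batch_size(batch_size: int, total_gpus: int) -> int:
    """Mirror the divisibility check quoted in the traceback.

    The floor division is an assumption about how the per-GPU batch size
    is derived afterwards; only the assert itself appears in the log.
    """
    if total_gpus <= 0:
        raise ValueError("total_gpus must be positive")
    assert batch_size % total_gpus == 0, 'Batch size should match the number of gpus'
    return batch_size // total_gpus


def gpu_counts_that_fail(batch_size: int, max_gpus: int = 8) -> list:
    """List the GPU counts in 1..max_gpus for which the check would fail."""
    return [n for n in range(1, max_gpus + 1) if batch_size % n != 0]


if __name__ == "__main__":
    # With --batch_size 2 and two counted GPUs the check passes.
    print(per_gpu_batch_size(2, 2))
    # GPU counts that would raise the AssertionError for --batch_size 2.
    print(gpu_counts_that_fail(2))
```

Since `2 % 2 == 0`, the assertion can only fail with `--batch_size 2` if `total_gpus` is neither 1 nor 2. One possible explanation, stated here as an assumption rather than a confirmed diagnosis, is that the script counts every GPU visible on the node instead of the `--nproc_per_node` value. If so, limiting visibility (for example with the `CUDA_VISIBLE_DEVICES` environment variable) or choosing a batch size that is a multiple of the counted GPUs would be worth trying.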
