TRAILab / CaDDN

Categorical Depth Distribution Network for Monocular 3D Object Detection (CVPR 2021 Oral)
Apache License 2.0
359 stars 62 forks

Problems encountered when using multiple GPUs for training #95

Open 123456789live opened 2 years ago

123456789live commented 2 years ago

Dear author, I encountered this problem when training with two GPUs. How can I solve it?

(zq) omnisky@node01:/data01/zq/CaDDN/tools$ python -m torch.distributed.launch --nproc_per_node=2 train.py --launcher pytorch --batch_size 2 --cfg_file cfgs/kitti_models/CaDDN.yaml


Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


Traceback (most recent call last):
  File "train.py", line 197, in <module>
    main()
  File "train.py", line 72, in main
    assert args.batch_size % total_gpus == 0, 'Batch size should match the number of gpus'
AssertionError: Batch size should match the number of gpus
Traceback (most recent call last):
  File "train.py", line 197, in <module>
    main()
  File "train.py", line 72, in main
    assert args.batch_size % total_gpus == 0, 'Batch size should match the number of gpus'
AssertionError: Batch size should match the number of gpus
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/omnisky/zq/lib/python3.6/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/home/omnisky/zq/lib/python3.6/site-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/omnisky/zq/bin/python3', '-u', 'train.py', '--local_rank=1', '--launcher', 'pytorch', '--batch_size', '2', '--cfg_file', 'cfgs/kitti_models/CaDDN.yaml']' returned non-zero exit status 1.
(zq) omnisky@node01:/data01/zq/CaDDN/tools$ python -m torch.distributed.launch --nproc_per_node=2 train.py --launcher pytorch --batch_size 2 --cfg_file cfgs/kitti_models/CaDDN.yaml^C
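
The assertion quoted in the traceback can be reproduced in isolation to see which values trip it. Since 2 % 2 == 0, the check would pass if total_gpus were 2, so in this run total_gpus must have been some value that does not divide 2. The log does not show how train.py computes total_gpus; one possibility (an assumption, not confirmed here) is that it counts every GPU visible on the machine rather than the --nproc_per_node value. The sketch below is not the repository's code: the function names are hypothetical, and the integer division after the check is an assumed follow-up step.

```python
def per_gpu_batch_size(batch_size: int, total_gpus: int) -> int:
    """Hypothetical helper wrapping the check quoted in the traceback."""
    # Same condition and message as train.py line 72 in the log above.
    assert batch_size % total_gpus == 0, 'Batch size should match the number of gpus'
    # Assumption: the total batch size is then split evenly across GPUs.
    return batch_size // total_gpus


def passes_check(batch_size: int, total_gpus: int) -> bool:
    """Return True if the divisibility check would pass for these values."""
    try:
        per_gpu_batch_size(batch_size, total_gpus)
        return True
    except AssertionError:
        return False


if __name__ == "__main__":
    # Try --batch_size 2 against several candidate values of total_gpus.
    for total_gpus in (1, 2, 4, 8):
        print(f"batch_size=2, total_gpus={total_gpus}: passes={passes_check(2, total_gpus)}")
```

If total_gpus does turn out to be larger than 2 on this machine, two untested options follow from the check itself: pass a --batch_size that is a multiple of that value, or restrict the visible devices (for example with the CUDA_VISIBLE_DEVICES environment variable) so that only two GPUs are counted.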

fgqile commented 2 years ago

I ran into this too.