Cogito2012 / DEAR

[ICCV 2021 Oral] Deep Evidential Action Recognition
Apache License 2.0

How to train the model on multiple GPUs? #13

Closed: YeungChiu closed this issue 1 year ago

YeungChiu commented 1 year ago

Hello.

Your work is really amazing. I ran your code successfully on one GPU. When I then tried to run it on multiple GPUs, I got the following error:

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launch.py", line 255, in main
    raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['/usr/bin/python3', '-u', 'tools/train.py', '--local_rank=3', 'configs/recognition/i3d/finetune_ucf101_i3d_edlnokl_avuc_debias.py', '--launcher', 'pytorch', '--work-dir', 'work_dirs/i3d/finetune_ucf101_i3d_edlnokl_avuc_debias', '--validate', '--seed', '0', '--deterministic', '--gpu-ids', '0', '1', '2', '3']' returned non-zero exit status 1.
Experiments finished!
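
A note on reading this log: the traceback comes from the launcher (torch/distributed/launch.py), not from tools/train.py. A CalledProcessError only records the command and its exit status; the worker's own traceback, which holds the actual cause, is printed separately and usually appears earlier in the output. The sketch below illustrates this with a hypothetical failing worker (the worker command is a stand-in, not part of DEAR or mmaction2).

```python
import subprocess
import sys

# Hypothetical stand-in for a failing training worker: it raises an
# exception, so Python prints a traceback to stderr and exits with status 1.
worker_cmd = [sys.executable, "-c", "raise RuntimeError('simulated worker failure')"]


def run_worker(cmd):
    """Run cmd and return (returncode, stderr_text) without raising."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stderr


returncode, worker_stderr = run_worker(worker_cmd)

# This mirrors what a launcher does when a child exits with a non-zero status:
# the exception carries only the command and the exit status.
launcher_error = subprocess.CalledProcessError(returncode=returncode, cmd=worker_cmd)
launcher_message = str(launcher_error)

print("Launcher-level message:")
print(launcher_message)
print("Worker-level output (where the real cause is):")
print(worker_stderr)
```

In a real run, scrolling up in the terminal output (or redirecting it to a file) should reveal the per-rank traceback that triggered the non-zero exit status.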

I ran the code with the following command:

bash tools/dist_train.sh configs/recognition/i3d/finetune_ucf101_i3d_edlnokl_avuc_debias.py 4 \
    --work-dir work_dirs/i3d/finetune_ucf101_i3d_edlnokl_avuc_debias \
    --validate \
    --seed 0 \
    --deterministic \
    --gpu-ids 0 1 2 3

So I would like to know how you train with multiple GPUs. I'd appreciate it if you could give an example.

Cogito2012 commented 1 year ago

@YeungChiu Thanks for your interest in this work! I have not tried multi-GPU training with this implementation. The mmaction2 codebase used by this repo does include a multi-GPU distributed training script. To use it, however, you may need to look closely at tools/train.py and at every line that assumes single-GPU execution, and adapt those lines to the multi-GPU setting.
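
One common kind of adaptation is replacing a hard-coded device (for example, code that always uses GPU 0) with a device chosen per process. The sketch below is a minimal illustration under assumptions, not code from DEAR or mmaction2: the helper names `resolve_local_rank` and `device_for_rank` are hypothetical. It relies only on the `--local_rank=N` argument that the launcher passes to each worker (visible in the traceback in this thread) and on the `LOCAL_RANK` environment variable that newer PyTorch launchers set.

```python
import argparse
import os


def resolve_local_rank(argv=None, environ=None):
    """Return this process's local rank.

    Hypothetical helper: prefers a --local_rank=N argument, then falls back
    to the LOCAL_RANK environment variable, then to 0 (single-process run).
    """
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--local_rank", type=int, default=None)
    # parse_known_args ignores the config path and other training flags.
    args, _unknown = parser.parse_known_args(argv)
    if args.local_rank is not None:
        return args.local_rank
    return int(environ.get("LOCAL_RANK", 0))


def device_for_rank(local_rank):
    """Map a local rank to a device string instead of hard-coding one GPU."""
    return f"cuda:{local_rank}"


if __name__ == "__main__":
    rank = resolve_local_rank()
    print(f"local rank {rank} -> device {device_for_rank(rank)}")
```

With a device string like this, lines that move tensors or modules to a fixed GPU can be pointed at the process's own GPU instead. Whether that is sufficient for this repo depends on what the single-GPU lines actually do.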

YeungChiu commented 1 year ago

@Cogito2012 Perhaps I should analyze the code. Thank you.