Boese0601 / MagicDance

[ICML 2024] MagicPose (also known as MagicDance): Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion
https://boese0601.github.io/magicdance/

The training script you provided is for a single GPU; how can I switch it to 8 GPUs? #11

Closed Jeff-Fudan closed 7 months ago

Boese0601 commented 7 months ago

I believe the script already supports DistributedDataParallel, and we used 8 A100 GPUs for training, as mentioned in the paper. Have you set CUDA_VISIBLE_DEVICES and --nproc_per_node to the correct GPU environment in your training script?

See the following script for an example:

```bash
CUDA_VISIBLE_DEVICES=4,5,6,7 torchrun --master_port 10000 --nproc_per_node 4 train_tiktok.py \
```

This will use GPUs 4, 5, 6, and 7 and run 4 processes (nproc_per_node=4) for training.

In a similar way, if you'd like to use all 8 GPUs:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --master_port 10000 --nproc_per_node 8 train_tiktok.py \
```
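For reference, a torchrun-launched DDP script typically reads the per-process rank from the environment, initializes the process group, and wraps the model in DistributedDataParallel. The sketch below is a minimal illustration of that pattern using standard PyTorch APIs only; the tiny model and dataset are placeholders, not the actual contents of train_tiktok.py.

```python
# Minimal sketch of a torchrun-launched DDP training loop (standard PyTorch APIs).
# The toy model/dataset below are placeholders for illustration, not the real
# contents of train_tiktok.py.
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset


def main():
    # torchrun spawns --nproc_per_node processes and sets LOCAL_RANK/RANK/WORLD_SIZE for each.
    local_rank = int(os.environ["LOCAL_RANK"])
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    # Placeholder model and data; the real training script builds its own.
    model = DDP(nn.Linear(16, 1).cuda(local_rank), device_ids=[local_rank])
    dataset = TensorDataset(torch.randn(1024, 16), torch.randn(1024, 1))

    # DistributedSampler shards the dataset so each GPU sees a different slice.
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            loss = nn.functional.mse_loss(model(x), y)
            optimizer.zero_grad()
            loss.backward()  # gradients are all-reduced across processes by DDP
            optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

One thing to keep in mind when going from 4 to 8 GPUs: the effective batch size is the per-GPU batch size times nproc_per_node, so you may want to adjust the per-GPU batch size or learning rate accordingly.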