exitudio / MMM

Official repository for "MMM: Generative Masked Motion Model"
https://exitudio.github.io/MMM-page/

Multi GPU training #18

Closed — hyunbin70 closed this issue 4 months ago

hyunbin70 commented 4 months ago

Hi,

Thank you for your dedicated work.

I am trying to train the transformer using multiple GPUs (8 RTX 3090s) following the instructions in the README: "support multiple GPUs export CUDA_VISIBLE_DEVICES=0,1,2,3".

However, the code only utilizes one GPU, regardless of how many devices I set via export CUDA_VISIBLE_DEVICES. It seems that the current code does not support multi-GPU training. Could you confirm whether this is the case? If so, could you provide the code or some guidance for enabling multi-GPU training?

Here is my current bash command:

name='trans_name' 
vq_name='vq_name'
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
MULTI_BATCH=8

python3 train_t2m_trans.py  \
    --exp-name ${name} \
    --batch-size $((128*MULTI_BATCH)) \
    --vq-name ${vq_name} \
    --out-dir output/t2m \
    --total-iter $((300000/MULTI_BATCH)) \
    --lr-scheduler $((150000/MULTI_BATCH)) \
    --dataname t2m \
    --eval-iter $((20000/MULTI_BATCH))
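The script above scales the batch size up, and the iteration counts down, by the number of GPUs, so the total number of samples seen stays constant. A small sketch of that arithmetic (the helper function is hypothetical, not part of the MMM codebase; the base values are taken from the single-GPU defaults in the command above):

```python
# Mirror of the bash arithmetic in the training command: multiply the
# per-step batch size by the GPU count and divide the iteration-based
# hyperparameters by the same factor. Hypothetical helper for illustration.
def scaled_hparams(multi_batch, base_batch=128, base_iters=300_000,
                   base_lr_step=150_000, base_eval=20_000):
    return {
        "batch_size": base_batch * multi_batch,      # 128 per GPU
        "total_iter": base_iters // multi_batch,     # same total samples
        "lr_scheduler": base_lr_step // multi_batch, # decay step, scaled
        "eval_iter": base_eval // multi_batch,       # eval frequency, scaled
    }

print(scaled_hparams(8))
# {'batch_size': 1024, 'total_iter': 37500, 'lr_scheduler': 18750, 'eval_iter': 2500}
```

With MULTI_BATCH=8 this reproduces the values the shell arithmetic expands to: a batch size of 1024 and 37,500 total iterations.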

Thank you!

hyunbin70 commented 4 months ago

Sorry, my bad. It does work in multi-GPU settings. (I had modified the dataloader to train on a custom dataset, which introduced memory bottlenecks that stalled GPU memory I/O.)
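For anyone hitting a similar dataloader bottleneck: the usual levers in PyTorch are the DataLoader's worker and memory-pinning options. A minimal sketch, assuming a PyTorch setup; the TensorDataset here is a hypothetical stand-in for the custom motion dataset, and the exact fix depends on what the modified dataloader does:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for a custom motion dataset: 256 samples of
# 64-dim features with integer labels.
dataset = TensorDataset(torch.randn(256, 64), torch.randint(0, 10, (256,)))

loader = DataLoader(
    dataset,
    batch_size=32,
    shuffle=True,
    num_workers=2,                          # load/augment samples in parallel workers
    pin_memory=torch.cuda.is_available(),   # page-locked buffers speed host-to-GPU copies
    persistent_workers=True,                # keep workers alive between epochs
    drop_last=True,
)

for x, y in loader:
    print(x.shape, y.shape)  # first batch: [32, 64] features, [32] labels
    break
```

If the GPUs sit idle while CPU utilization is high, raising num_workers (and doing any heavy preprocessing inside the dataset's __getitem__ so it runs in the workers) is typically the first thing to try.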