exitudio / MMM

Official repository for "MMM: Generative Masked Motion Model"
https://exitudio.github.io/MMM-page/

Multi GPU training #18

Closed — hyunbin70 closed this issue 4 months ago

hyunbin70 commented 4 months ago

Hi,

Thank you for your dedicated work.

I am trying to train the transformer using multiple GPUs (8 RTX 3090s) following the instructions in the README: "support multiple GPUs export CUDA_VISIBLE_DEVICES=0,1,2,3".

However, the code only utilizes one GPU, regardless of how many devices I set via export CUDA_VISIBLE_DEVICES. It seems that the current code does not support multi-GPU training. Could you confirm whether this is the case? If so, could you provide the code or some guidance for enabling multi-GPU training?

Here is my current bash command:

name='trans_name' 
vq_name='vq_name'
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
MULTI_BATCH=8

python3 train_t2m_trans.py  \
    --exp-name ${name} \
    --batch-size $((128*MULTI_BATCH)) \
    --vq-name ${vq_name} \
    --out-dir output/t2m \
    --total-iter $((300000/MULTI_BATCH)) \
    --lr-scheduler $((150000/MULTI_BATCH)) \
    --dataname t2m \
    --eval-iter $((20000/MULTI_BATCH))
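The script above scales the batch size up, and the iteration counts down, by the number of GPUs, so the total number of samples seen stays constant. A small sketch of that arithmetic (the helper function is hypothetical, not part of the MMM codebase; the base values are taken from the single-GPU defaults in the command above):

```python
# Mirror of the bash arithmetic in the training command: multiply the
# per-step batch size by the GPU count and divide the iteration-based
# hyperparameters by the same factor. Hypothetical helper for illustration.
def scaled_hparams(multi_batch, base_batch=128, base_iters=300_000,
                   base_lr_step=150_000, base_eval=20_000):
    return {
        "batch_size": base_batch * multi_batch,      # 128 per GPU
        "total_iter": base_iters // multi_batch,     # same total samples
        "lr_scheduler": base_lr_step // multi_batch, # decay step, scaled
        "eval_iter": base_eval // multi_batch,       # eval frequency, scaled
    }

print(scaled_hparams(8))
# {'batch_size': 1024, 'total_iter': 37500, 'lr_scheduler': 18750, 'eval_iter': 2500}
```

With MULTI_BATCH=8 this reproduces the values the shell arithmetic expands to: a batch size of 1024 and 37,500 total iterations.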

Thank you!

hyunbin70 commented 4 months ago

Sorry, my bad. It does work in multi-GPU settings. (I had modified the dataloader to train on a custom dataset, which introduced memory bottlenecks that stalled GPU memory I/O.)
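For anyone hitting a similar dataloader bottleneck: the usual levers in PyTorch are the DataLoader's worker and memory-pinning options. A minimal sketch, assuming a PyTorch setup; the TensorDataset here is a hypothetical stand-in for the custom motion dataset, and the exact fix depends on what the modified dataloader does:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for a custom motion dataset: 256 samples of
# 64-dim features with integer labels.
dataset = TensorDataset(torch.randn(256, 64), torch.randint(0, 10, (256,)))

loader = DataLoader(
    dataset,
    batch_size=32,
    shuffle=True,
    num_workers=2,                          # load/augment samples in parallel workers
    pin_memory=torch.cuda.is_available(),   # page-locked buffers speed host-to-GPU copies
    persistent_workers=True,                # keep workers alive between epochs
    drop_last=True,
)

for x, y in loader:
    print(x.shape, y.shape)  # first batch: [32, 64] features, [32] labels
    break
```

If the GPUs sit idle while CPU utilization is high, raising num_workers (and doing any heavy preprocessing inside the dataset's __getitem__ so it runs in the workers) is typically the first thing to try.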