CarperAI / trlx

A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)

Memory usage with multi-GPU training #548

Open yuanyaaa opened 1 year ago

yuanyaaa commented 1 year ago

When I use trlx to fine-tune Flan-T5-Large on a single GPU, it uses about 11 GB of memory; however, when I use accelerate for parallel training, it uses 4×16 GB! I don't understand why. Can I get parallel training down to around 11 GB per GPU as well? Is the problem caused by the config? The accelerate config is:

```yaml
distributed_type: MULTI_GPU
downcast_bf16: 'no'
gpu_ids: all
machine_rank: 0
main_training_function: main
mixed_precision: 'no'
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```

Thank you very much for your reply!

maxreciprocate commented 1 year ago

Hi @yuanyaaa! Sorry for the late reply 😞. I can't reproduce this behaviour with this config: `accelerate launch --config_file configs/accelerate/zero2-bf16.yaml`. https://wandb.ai/carperai/trlx/reports/Memory-occupy-with-multi-GPUs-Training-548---Vmlldzo1MjUzMjMy
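For context on why plain multi-GPU data parallelism doesn't reduce per-GPU memory: with `distributed_type: MULTI_GPU`, accelerate runs DDP, so every rank holds a full copy of the model weights, gradients, and optimizer state and only the batch is split across GPUs. ZeRO-2 instead shards the gradients and optimizer states across ranks. A minimal sketch of such a ZeRO-2/bf16 accelerate config, following accelerate's standard DeepSpeed layout (the actual `configs/accelerate/zero2-bf16.yaml` in the repo may differ in its details):

```yaml
compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 1
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: false
  zero_stage: 2          # ZeRO-2: shard gradients and optimizer states across ranks
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
use_cpu: false
```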

Also consider using ZeRO-3 if you want to save even more memory; otherwise you may want to lower some of the hyperparameters in the trlx config, e.g. as sketched below: https://github.com/CarperAI/trlx#configure-hyperparameters
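The biggest levers on activation memory are the sequence length and the batch/rollout sizes. In a trlx `TRLConfig` YAML they live roughly here (a sketch based on the repo's example configs; the values are placeholders showing what to lower, not recommendations):

```yaml
train:
  seq_length: 512     # max prompt + generation length; activation memory scales with this
  batch_size: 8       # per-device batch size used during the update phase
method:
  num_rollouts: 128   # rollouts collected per iteration (PPO-specific)
  chunk_size: 4       # prompts generated per forward pass during rollout collection
```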