CarperAI / trlx

A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)

Memory usage with multi-GPU training #548

Open yuanyaaa opened 1 year ago

yuanyaaa commented 1 year ago

When I use trlx to fine-tune Flan-T5-Large on a single GPU, it uses about 11 GB of memory; however, when I use accelerate for parallel training, it uses 4×16 GB! I don't understand why. Can I get parallel training down to around 11 GB per GPU as well? Is the problem caused by the config? The accelerate config is:

```yaml
distributed_type: MULTI_GPU
downcast_bf16: 'no'
gpu_ids: all
machine_rank: 0
main_training_function: main
mixed_precision: 'no'
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```

Thank you very much for your reply!

maxreciprocate commented 1 year ago

Hi @yuanyaaa! Sorry for the late reply 😞. I can't reproduce this behaviour with this config: `accelerate launch --config_file configs/accelerate/zero2-bf16.yaml`. https://wandb.ai/carperai/trlx/reports/Memory-occupy-with-multi-GPUs-Training-548---Vmlldzo1MjUzMjMy
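For context on why plain multi-GPU data parallelism doesn't reduce per-GPU memory: with `distributed_type: MULTI_GPU`, accelerate runs DDP, so every rank holds a full copy of the model weights, gradients, and optimizer state and only the batch is split across GPUs. ZeRO-2 instead shards the gradients and optimizer states across ranks. A minimal sketch of such a ZeRO-2/bf16 accelerate config, following accelerate's standard DeepSpeed layout (the actual `configs/accelerate/zero2-bf16.yaml` in the repo may differ in its details):

```yaml
compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 1
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: false
  zero_stage: 2          # ZeRO-2: shard gradients and optimizer states across ranks
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
use_cpu: false
```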

Also consider using ZeRO-3 if you want to save even more memory; otherwise you may want to lower some of the hyperparameters in the trlx config, e.g. as sketched below: https://github.com/CarperAI/trlx#configure-hyperparameters
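The biggest levers on activation memory are the sequence length and the batch/rollout sizes. In a trlx `TRLConfig` YAML they live roughly here (a sketch based on the repo's example configs; the values are placeholders showing what to lower, not recommendations):

```yaml
train:
  seq_length: 512     # max prompt + generation length; activation memory scales with this
  batch_size: 8       # per-device batch size used during the update phase
method:
  num_rollouts: 128   # rollouts collected per iteration (PPO-specific)
  chunk_size: 4       # prompts generated per forward pass during rollout collection
```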