yuanyaaa opened this issue 1 year ago
Hi @yuanyaaa! Sorry for the late reply 😞. I can't reproduce this behaviour with this config: `accelerate launch --config_file configs/accelerate/zero2-bf16.yaml`.
https://wandb.ai/carperai/trlx/reports/Memory-occupy-with-multi-GPUs-Training-548---Vmlldzo1MjUzMjMy
Also, consider using ZeRO-3 if you want to save more memory; otherwise, you may want to lower the options described in https://github.com/CarperAI/trlx#configure-hyperparameters.
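To show why ZeRO-3 saves more memory than ZeRO-2, here is a rough sketch of per-GPU *model-state* memory using the accounting from the ZeRO paper. It assumes 2 bytes/param each for half-precision weights and gradients plus 12 bytes/param of fp32 Adam state (master weights, momentum, variance). The ~780M parameter count for Flan-T5-Large and the 4-GPU setup are illustrative assumptions. Activations, the CUDA context, and allocator caching are ignored, so measured usage (e.g. in nvidia-smi) will be higher than these numbers.

```python
# Rough per-GPU memory for model states only (params, grads, Adam states)
# under mixed-precision training, following the ZeRO paper's accounting.
# NOT included: activations, CUDA context, allocator caching.

GIB = 1024 ** 3


def zero_model_state_bytes(num_params: int, num_gpus: int, stage: int, k: int = 12) -> float:
    """Estimate bytes of model states held by each GPU for a ZeRO stage.

    Assumes 2-byte (fp16/bf16) params and grads plus k bytes/param of
    fp32 optimizer state (k=12 for Adam).
    """
    p = 2 * num_params  # half-precision parameters
    g = 2 * num_params  # half-precision gradients
    o = k * num_params  # fp32 optimizer states
    if stage == 0:  # plain data parallel: everything replicated
        return p + g + o
    if stage == 1:  # optimizer states partitioned across GPUs
        return p + g + o / num_gpus
    if stage == 2:  # optimizer states + gradients partitioned
        return p + (g + o) / num_gpus
    if stage == 3:  # optimizer states + gradients + parameters partitioned
        return (p + g + o) / num_gpus
    raise ValueError(f"unknown ZeRO stage: {stage}")


if __name__ == "__main__":
    n = 780_000_000  # assumed approximate parameter count for Flan-T5-Large
    for stage in range(4):
        gib = zero_model_state_bytes(n, num_gpus=4, stage=stage) / GIB
        print(f"ZeRO-{stage}: ~{gib:.1f} GiB per GPU (model states only)")
```

Each stage removes one more replicated component from every GPU; only ZeRO-3 also shards the parameters themselves, which is why it gives the largest savings.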
When I use trlx to fine-tune Flan-T5-Large on a single GPU, it uses about 11 GB of memory. However, when I use accelerate for parallel training, it uses 4 × 16 GB! I can't understand why. Can I keep memory usage at about 11 GB for parallel training? Is the problem caused by the config? The accelerate config is:
Thank you very much for your reply!