How to train a viscot-13b-336 model? CUDA out of memory

deepcs233 / Visual-CoT

[NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning

Apache License 2.0


Issue #5

Open · LiHaoHN opened this issue 3 months ago

LiHaoHN commented 3 months ago

Hello! I tried the Visual Instruction Tuning stage (for viscot-13b-336) as described in readme.md, but training failed with CUDA out of memory. I used 8 A100 GPUs (80 GB each). Does this mean that more than 8 A100s are needed to train viscot-13b-336, or is there a bug? In addition, your readme.md says that 8 A100 GPUs with 80 GB of memory are needed, whereas your paper says "All models are trained using 32 × A100s." Am I misunderstanding something? Thank you for your wonderful paper!
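For context on why a 13B model can run out of memory on 80 GB cards, here is a rough back-of-the-envelope estimate. It assumes full fine-tuning of about 13e9 parameters with mixed-precision Adam, using the 16-bytes-per-parameter accounting from the DeepSpeed ZeRO paper (2 bytes for 16-bit weights, 2 bytes for 16-bit gradients, 12 bytes for fp32 master weights plus the two Adam moments). The parameter count and the helper name `model_state_gb` are assumptions for illustration, and activations are ignored, so real usage will be higher.

```python
# Rough per-GPU memory for model states (weights, gradients, Adam state) in
# mixed-precision training, using the 16-bytes-per-parameter accounting from
# the DeepSpeed ZeRO paper. Activations, temporary buffers and fragmentation
# are NOT included, so actual usage is higher than these numbers.

BYTES_WEIGHTS_16BIT = 2  # fp16/bf16 copy of the weights
BYTES_GRADS_16BIT = 2    # fp16/bf16 gradients
BYTES_OPTIM_FP32 = 12    # fp32 master weights + Adam momentum + Adam variance


def model_state_gb(num_params: float, num_gpus: int, zero_stage: int) -> float:
    """Return estimated model-state memory per GPU in GB (1 GB = 1e9 bytes)."""
    if zero_stage == 0:    # plain data parallel: everything replicated on each GPU
        per_param = BYTES_WEIGHTS_16BIT + BYTES_GRADS_16BIT + BYTES_OPTIM_FP32
    elif zero_stage == 1:  # optimizer states sharded across GPUs
        per_param = BYTES_WEIGHTS_16BIT + BYTES_GRADS_16BIT + BYTES_OPTIM_FP32 / num_gpus
    elif zero_stage == 2:  # optimizer states and gradients sharded
        per_param = BYTES_WEIGHTS_16BIT + (BYTES_GRADS_16BIT + BYTES_OPTIM_FP32) / num_gpus
    elif zero_stage == 3:  # weights, gradients and optimizer states all sharded
        per_param = (BYTES_WEIGHTS_16BIT + BYTES_GRADS_16BIT + BYTES_OPTIM_FP32) / num_gpus
    else:
        raise ValueError("zero_stage must be 0, 1, 2 or 3")
    return num_params * per_param / 1e9


if __name__ == "__main__":
    params = 13e9  # assumption: ~13B trainable parameters
    for stage in range(4):
        print(f"ZeRO-{stage}: {model_state_gb(params, 8, stage):.2f} GB per GPU")
```

On 8 GPUs this works out to about 208 GB per GPU with no sharding, 71.5 GB with ZeRO-1, 48.75 GB with ZeRO-2 and 26 GB with ZeRO-3, before activations. Under these assumptions an unsharded run cannot fit in 80 GB, which is consistent with the suggestion to use ZeRO-2/3.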

deepcs233 commented 2 months ago

Hi! 8 A100s are also enough to train the whole pipeline. Could you try installing FlashAttention (flash-attn) or using DeepSpeed ZeRO-2/3 to reduce GPU memory usage?
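A minimal sketch of a DeepSpeed ZeRO-3 config, in case it helps. It assumes the training script is built on the Hugging Face Trainer (as LLaVA-derived code usually is), whose DeepSpeed integration resolves the `"auto"` values from the training arguments. The script and the file name `zero3_custom.json` are hypothetical, not files shipped with this repository; only standard DeepSpeed config keys are used.

```python
import json

# Hypothetical minimal DeepSpeed ZeRO-3 config. The "auto" values are filled in
# by the Hugging Face Trainer's DeepSpeed integration from TrainingArguments.
zero3_config = {
    "bf16": {"enabled": "auto"},
    "train_micro_batch_size_per_gpu": "auto",
    "train_batch_size": "auto",
    "gradient_accumulation_steps": "auto",
    "zero_optimization": {
        "stage": 3,                    # shard weights, gradients and optimizer states
        "overlap_comm": True,          # overlap gradient communication with computation
        "contiguous_gradients": True,  # copy gradients into a contiguous buffer to limit fragmentation
        # gather the sharded weights into a full 16-bit checkpoint when saving
        "stage3_gather_16bit_weights_on_model_save": True,
    },
}

if __name__ == "__main__":
    # Print the config as JSON so it can be redirected to a file.
    print(json.dumps(zero3_config, indent=2))
```

For example, `python make_zero3.py > zero3_custom.json` (hypothetical script name), then pass the file to the Trainer with `--deepspeed zero3_custom.json`. ZeRO-2 (`"stage": 2`) keeps a full copy of the weights on each GPU and usually communicates less, so it is worth trying first if it fits.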