Closed: smurf-1119 closed this issue 1 month ago.
"I also encountered the same bug use torch run, training Lora for Internvl2 on 8 A800s resulted in an OOM (Out Of Memory) issue."
Hello, has this issue been resolved? You could try fine-tuning the InternVL2-26B model first.
Due to the inactivity over the past two weeks, this issue might have already been resolved, so I will close it. If you have any further questions, please feel free to reopen it.
Hello, has this issue been resolved? You could try fine-tuning the InternVL2-26B model first.
Yes, the OOM issue has been resolved. It can be addressed either by allocating more memory when running internvl_chat_finetune.py or by using internvl_chat_pretrain.py for training.
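For anyone hitting the same OOM, a common workaround (not necessarily what was done in this thread) is to launch the training script with a DeepSpeed ZeRO stage 3 config that offloads optimizer states and parameters to CPU memory and uses a smaller per-GPU micro batch. The sketch below only writes such a config file; the file name, batch sizes, and offload settings are illustrative assumptions and need to be adapted to the launch script you actually use.

```python
import json

# Illustrative ZeRO-3 config with CPU offload; values are assumptions, not the
# settings used by the original poster. Adapt to internvl_chat_finetune.py's launcher.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,   # smaller per-GPU batch lowers activation memory
    "gradient_accumulation_steps": 8,      # keep the effective batch size roughly constant
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                                    # shard params, grads, and optimizer states
        "offload_optimizer": {"device": "cpu"},        # move optimizer states to host RAM
        "offload_param": {"device": "cpu"},            # move sharded params to host RAM
        "stage3_gather_16bit_weights_on_model_save": True,
    },
}

# Write the config so it can be passed to the DeepSpeed launcher / training script.
with open("zero3_offload.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```

CPU offload trades GPU memory for host RAM and extra PCIe traffic, so it is slower per step; it is mainly useful when the model otherwise does not fit at all.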
Checklist
Describe the bug
When fine-tuning InternVL2-40B with the DeepSpeed framework on 16×A100 GPUs, I ran into an OOM error.
Reproduction
Environment
Error traceback