X-PLUG / mPLUG-Owl

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
https://www.modelscope.cn/studios/damo/mPLUG-Owl
MIT License

CUDA out of memory when pre-training the model #72

Closed: qiuhuiGithub closed this issue 1 year ago

qiuhuiGithub commented 1 year ago

Hi, I want to pre-train the model on my own A100 with 80 GB of GPU memory. I use the train_it_wo_lora.sh script and set batch_size=1, but I still get a CUDA out-of-memory error. How much memory is required to pre-train the model?

MAGAer13 commented 1 year ago

Hi, for the pretraining stage, we freeze the LLM and leave the vision encoder and abstractor trainable. The train_it_wo_lora.sh script is for finetuning the LLM while keeping the two visual components frozen.
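For context, one quick way to confirm which components end up trainable under such a split is to tally parameters per top-level submodule. This is only a sketch: it assumes `model` is the PyTorch module loaded by the training script, and the top-level names it prints depend on how the mPLUG-Owl model is actually structured.

```python
from collections import defaultdict

# Rough diagnostic: count trainable vs. frozen parameters per top-level
# submodule, so the intended split (frozen LLM, trainable vision encoder
# and abstractor) can be verified at a glance. `model` is assumed to be
# the mPLUG-Owl nn.Module built by the training script.
counts = defaultdict(lambda: [0, 0])  # top-level name -> [trainable, frozen]
for name, param in model.named_parameters():
    top = name.split(".")[0]
    counts[top][0 if param.requires_grad else 1] += param.numel()

for top, (trainable, frozen) in counts.items():
    print(f"{top}: trainable={trainable:,} frozen={frozen:,}")
```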

MAGAer13 commented 1 year ago

If you want to finetune the model based on the pre-trained checkpoint, we recommend using LoRA, which requires far fewer GPU resources.
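As an illustration of what LoRA-based finetuning looks like (this is a minimal sketch using the Hugging Face PEFT library rather than the repository's own train_it.sh script, and the `target_modules` names are LLaMA-style attention projections, which is an assumption about the underlying LLM):

```python
from peft import LoraConfig, get_peft_model

# Sketch only: wrap the language model with LoRA adapters so that only the
# low-rank adapter matrices are trained, which cuts optimizer and gradient
# memory substantially compared with full finetuning.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],  # assumed LLaMA-style module names
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights require grad
```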

qiuhuiGithub commented 1 year ago

> Hi, for the pretraining stage, we freeze the LLM and leave the vision encoder and abstractor trainable. The train_it_wo_lora.sh script is for finetuning the LLM while keeping the two visual components frozen.

Yes, I am still trying to freeze the LLM and train only the vision encoder and abstractor. I changed the code in train.py (lines 165-169) to:

```python
for name, param in model.named_parameters():
    if 'vision_model' in name or 'abstractor' in name:
        param.requires_grad = True
    else:
        param.requires_grad = False
```

But I get an `element 0 of tensors does not require grad and does not have a grad_fn` error. Is there any other code that needs to be modified?
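For readers hitting the same error: it generally means the loss tensor ends up with no `grad_fn`, i.e. nothing trainable contributed to the graph. The sketch below shows two checks that commonly apply in this situation; it is not the repository's official fix, the substring filter mirrors the snippet above, and the hook workaround assumes the model exposes a transformers-style `get_input_embeddings()`.

```python
# 1) Confirm the name filter actually matched something. Parameter names can
#    carry prefixes (e.g. "model.vision_model....") depending on how the model
#    is wrapped, so a substring mismatch can silently freeze every parameter.
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(f"trainable params: {len(trainable)}")
print(trainable[:5])

# 2) If gradient checkpointing is enabled on the frozen LLM, its re-entrant
#    checkpoints can detach the graph when their inputs do not require grad,
#    which produces the same error. A common workaround is to force the
#    output of the input embeddings to require grad (assumes the model
#    provides get_input_embeddings()).
def _make_inputs_require_grad(module, inputs, output):
    output.requires_grad_(True)

model.get_input_embeddings().register_forward_hook(_make_inputs_require_grad)
```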