Hi, for the pretraining stage, we freeze the LLM and leave the vision encoder and the abstractor trainable. The train_it_wo_lora.sh script finetunes the LLM while keeping those two visual components frozen.
If you want to finetune the model from the pre-trained checkpoint, we recommend using LoRA, since full finetuning requires much more GPU resources.
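For reference, a minimal LoRA setup along these lines might look like the sketch below. It uses the HuggingFace peft library and assumes a `model` object already loaded from the pre-trained checkpoint; the target module names and hyperparameters are assumptions, not the repository's actual configuration.

```python
# Hypothetical sketch: wrap an already-loaded model with LoRA adapters so that
# only the low-rank adapter weights are trained. The target_modules names are
# assumptions and may differ from the repository's LLM implementation.
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumed attention projection names
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)  # `model`: loaded pre-trained checkpoint
model.print_trainable_parameters()          # shows how few parameters LoRA trains
```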
Yes, I still want to freeze the LLM and train the vision encoder and abstractor. I changed the code in train.py (lines 165-169) to:
for name, param in model.named_parameters():
    if 'vision_model' in name or 'abstractor' in name:
        param.requires_grad = True
    else:
        param.requires_grad = False
But I get an "element 0 of tensors does not require grad and does not have a grad_fn" error. Is there any other code I need to modify?
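That error generally means the loss tensor was computed entirely from tensors that do not require grad. Below is a minimal diagnostic sketch, not the repository's training code; it assumes a transformers-style `model` object and a `loss` tensor from one forward pass, and the gradient-checkpointing point is an assumption about the training setup.

```python
# Diagnostic sketch (assumptions: a transformers-style `model` and a computed
# `loss` tensor from one forward pass; not the repository's actual train.py).

# 1) Confirm the freezing loop left some parameters trainable.
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(f"{len(trainable)} trainable parameter tensors, e.g. {trainable[:3]}")

# 2) Confirm the loss is connected to those parameters in the autograd graph.
#    If this prints False, no trainable parameter contributes to the loss.
print("loss.requires_grad =", loss.requires_grad)

# 3) If gradient checkpointing is enabled while the LLM/embeddings are frozen,
#    checkpointed activations may lose requires_grad; transformers provides a
#    helper that re-enables grads on the input embeddings' outputs.
if hasattr(model, "enable_input_require_grads"):
    model.enable_input_require_grads()
```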
Hi, I want to pre-train the model on my own A100 with 80GB of GPU memory. I use the train_it_wo_lora.sh script and set batch_size=1, but I still get a CUDA out-of-memory error. How much GPU memory is required to pre-train the model?