DAMO-NLP-SG / Video-LLaMA

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
BSD 3-Clause "New" or "Revised" License
2.77k stars · 255 forks

Training model #81

Open Remosy opened 1 year ago

Remosy commented 1 year ago

I tried to perform inference on a single RTX 4090 GPU (24 GB), and it worked.

Now I am trying to train the model while reducing GPU memory usage as much as possible. I found that the for-loop in modeling_llama.py (Line 542) increases my GPU memory usage on every iteration:

for idx, decoder_layer in enumerate(self.layers):
    if output_hidden_states:
        all_hidden_states += (hidden_states,)
    # ...
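As far as I can tell, the memory grows because each decoder layer's activations are kept for the backward pass: even though the LLaMA weights are frozen, gradients still flow through these layers back to the trainable projection. Two standard workarounds I am considering, as a rough sketch (standard HuggingFace API; "path/to/llama" and the variable names are placeholders, not your code):

import torch
from transformers import LlamaForCausalLM

llama = LlamaForCausalLM.from_pretrained("path/to/llama", torch_dtype=torch.bfloat16)
for param in llama.parameters():
    param.requires_grad = False

# Option 1: pure inference. no_grad() frees each layer's activations
# right away, so memory no longer grows across the layer loop.
# with torch.no_grad():
#     outputs = llama(input_ids=input_ids, attention_mask=attention_mask)

# Option 2: training. Gradient checkpointing re-computes activations in
# the backward pass instead of storing one set per layer.
llama.config.use_cache = False  # the KV cache is incompatible with checkpointing
llama.gradient_checkpointing_enable()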

I read your paper, and it says the LLaMA model should be frozen during training. I noticed that you already set param.requires_grad = False for all parameters of self.llama_model. However, this differs slightly from how the Qformer is handled: self.llama_model is never put into eval() mode.

if freeze_qformer:
    for name, param in self.Qformer.named_parameters():
        param.requires_grad = False
    self.Qformer = self.Qformer.eval()
    self.Qformer.train = disabled_train
    self.query_tokens.requires_grad = False
    logging.info("freeze Qformer")
logging.info('Loading Q-Former Done')

self.llama_model = LlamaForCausalLM.from_pretrained(
    llama_model,
    torch_dtype=torch.bfloat16,
)

for name, param in self.llama_model.named_parameters():
    param.requires_grad = False
logging.info('Loading LLAMA Done')
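For comparison, freezing self.llama_model the same way as the Qformer would look roughly like this. This is just a sketch, not your actual code; disabled_train is re-declared here only to keep the snippet self-contained. (For LLaMA the practical difference is small, since the model has no dropout by default, but eval() plus an overridden .train() would make the two code paths consistent.)

import logging
import torch
from transformers import LlamaForCausalLM

def disabled_train(self, mode=True):
    """Override .train() so the module stays in eval mode."""
    return self

llama_model = LlamaForCausalLM.from_pretrained(
    "path/to/llama",  # placeholder path
    torch_dtype=torch.bfloat16,
)
for name, param in llama_model.named_parameters():
    param.requires_grad = False
llama_model = llama_model.eval()    # disables dropout, if any
llama_model.train = disabled_train  # keeps an outer .train() call from undoing eval()
logging.info("freeze LLAMA")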

Emmm... if I want to run the second-stage training on my 24 GB GPU, it would be great to get some advice from you.
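For example, would something like loading the frozen LLM in 8-bit be compatible with the stage-2 training code? This is a general bitsandbytes technique, not something from your repo, and the path is a placeholder:

import torch
from transformers import LlamaForCausalLM

# Since the LLM stays frozen, loading it in 8-bit (requires
# `pip install bitsandbytes accelerate`) roughly halves its footprint
# compared to bf16; only the small trainable parts (projection layer,
# query tokens) would stay in higher precision.
llama_model = LlamaForCausalLM.from_pretrained(
    "path/to/llama",  # placeholder path
    load_in_8bit=True,
    device_map="auto",
)
for param in llama_model.parameters():
    param.requires_grad = False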

tangyipeng100 commented 10 months ago

Were you able to fine-tune it on a 4090? Thanks.

Remosy commented 8 months ago

> Were you able to fine-tune it on a 4090? Thanks.

Yes.