DAMO-NLP-SG / VideoLLaMA2

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Apache License 2.0

How to finetune on the "VideoLLaMA2-7B" instead of "VideoLLaMA2-7B-Base"? #51

Closed Zeqing-Wang closed 4 months ago

Zeqing-Wang commented 4 months ago

Thanks for your great work! As far as I understand, the scripts under the "custom" folder are used to fine-tune the base model, which has not been tuned on the instruction-tuning datasets. What should I do to continue fine-tuning on my own dataset starting from the "VideoLLaMA2-7B" model?

Zeqing-Wang commented 4 months ago

Referring to the code of other projects and my own understanding, I simply deleted '--pretrain_mm_mlp_adapter' and replaced model_name_or_path with 'VideoLLaMA2-7B'. But I'm not sure whether this is correct for VideoLLaMA2.
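For concreteness, the change I made looks roughly like the sketch below (the "before" paths are illustrative placeholders, not copied from the repo's script):

```shell
# Before (projector-pretraining setup; paths are placeholders):
#   --model_name_or_path /path/to/base/model \
#   --pretrain_mm_mlp_adapter /path/to/mm_projector.bin \
#
# After: drop --pretrain_mm_mlp_adapter entirely and point
# --model_name_or_path at the instruction-tuned checkpoint,
# since VideoLLaMA2-7B already contains a trained projector:
#   --model_name_or_path DAMO-NLP-SG/VideoLLaMA2-7B \
```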

lixin4ever commented 4 months ago

Hi, thanks for your interest.

According to this line, you can specify --output_dir with your local ${OUTPUT_DIR} and place the weights of VideoLLaMA2-7B under ${OUTPUT_DIR}/checkpoint-0 (as if you were resuming training from VideoLLaMA2-7B). This is the simplest way I can think of to achieve what you described without any code changes, but you should check that the learning-rate scheduler behaves as expected.
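A rough sketch of this resume-from-checkpoint workaround; all paths below are placeholders to adapt to your setup:

```shell
# Hypothetical output directory for the continued fine-tuning run.
OUTPUT_DIR=work_dirs/videollama2_7b_custom_ft

# Place (or symlink) the released VideoLLaMA2-7B weights where the
# trainer looks for the latest checkpoint, so it "resumes" from them.
mkdir -p ${OUTPUT_DIR}/checkpoint-0
cp -r /path/to/VideoLLaMA2-7B/* ${OUTPUT_DIR}/checkpoint-0/

# Then launch the custom fine-tuning script with:
#   --output_dir ${OUTPUT_DIR}
```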

clownrat6 commented 4 months ago

> Referring to the code of other projects and my own understanding, I simply deleted '--pretrain_mm_mlp_adapter' and replaced model_name_or_path with 'VideoLLaMA2-7B'. But I'm not sure whether this is correct for VideoLLaMA2.

This issue provides continue-finetuning scripts: https://github.com/DAMO-NLP-SG/VideoLLaMA2/issues/40#issuecomment-2216328392