PKU-YuanGroup / Video-LLaVA

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
https://arxiv.org/pdf/2311.10122.pdf
Apache License 2.0

Pretrain and Finetune template versions #189

Open xin-li-67 opened 1 month ago

xin-li-67 commented 1 month ago

Hi,

I noticed that the --version arg in both the pretrain and finetune scripts is set to v1, which differs from the original LLaVA/LLaVA-1.5 and other LLaVA-style projects. Do you have any insight into why this choice was made?

Best,

Yaxin9Luo commented 2 days ago

Hi, v1 is more or less the default setting for the finetune stage of an MLLM when the language backbone is LLaMA-based, since LMSYS trained a stronger instruction-tuned version of LLaMA called Vicuna, and v1 is its conversation template. I am not one of the authors, but I hope this still helps.
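For context, in LLaVA-style codebases the --version flag typically selects a conversation template that controls how prompts are serialized. Below is a minimal illustrative sketch (not the repo's actual code; the function name and system prompt wording are assumptions) of the Vicuna-style v1 format, where user turns end with a space separator and assistant turns end with the EOS token:

```python
# Hypothetical sketch of a Vicuna-style "v1" conversation template.
# The real template lives in the project's conversation module; names here
# are illustrative only.

SYSTEM = ("A chat between a curious user and an artificial intelligence "
          "assistant. The assistant gives helpful, detailed, and polite "
          "answers to the user's questions.")

def build_v1_prompt(turns):
    """Serialize (role, message) turns into a Vicuna v1-style prompt.

    turns: list of (role, message) pairs with roles "USER"/"ASSISTANT".
    A message of None marks the point where generation should begin.
    """
    seps = [" ", "</s>"]  # separator after USER turns, EOS after ASSISTANT
    prompt = SYSTEM + seps[0]
    for i, (role, msg) in enumerate(turns):
        if msg:
            prompt += f"{role}: {msg}{seps[i % 2]}"
        else:
            prompt += f"{role}:"  # model continues from here
    return prompt

# Example: a single user turn, ready for the model to answer.
prompt = build_v1_prompt([("USER", "What is in the video?"), ("ASSISTANT", None)])
```

Since pretraining in LLaVA-1.5 aligns only the projector on captions (hence its plain template), passing v1 in both stages mainly changes how the pretrain captions are wrapped, not the overall training recipe.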