Vision-CAIR / LongVU

https://vision-cair.github.io/LongVU
Vision-CAIR/LongVU_Llama3_2_1B exists #7

Open zoldaten opened 3 days ago

zoldaten commented 3 days ago

I saw that https://huggingface.co/Vision-CAIR/LongVU_Llama3_2_1B exists. Is it the image model or the video model? Could it be combined with LongVU_Llama3_2_3B (image or video)? And what are the hardware requirements?

xiaoqian-shen commented 3 days ago

LongVU_Llama3_2_1B is the video LLM with the Llama3_2_1B language backbone. Similarly, LongVU_Llama3_2_3B uses the 3B language backbone, Llama3_2_3B. What do you mean by combining both models?