dvlab-research / LLaMA-VID

Official Implementation for LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
Apache License 2.0
622 stars 39 forks

HF model format : vlm weights not in llama-vid-7b-full-336 #81

Open nileshkokane01 opened 2 months ago

nileshkokane01 commented 2 months ago

@yanwei-li ,

I am trying to convert the llama-vid models to the HF format (https://github.com/huggingface/transformers/pull/29971), but I cannot find the qformer weights. Specifically, the lines that search for keys related to vlm fail for llama-vid-7b-full-336.

Can you please assist me with this and let me know where I can find the weights? I can see that the BERT encoder/tokenizer is initialized by this line, but I can't find the projection weights.
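For anyone trying to reproduce this, here is a minimal sketch of how the missing keys could be checked without loading any tensors. It assumes the checkpoint is sharded and ships a Hugging Face style index JSON with a `weight_map` entry; whether llama-vid-7b-full-336 actually has such a file is an assumption. The helper names `load_weight_map` and `keys_matching` are made up for illustration and are not part of LLaMA-VID or transformers.

```python
import json


def load_weight_map(index_path):
    """Read the key -> shard-file mapping from a sharded-checkpoint index JSON.

    Assumes the Hugging Face layout where the index file holds a
    top-level "weight_map" dictionary.
    """
    with open(index_path) as f:
        return json.load(f)["weight_map"]


def keys_matching(weight_map, substrings):
    """For each substring, list every checkpoint key that contains it."""
    found = {s: [] for s in substrings}
    for key in weight_map:
        for s in substrings:
            if s in key:
                found[s].append(key)
    return found


def report(weight_map, substrings):
    """Print how many keys matched each substring, with the key names."""
    for s, keys in keys_matching(weight_map, substrings).items():
        print(f"{s!r}: {len(keys)} key(s)")
        for key in keys:
            print(f"  {key} -> {weight_map[key]}")
```

Calling `report(load_weight_map(path_to_index_json), ["vlm", "qformer"])` would then show whether any key containing those substrings exists in the checkpoint. An empty list for "vlm" would be consistent with the failure described above.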