dvlab-research / LLaMA-VID

Official Implementation for LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
Apache License 2.0

About mm_projector loading issue #90

Open rubylan opened 1 month ago

rubylan commented 1 month ago

Hi there,

Great work!

I ran into a small issue while studying the implementation. When I check the repo (link here), 'mm_projector' is not included in the 'trainable_module' list. Does that mean mm_projector will not load its weights from the checkpoint (passed via --model_name_or_path) during stage 2/3 training or at inference?

I read through the whole implementation carefully and could not find where these weights are loaded. Since the model works well, I assume I've missed something rather than this being an actual bug.

Looking forward to your reply, and thanks in advance :))

Best, Ruby

Einstone-rose commented 1 month ago

You can refer to Lines 78-90 for details. mm_projector is a fully-connected network that is pretrained from scratch, and it is handled as part of pretrain_mm_mlp_adapter.
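
For readers following along, here is a minimal sketch of what pulling projector weights out of a separate adapter checkpoint can look like. It is an illustration only, not the repository's code: the helper name `extract_submodule_weights`, the key layout `model.mm_projector.<index>.<param>`, and the toy values are all assumptions, and plain lists stand in for tensors so the snippet runs without PyTorch.

```python
def extract_submodule_weights(state_dict, keyword):
    """Keep entries whose key contains `keyword.` and strip everything
    up to and including that prefix, so the keys match the submodule."""
    prefix = keyword + "."
    return {
        key.split(prefix, 1)[1]: value
        for key, value in state_dict.items()
        if prefix in key
    }


# Hypothetical adapter checkpoint; real values would be tensors.
adapter_checkpoint = {
    "model.mm_projector.0.weight": [[0.1, 0.2]],
    "model.mm_projector.0.bias": [0.0],
    "model.embed_tokens.weight": [[1.0]],
}

projector_weights = extract_submodule_weights(adapter_checkpoint, "mm_projector")
print(sorted(projector_weights))  # ['0.bias', '0.weight']
```

With PyTorch, the dictionary would typically come from `torch.load(path, map_location="cpu")`, and the filtered result would be passed to the projector's `load_state_dict`. A module loaded this way does not need to appear in a list of modules restored from the main checkpoint.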