dvlab-research / LLaMA-VID

Official Implementation for LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
Apache License 2.0

Is the LLM weight trainable during stages 1-3? #49

Closed dragen1860 closed 5 months ago

dragen1860 commented 5 months ago

Hi, dear author: I noticed you have a table describing which weights are trainable in stages 1-3. The vision encoder means EVA and the text decoder means QFormer. However, there is no description of the LLM Vicuna 7B/13B module. [image: screenshot of the trainable-modules table]

Please let me know whether the LLM weights are trainable in each stage. Thank you.

yanwei-li commented 5 months ago

Hi, the LLM is frozen in Stage 1 and trainable in Stages 2-3.
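
For reference, here is a minimal PyTorch sketch of this kind of stage-wise freezing. It is not the repository's actual training code; the `model.llm` attribute and the `stage` argument are hypothetical names used purely for illustration.

```python
# Minimal sketch of stage-wise LLM freezing, assuming a wrapper model
# with a hypothetical `llm` submodule. Not LLaMA-VID's real training code.
import torch.nn as nn


def set_llm_trainable(model: nn.Module, stage: int) -> None:
    """Freeze the LLM in Stage 1; unfreeze it in Stages 2 and 3."""
    trainable = stage >= 2
    for param in model.llm.parameters():  # hypothetical attribute name
        param.requires_grad = trainable
```

Only parameters with `requires_grad=True` receive gradients, so passing `stage=1` keeps the LLM fixed while other modules train, and `stage=2` or `stage=3` opens it up.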