dvlab-research / LLaMA-VID

Official Implementation for LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
Apache License 2.0

Why is `build_vision_tower` called twice? #42

Closed dragen1860 closed 5 months ago

dragen1860 commented 6 months ago
  1. When the model class is created, `build_vision_tower` is called the first time.
  2. After that, `initialize_vision_modules` is called from `train()` in train.py:
```python
    if model_args.vision_tower is not None:
        model.get_model().initialize_vision_modules(
            model_args=model_args,
            fsdp=training_args.fsdp,
            max_token=training_args.model_max_length
        )
```

So it looks like the vision tower is built twice. Is my understanding correct?

yanwei-li commented 5 months ago

Hi, we use LLaVA as our pipeline, and this function comes from LLaVA. Please refer to the LLaVA repo for this issue.
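
For readers hitting the same question: LLaVA-style code typically guards the second call so an already-built vision tower is reused rather than rebuilt. Below is a minimal sketch of that guard pattern under that assumption; all class, function, and argument names here are illustrative stand-ins, not the actual LLaMA-VID/LLaVA source.

```python
# Counts how many times the (stand-in) builder actually runs.
build_calls = 0

def build_vision_tower(model_args):
    """Stand-in for the real builder; returns a placeholder tower object."""
    global build_calls
    build_calls += 1
    return object()  # placeholder for e.g. a CLIP vision tower

class VisionModel:
    """Illustrative model that may build a tower at construction time."""

    def __init__(self, model_args=None):
        # Step 1 from the issue: first build at class-creation time
        # (only if the config already specifies a vision tower).
        self.vision_tower = build_vision_tower(model_args) if model_args else None

    def initialize_vision_modules(self, model_args):
        # Step 2 from the issue: called again from train().
        # Guard: only build if no tower exists yet; otherwise reuse it.
        if self.vision_tower is None:
            self.vision_tower = build_vision_tower(model_args)
        return self.vision_tower

# With the guard, the builder runs once even though both code paths execute.
model = VisionModel(model_args={"vision_tower": "clip-vit"})
model.initialize_vision_modules(model_args={"vision_tower": "clip-vit"})
```

With this guard in place, the second call is effectively a no-op for the tower itself (in practice it may still load weights or update config), so seeing `build_vision_tower` referenced in two places does not necessarily mean the tower is constructed twice.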