Efficient-Large-Model / VILA

VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
Apache License 2.0

Can the ViT be unfrozen for training when using S2? #65

Closed: MonolithFoundation closed this issue 1 week ago

MonolithFoundation commented 1 month ago

I think that if S2 is used with the ViT unfrozen, the results could be worse, since S2 splits images into sub-images.

bfshi commented 1 month ago

Hi, the VILA-3B-S2 results were obtained with the ViT unfrozen. We didn't observe any negative effect from that.
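
For readers unfamiliar with the splitting mentioned in the question, here is a minimal sketch of how a multi-scale scheme like S2 can cut each upscaled image into sub-images of the base size. The function name `s2_tiles`, the base size of 448, and the scales are illustrative assumptions, not values confirmed in this thread or taken from the VILA code.

```python
def s2_tiles(base_size, scales):
    """For each scale, list the (left, top, right, bottom) boxes of the sub-images.

    Hypothetical helper for illustration: assumes square images and scales
    that are exact multiples of the base size.
    """
    tiles = {}
    for scale in scales:
        if scale % base_size != 0:
            raise ValueError(f"scale {scale} is not a multiple of {base_size}")
        n = scale // base_size  # sub-images per side at this scale
        tiles[scale] = [
            (col * base_size, row * base_size,
             (col + 1) * base_size, (row + 1) * base_size)
            for row in range(n)
            for col in range(n)
        ]
    return tiles


# Assumed example values: one base scale plus 2x and 3x versions of it.
tiles = s2_tiles(base_size=448, scales=[448, 896, 1344])
for scale, boxes in tiles.items():
    print(scale, len(boxes))
# prints:
# 448 1
# 896 4
# 1344 9
```

In a scheme like this every sub-image goes through the same vision encoder, so with the ViT unfrozen the gradients from all sub-images update one shared set of weights.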