Closed: MonolithFoundation closed this 1 week ago
I think that if S2 is used and the ViT is unfrozen, the result could be worse, since S2 splits the images.
Hi, the VILA-3B-S2 results were obtained with the ViT unfrozen. We didn't observe any negative effect from that.
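For context, here is a minimal sketch of the kind of splitting being discussed. It assumes S2 upscales the image and cuts it into sub-images of the ViT's native input size, each of which goes through the same ViT. This is a toy illustration only, not the VILA or S2 code; `split_into_tiles` and the sizes used are hypothetical.

```python
def split_into_tiles(image, tile):
    """Split a 2D grid (list of rows) into non-overlapping tile x tile
    sub-grids, returned in row-major order."""
    height, width = len(image), len(image[0])
    if height % tile or width % tile:
        raise ValueError("image height and width must be multiples of tile")
    tiles = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            # Take `tile` rows, and from each row a slice of `tile` columns.
            tiles.append([row[left:left + tile] for row in image[top:top + tile]])
    return tiles


# Toy 4x4 "image" standing in for an upscaled input (assumption);
# tile=2 stands in for the ViT's native input size (assumption).
image = [[r * 4 + c for c in range(4)] for r in range(4)]
tiles = split_into_tiles(image, 2)
print(len(tiles))  # 4
print(tiles[0])    # [[0, 1], [4, 5]]
```

Each sub-image has the same size as a regular ViT input, so the unfrozen ViT only ever sees inputs of the shape it was pretrained on.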