bfshi / scaling_on_scales

When do we not need larger vision models?

About the choices in LLaVA+S^2 implementation #10

Open jungle-gym-ac opened 2 months ago

jungle-gym-ac commented 2 months ago

Great work! I've read the paper, and it seems LLaVA+S^2 is implemented with an OpenCLIP vision encoder and the LLM is fine-tuned with LoRA. However, the LLaVA baseline you compare with uses the OpenAI CLIP vision encoder, and its LLM is fully fine-tuned (without LoRA).

If I'm right, I wonder whether you have tried using the same vision encoder, or fully fine-tuning the LLM, and what the results are under those settings. Thank you.
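(For context, here is a minimal sketch of the S^2 wrapping being discussed, using the `s2wrapper` utility this repo provides. The toy encoder below is an illustrative stand-in for the OpenCLIP vision tower, not the paper's actual model.)

```python
import torch
import torch.nn as nn
from s2wrapper import forward as multiscale_forward  # provided by this repo

# Stand-in for a CLIP-style vision tower: maps (B, 3, H, W) images to
# (B, num_patches, C) token features. Patch size 14 mirrors ViT-L/14.
class ToyVisionTower(nn.Module):
    def __init__(self, dim=1024, patch=14):
        super().__init__()
        self.proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)

    def forward(self, x):
        feats = self.proj(x)                     # (B, C, H/14, W/14)
        return feats.flatten(2).transpose(1, 2)  # (B, N, C)

tower = ToyVisionTower()
x = torch.randn(2, 3, 224, 224)

base = tower(x)  # single-scale baseline: (2, 256, 1024)

# S^2 runs the same tower at scales 1x and 2x (224 and 448), splitting the
# larger image into 224x224 crops and merging the features; the channel
# dimension grows to num_scales * C, i.e. (2, 256, 2048) here.
multi = multiscale_forward(tower, x, scales=[1, 2])
print(base.shape, multi.shape)
```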

bfshi commented 2 months ago

Hi @jungle-gym-ac, yeah, good question. In the scaling experiment on LLaVA (Fig. 3 in the paper), all the models, including the baselines, use OpenCLIP. The experiment comparing LLaVA-S^2 to the official LLaVA (Table 11 in the Appendix) uses OpenAI CLIP.

And you are right: all the models we trained on LLaVA use LoRA, while the official LLaVA checkpoint we compare to uses full fine-tuning. According to the official LLaVA repo, the performance of LLaVA with full fine-tuning vs. LoRA doesn't differ much on average, but yes, comparing to a LoRA checkpoint of the official LLaVA would be fairer. We will note this in a later version of the paper. We didn't try LLaVA-S^2 with full fine-tuning.
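(For readers comparing the two regimes: a minimal sketch of LoRA-on-the-LLM using Hugging Face peft on a small stand-in LM. The rank/alpha values and target modules are illustrative assumptions, not the exact LLaVA+S^2 recipe.)

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Wrap the LLM with LoRA adapters instead of fully fine-tuning it.
model = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in for the LLaVA LLM

lora_cfg = LoraConfig(
    r=128,                      # illustrative rank
    lora_alpha=256,             # illustrative scaling
    target_modules=["c_attn"],  # attention projection; names are model-specific
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)

# Only the adapter weights train; the base LLM stays frozen.
model.print_trainable_parameters()
```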

bfshi commented 2 months ago

The training recipe is exactly the same as LLaVA's. A loss of 2.5 seems weird; that is almost the same loss as an untrained model. Did you change anything in the LLaVA repo that may cause unexpected behavior?
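(One hypothetical thing to check, since S^2 with scales=[1, 2] doubles the channel dimension of the vision features: make sure the mm projector's input size matches the merged features. The module names below are illustrative, not the actual LLaVA code paths.)

```python
import torch

C, num_scales, num_patches = 1024, 2, 576  # illustrative dimensions

# S^2 output: (B, num_patches, num_scales * C)
vision_features = torch.randn(1, num_patches, num_scales * C)

# A projector still built for single-scale C-dim input would be miswired
# and could plausibly stall the pretraining loss.
projector = torch.nn.Sequential(
    torch.nn.Linear(num_scales * C, 4096),  # input dim must match S^2 output
    torch.nn.GELU(),
    torch.nn.Linear(4096, 4096),
)
assert projector[0].in_features == vision_features.shape[-1]
print(projector(vision_features).shape)  # torch.Size([1, 576, 4096])
```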

On Tue, May 28, 2024, zuijiang wrote:

@bfshi I'm wondering whether the pretraining data used in LLaVA+S^2 is the same as in the original LLaVA, because I tried pretraining LLaVA with S^2 and the loss is much higher than in LLaVA (2.5 compared with 0.7).