Closed: jongwoopark7978 closed this issue 3 months ago
Yes, it was frozen.
Thank you for your quick answer.
Hi @ZhangYuanhan-AI, you said that the visual encoder is frozen, but model.safetensors.index.json
in LLaVA-NeXT-Video-DPO (7B) contains vision_tower
keys like the following:
So I'm a little confused: which parts are trainable when training LLaVA-NeXT-Video-DPO (7B)?
Would you mind sharing the hyper-parameters used to train LLaVA-NeXT-Video-DPO (7B)?
Thanks a lot!
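As a side note, the `weight_map` in `model.safetensors.index.json` maps each parameter name to its shard file, so filtering it is a quick way to see which entries belong to the vision tower. A minimal sketch (the toy index below only illustrates the structure; with the real checkpoint you would `json.load` the index file instead, and the exact key names may differ):

```python
import json


def vision_tower_keys(index):
    """Return the parameter names under the vision tower from a
    safetensors index dict ({"weight_map": {param_name: shard_file}})."""
    return sorted(k for k in index["weight_map"] if "vision_tower" in k)


# Toy index illustrating the structure; key names here are examples, not
# copied from the actual checkpoint.
toy_index = {
    "weight_map": {
        "model.vision_tower.vision_model.embeddings.patch_embedding.weight": "model-00001.safetensors",
        "model.layers.0.self_attn.q_proj.weight": "model-00001.safetensors",
    }
}
print(vision_tower_keys(toy_index))
```

Note that the index listing a key only tells you the tensor is stored in the checkpoint, not whether it was trainable: frozen weights are saved alongside trained ones.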
The ckpt is the same as the original CLIP weights.
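If you want to verify this yourself, you can compare the vision_tower tensors in the checkpoint against the original CLIP ViT-L/14 weights. A sketch of the comparison logic, assuming you have already loaded both state dicts (here toy numpy arrays stand in for real tensors, and the `model.vision_tower.` prefix is an illustrative assumption, not the exact checkpoint naming):

```python
import numpy as np


def weights_match(ckpt_sd, clip_sd, prefix="model.vision_tower.", atol=1e-6):
    """Check that every checkpoint tensor under `prefix` equals the
    corresponding CLIP tensor (names mapped by stripping the prefix)."""
    for name, tensor in ckpt_sd.items():
        if not name.startswith(prefix):
            continue  # LLM / projector weights, not part of the tower
        clip_name = name[len(prefix):]
        if clip_name not in clip_sd:
            return False
        if not np.allclose(tensor, clip_sd[clip_name], atol=atol):
            return False
    return True


# Toy state dicts; in practice load them via safetensors / transformers.
clip_sd = {"vision_model.embeddings.patch_embedding.weight": np.ones((4, 4))}
ckpt_sd = {
    "model.vision_tower.vision_model.embeddings.patch_embedding.weight": np.ones((4, 4)),
    "model.layers.0.mlp.up_proj.weight": np.zeros((4, 4)),  # ignored
}
print(weights_match(ckpt_sd, clip_sd))
```

If the tower was frozen, every compared tensor should match bit-for-bit (or within float tolerance, if the dtypes differ between checkpoints).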
Hi team,
I am currently using LLaVA-NeXT-Video-DPO (7B) and I want to confirm whether it uses the pre-trained CLIP ViT-L/14. During training, do you freeze the visual encoder in the same way as in LLaVA-1.5? I ask because I hope to use the CLIP text encoder to measure the similarity between visual and text tokens.
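For the similarity measurement itself, CLIP scores image/text pairs by L2-normalizing both feature sets and taking a scaled dot product (cosine similarity times a learned logit scale, roughly 100 in the released models). A sketch with toy features; in practice the features would come from `CLIPModel.get_image_features` / `get_text_features` in transformers:

```python
import numpy as np


def clip_similarity(image_feats, text_feats, logit_scale=100.0):
    """CLIP-style similarity: L2-normalize both feature matrices, then
    return the scaled cosine-similarity matrix (images x texts)."""
    img = image_feats / np.linalg.norm(image_feats, axis=-1, keepdims=True)
    txt = text_feats / np.linalg.norm(text_feats, axis=-1, keepdims=True)
    return logit_scale * img @ txt.T


# Toy 2-D features standing in for real CLIP embeddings.
img = np.array([[1.0, 0.0], [0.0, 1.0]])
txt = np.array([[1.0, 0.0]])
print(clip_similarity(img, txt))  # first image aligned with the text, second orthogonal
```

One caveat if the tower is frozen CLIP: LLaVA takes patch features from an intermediate ViT layer, not CLIP's pooled image embedding, so token-level features are not in the same space as the CLIP text encoder's output without the pooling/projection head.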