Questions about training vision encoder

NVlabs / EAGLE

EAGLE: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

https://arxiv.org/pdf/2408.15998

Apache License 2.0

543 stars, 45 forks

Open baichuanzhou opened 2 weeks ago

baichuanzhou commented 2 weeks ago

Hi, guys.

EAGLE is really nice work, and the models work very well in practice. I just have some questions about the training details of the vision encoders.

Do you train all of the vision encoders, or keep some of them frozen? And why is the forward function of CLIP decorated with no_grad here?
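To make the question concrete, here is a minimal sketch of what I understand the decorator to do. It uses a toy module, not the actual EAGLE code: `ToyEncoder`, `head`, and the layer sizes are hypothetical stand-ins, and I am assuming the CLIP wrapper behaves the same way under `torch.no_grad()`.

```python
import torch
import torch.nn as nn


class ToyEncoder(nn.Module):
    """Hypothetical stand-in for a vision tower; not the EAGLE CLIP wrapper."""

    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(8, 4)

    @torch.no_grad()  # autograd records nothing that happens inside forward
    def forward(self, x):
        return self.proj(x)


encoder = ToyEncoder()
head = nn.Linear(4, 2)  # trainable layer placed after the encoder

features = encoder(torch.randn(3, 8))
loss = head(features).sum()
loss.backward()

# The encoder output is detached from the graph, so no gradient reaches
# the encoder weights, while the layer after it still gets gradients.
print(features.requires_grad)        # False
print(encoder.proj.weight.grad)      # None
print(head.weight.grad is not None)  # True
```

If that reading is right, the decorated encoder stays frozen even when its parameters have `requires_grad=True`, which is why I am asking whether CLIP is meant to be trained at all.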