NVlabs / EAGLE

EAGLE: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
https://arxiv.org/pdf/2408.15998
Apache License 2.0
543 stars, 45 forks

Questions about training vision encoder #24

Open baichuanzhou opened 2 weeks ago

baichuanzhou commented 2 weeks ago

Hi, guys.

Eagle is really nice work, and the models perform very well in practice. I just have some questions about how the vision encoders are trained.

Do you train all of the vision encoders, or keep some of them frozen? And why is the forward function of the CLIP encoder decorated with no_grad here?
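For reference, here is a minimal sketch of the effect being asked about, assuming PyTorch. `FrozenEncoder`, `TrainableEncoder`, and `head` are hypothetical stand-ins written for illustration, not EAGLE's actual classes. The sketch only shows what decorating `forward` with `torch.no_grad()` does in general: no autograd graph is recorded inside the call, so the loss cannot send gradients back to that encoder's weights.

```python
import torch
import torch.nn as nn


class FrozenEncoder(nn.Module):
    """Hypothetical stand-in for a vision tower whose forward runs under no_grad."""

    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4)

    @torch.no_grad()  # nothing inside forward is tracked by autograd
    def forward(self, x):
        return self.proj(x)


class TrainableEncoder(nn.Module):
    """Hypothetical stand-in for a vision tower that is trained normally."""

    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4)

    def forward(self, x):
        return self.proj(x)


frozen = FrozenEncoder()
trainable = TrainableEncoder()
head = nn.Linear(8, 1)  # hypothetical downstream layer consuming both feature sets

x = torch.randn(2, 4)
frozen_features = frozen(x)
features = torch.cat([frozen_features, trainable(x)], dim=-1)
loss = head(features).sum()
loss.backward()

# The decorated encoder's output is detached, so its weights receive no gradient.
frozen_output_tracked = frozen_features.requires_grad
frozen_has_grad = frozen.proj.weight.grad is not None
trainable_has_grad = trainable.proj.weight.grad is not None
head_has_grad = head.weight.grad is not None

print(frozen_output_tracked, frozen_has_grad, trainable_has_grad, head_has_grad)
```

In this sketch the decorated encoder stays fixed even if its parameters are passed to the optimizer, while the undecorated encoder and the downstream layer still receive gradients. Whether this matches the intent in the EAGLE code is exactly the question above.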