PaddlePaddle / PaddleMIX

Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high performance and flexibility.
Apache License 2.0

How do I change the vision encoder of a multimodal model in PaddleMIX? #583

Open ApolloRay opened 3 months ago

ApolloRay commented 3 months ago

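Also, the weights are all stored in a single pdparams file, so the weights of the vision encoder or the LM decoder cannot be replaced on their own. And the ecosystem does not seem to support many vision encoders?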

jerrywgz commented 2 months ago

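Thanks for the feedback. At the moment the multimodal models support only a small number of vision encoders, and replacing the vision encoder and LLM weights separately is currently supported only by BLIP2. We will keep improving this.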
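
For readers hitting the single-file limitation described above, here is a minimal sketch of one possible workaround: treat the checkpoint as a flat mapping from parameter names to tensors and swap out every entry under one sub-module prefix. This is not PaddleMIX's own mechanism. The function name `replace_submodule_weights`, the prefix `visual_encoder`, and the parameter names in the demo are all hypothetical, so check the real key names in your checkpoint first. Plain strings stand in for tensors so the snippet runs without Paddle installed.

```python
from typing import Any, Dict


def replace_submodule_weights(
    full_state: Dict[str, Any],
    new_sub_state: Dict[str, Any],
    prefix: str,
    strict: bool = True,
) -> Dict[str, Any]:
    """Return a copy of full_state with all entries under `prefix` replaced.

    full_state:    flat state dict of the whole model (name -> tensor).
    new_sub_state: state dict of the replacement sub-module, keyed relative
                   to that sub-module (no prefix).
    prefix:        hypothetical sub-module name, e.g. "visual_encoder".
    strict:        if True, require the new keys to match the old ones exactly.
    """
    if not prefix.endswith("."):
        prefix += "."

    old_keys = {k for k in full_state if k.startswith(prefix)}
    new_keys = {prefix + k for k in new_sub_state}

    if strict and old_keys != new_keys:
        missing = sorted(old_keys - new_keys)
        unexpected = sorted(new_keys - old_keys)
        raise KeyError(f"key mismatch: missing={missing}, unexpected={unexpected}")

    # Keep everything outside the prefix, then add the new sub-module entries.
    merged = {k: v for k, v in full_state.items() if k not in old_keys}
    for key, value in new_sub_state.items():
        merged[prefix + key] = value
    return merged


# Toy checkpoint: strings stand in for tensors, names are made up.
state = {
    "visual_encoder.patch_embed.weight": "old_vision_weight",
    "language_model.embed_tokens.weight": "lm_weight",
}
new_vision = {"patch_embed.weight": "new_vision_weight"}

merged = replace_submodule_weights(state, new_vision, "visual_encoder")
print(sorted(merged))
```

With a real checkpoint, the mapping would presumably come from `paddle.load` on the `.pdparams` file and be written back with `paddle.save`. Note that swapping in a different encoder architecture needs `strict=False`, and any layer that consumes the encoder output (for example a projection layer) may then have mismatched shapes and need retraining.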