Modifying or fusing vision modules

BAAI-DCAI / Bunny

A family of lightweight multimodal models.

Apache License 2.0


Modifying or fusing vision modules #99

Open Why0912 opened 2 weeks ago

Why0912 commented 2 weeks ago

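Is it possible to modify the vision module, or to fuse visual representations from multiple backbones? If I modify or fuse them, do I need to redo pre-training to obtain the corresponding projector weights? Or how should the projector itself be modified?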

Isaachhh commented 1 week ago

To use a different vision tower or projector, you can import whichever one you like. The relevant code is in multimodal_encoder and multimodal_projector: you need to add the new class and modify the build function.
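The "add a class, then extend the build function" pattern can be sketched roughly as below. This is only an illustration under assumptions, not Bunny's actual code: `Config`, `DefaultVisionTower`, `MyVisionTower`, the `vision_tower` field, and the body of `build_vision_tower` are all hypothetical stand-ins, so check the real builders under multimodal_encoder and multimodal_projector for the actual names and signatures.

```python
from dataclasses import dataclass


@dataclass
class Config:
    # Hypothetical config object; the real field names may differ.
    vision_tower: str


class DefaultVisionTower:
    """Stand-in for an encoder class that already exists in the repo."""

    def __init__(self, name: str):
        self.name = name


class MyVisionTower:
    """Stand-in for the new encoder class you would add yourself."""

    def __init__(self, name: str):
        self.name = name


def build_vision_tower(config: Config):
    """Pick an encoder class from the configured name (hypothetical dispatch)."""
    name = config.vision_tower
    lowered = name.lower()
    if "my-encoder" in lowered:
        # New branch added for the custom encoder.
        return MyVisionTower(name)
    if "siglip" in lowered:
        # Existing branch, left unchanged.
        return DefaultVisionTower(name)
    raise ValueError(f"Unknown vision tower: {name}")


tower = build_vision_tower(Config(vision_tower="org/my-encoder-base"))
print(type(tower).__name__)
```

A projector swap would presumably follow the same shape: a new projector class plus one extra branch in the projector's build function.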

To combine multiple vision features, you also need to modify the architecture of Bunny (something like a vision_tower_list), the encode_image function, and so on.
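One thing worth working out before touching encode_image is the shape of the fused features, since that determines the projector's input width. The sketch below does only that bookkeeping, in plain Python. Everything in it is an assumption for illustration: `TowerSpec`, `fused_shape`, the two fusion modes, and the token counts and hidden sizes are hypothetical and do not come from the Bunny code base.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TowerSpec:
    """Output shape of one vision tower (hypothetical, illustrative numbers)."""
    name: str
    num_tokens: int   # image tokens (patches) produced per image
    hidden_size: int  # feature width of each token


def fused_shape(towers: List[TowerSpec], mode: str) -> Tuple[int, int]:
    """Return (num_tokens, hidden_size) after fusing the towers' outputs."""
    if not towers:
        raise ValueError("need at least one tower")
    if mode == "channel":
        # Concatenate along the feature axis: token counts must match.
        if len({t.num_tokens for t in towers}) != 1:
            raise ValueError("channel fusion needs equal token counts")
        return towers[0].num_tokens, sum(t.hidden_size for t in towers)
    if mode == "sequence":
        # Concatenate along the token axis: feature widths must match.
        if len({t.hidden_size for t in towers}) != 1:
            raise ValueError("sequence fusion needs equal hidden sizes")
        return sum(t.num_tokens for t in towers), towers[0].hidden_size
    raise ValueError(f"unknown fusion mode: {mode}")


towers = [TowerSpec("tower_a", 729, 1152), TowerSpec("tower_b", 729, 1024)]
tokens, width = fused_shape(towers, "channel")
print(tokens, width)  # 729 2176
```

With channel-wise fusion the projector's input width becomes the sum of the towers' hidden sizes, so a projector trained for a single tower would no longer match in shape.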

Generally, you need to pre-train and fine-tune the model yourself. In some circumstances, you may be able to start from our released weights.

Why0912 commented 3 days ago

Thanks for the reply. One more question: roughly what compute resources does pre-training require?

Isaachhh commented 3 days ago

https://github.com/BAAI-DCAI/Bunny/issues/90

We always use 8 × A100 GPUs.