BAAI-DCAI / Bunny

A family of lightweight multimodal models.
Apache License 2.0
799 stars 61 forks

Modifying or fusing vision modules #99

Open Why0912 opened 2 weeks ago

Why0912 commented 2 weeks ago

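Does Bunny support replacing the vision module, or fusing visual representations from multiple backbones? If I modify or fuse them, do I need to rerun pre_train to obtain the corresponding projector weights? Or, alternatively, how should the projector be modified?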

Isaachhh commented 1 week ago

To use another vision tower or projector, you can import whichever one you like. Pay attention to multimodal_encoder and multimodal_projector: you need to add the code for the new class and modify the build function.
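As a rough illustration of "add the class and modify the build function", here is a minimal, dependency-free sketch of a name-based builder. Everything in it (`VisionConfig`, `MyVisionTower`, `register_vision_tower`, the `"my_encoder"` key) is hypothetical and only mimics the general pattern; it is not Bunny's actual code, so check the real builder under multimodal_encoder before adapting it.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical registry: maps a substring of the configured tower name
# to the class that should be constructed for it.
_VISION_TOWERS: Dict[str, Callable[..., object]] = {}


def register_vision_tower(key: str):
    """Decorator that registers a vision tower class under `key`."""
    def decorator(cls):
        _VISION_TOWERS[key] = cls
        return cls
    return decorator


@dataclass
class VisionConfig:
    # Assumed config field holding the encoder name or path.
    vision_tower: str


@register_vision_tower("my_encoder")
class MyVisionTower:
    """Placeholder for a new encoder class; a real one would wrap a pretrained model."""

    def __init__(self, name: str, **kwargs):
        self.name = name
        self.hidden_size = kwargs.get("hidden_size", 1024)


def build_vision_tower(config: VisionConfig, **kwargs):
    """Pick the tower class whose key appears in the configured name."""
    name = config.vision_tower
    for key, cls in _VISION_TOWERS.items():
        if key in name.lower():
            return cls(name, **kwargs)
    raise ValueError(f"Unknown vision tower: {name}")


tower = build_vision_tower(VisionConfig("org/my_encoder-base"), hidden_size=768)
```

A projector builder could follow the same shape: one new class plus one new branch (or registry entry) in its build function.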

To combine multiple vision features, you also need to modify the architecture of Bunny (something like a vision_tower_list), the encode_image function, and so on.
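One possible reading of the vision_tower_list idea is sketched below in PyTorch. It is an assumption-laden toy, not Bunny's implementation: `DummyTower` stands in for real encoders, `MultiTowerEncoder` and its MLP projector are hypothetical, and fusion is done by simple channel-wise concatenation, which assumes every tower returns the same number of patch tokens.

```python
import torch
import torch.nn as nn


class DummyTower(nn.Module):
    """Stand-in for a real vision encoder: images -> (batch, num_patches, hidden_size)."""

    def __init__(self, hidden_size: int, patch: int = 16):
        super().__init__()
        self.hidden_size = hidden_size
        self.embed = nn.Conv2d(3, hidden_size, kernel_size=patch, stride=patch)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        x = self.embed(images)                # (B, C, H/patch, W/patch)
        return x.flatten(2).transpose(1, 2)   # (B, N, C)


class MultiTowerEncoder(nn.Module):
    """Hypothetical wrapper holding several towers and one shared projector."""

    def __init__(self, vision_tower_list, llm_hidden_size: int):
        super().__init__()
        self.vision_tower_list = nn.ModuleList(vision_tower_list)
        # The projector input width is the sum of all tower widths.
        fused_size = sum(t.hidden_size for t in vision_tower_list)
        self.projector = nn.Sequential(
            nn.Linear(fused_size, llm_hidden_size),
            nn.GELU(),
            nn.Linear(llm_hidden_size, llm_hidden_size),
        )

    def encode_image(self, images: torch.Tensor) -> torch.Tensor:
        # Run every tower, then concatenate features along the channel dimension.
        feats = [tower(images) for tower in self.vision_tower_list]
        fused = torch.cat(feats, dim=-1)      # (B, N, sum of hidden sizes)
        return self.projector(fused)          # (B, N, llm_hidden_size)


model = MultiTowerEncoder([DummyTower(32), DummyTower(64)], llm_hidden_size=128)
out = model.encode_image(torch.randn(2, 3, 64, 64))
print(out.shape)
```

Because the projector's input width changes with the set of towers, a projector checkpoint trained for a single encoder would no longer match this layer's shape.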

Generally, you need to pre-train and fine-tune the model yourself. In some circumstances, you may be able to start from our released weights.

Why0912 commented 3 days ago

Thanks for the reply. One more question: roughly what compute resources does pre_train require?

Isaachhh commented 3 days ago

https://github.com/BAAI-DCAI/Bunny/issues/90

We always use 8 × A100 GPUs.