Open Why0912 opened 2 weeks ago
To use another vision tower or projector, you can import whichever one you like. The relevant code is in multimodal_encoder and multimodal_projector: add the code for the new class there and modify the corresponding build function.
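As a rough illustration of "add the class and modify the build function", here is a minimal sketch of a name-based builder. Everything in it is an assumption made for illustration: the registry, the decorator, and the stub classes are hypothetical and are not Bunny's actual code. A real tower would wrap a pretrained encoder rather than just store its name.

```python
from typing import Callable, Dict

# Hypothetical registry (not Bunny's real code): name fragment -> tower class.
VISION_TOWERS: Dict[str, Callable[[str], object]] = {}


def register_vision_tower(key: str):
    """Class decorator that records a tower class under a name fragment."""
    def decorator(cls):
        VISION_TOWERS[key] = cls
        return cls
    return decorator


@register_vision_tower("siglip")
class SiglipTowerStub:
    """Stand-in for an already supported encoder wrapper."""
    def __init__(self, name: str):
        self.name = name


@register_vision_tower("my_encoder")
class MyEncoderTowerStub:
    """Stand-in for the new encoder class you would add."""
    def __init__(self, name: str):
        self.name = name


def build_vision_tower(name: str):
    """Pick the tower class whose registered fragment appears in the name."""
    lowered = name.lower()
    for key, cls in VISION_TOWERS.items():
        if key in lowered:
            return cls(name)
    raise ValueError(f"Unknown vision tower: {name}")


tower = build_vision_tower("org/My_Encoder-large")
print(type(tower).__name__)
```

The same pattern would apply to a projector builder: one new class plus one new branch (or registry entry) in the build function.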
To combine multiple vision features, you also need to modify the architecture of Bunny (for example, by holding something like a vision_tower_list) as well as the encode_image function and related code.
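One possible way to combine features, sketched here under stated assumptions, is to run every tower in a vision_tower_list and concatenate the per-token features before the projector. This is only one fusion strategy, not necessarily what the maintainers have in mind. The function names and the plain-list feature type are hypothetical stand-ins so the example runs without a deep learning framework; real code would operate on tensors.

```python
from itertools import chain
from typing import Callable, List, Sequence

# Plain-list stand-in for a feature tensor: [num_tokens][hidden_size].
Features = List[List[float]]
Tower = Callable[[object], Features]


def concat_features(per_tower: Sequence[Features]) -> Features:
    """Concatenate features from several towers along the hidden dimension."""
    num_tokens = len(per_tower[0])
    if any(len(feats) != num_tokens for feats in per_tower):
        raise ValueError("All towers must produce the same number of tokens")
    return [
        list(chain.from_iterable(feats[t] for feats in per_tower))
        for t in range(num_tokens)
    ]


def encode_image_multi(image, vision_tower_list: Sequence[Tower],
                       projector: Callable[[Features], Features]) -> Features:
    """Hypothetical multi-tower variant of an image-encoding step."""
    per_tower = [tower(image) for tower in vision_tower_list]
    fused = concat_features(per_tower)
    return projector(fused)


# Two toy towers: 2 tokens each, hidden sizes 2 and 3.
tower_a = lambda image: [[1.0, 2.0], [3.0, 4.0]]
tower_b = lambda image: [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
identity_projector = lambda feats: feats

fused = encode_image_multi(None, [tower_a, tower_b], identity_projector)
print(len(fused), len(fused[0]))
```

With concatenation, the projector's input width becomes the sum of the towers' hidden sizes, so a projector trained for a single tower no longer matches in shape and has to be trained again.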
Generally, you need to pre-train and fine-tune by yourself. Under some circumstances, you may start from our released weights.
Thanks for the reply. One more question: roughly what compute resources does pre-training require?
https://github.com/BAAI-DCAI/Bunny/issues/90
We always use 8*A100.
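Is it supported to modify the vision module, or to fuse visual representations from multiple backbones? If I modify or fuse them, do I need to redo pre-training to obtain the corresponding projector weights? Or how should the projector be modified?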