THUDM / VisualGLM-6B

Chinese and English multimodal conversational language model
Apache License 2.0
4.08k stars 415 forks

finetune image size #82

Open hardlipay opened 1 year ago

hardlipay commented 1 year ago

The maintainers mention that BLIP uses a resolution of 224, which may hurt understanding of fine image details. Can the image size be changed during finetuning? If not, and training from scratch is needed, is it possible to train only the visual model, or does the complete alignment training have to be redone?

freelancerllm commented 1 year ago

I have verified that the image size can be changed. I currently have it set to 540, but I have not evaluated the results yet.

hardlipay commented 1 year ago

> I have verified that the image size can be changed. I currently have it set to 540, but I have not evaluated the results yet.

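I tried finetuning after changing the size directly in the source code. The tensor sizes no longer match, so the data cannot be passed through the model. It probably cannot be done this way; changing the size may require redoing the alignment training from scratch.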
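
A minimal sketch of why the tensor sizes stop matching, assuming a ViT-style image encoder with non-overlapping patches and one class token. The patch size of 14 is an assumption for illustration and is not confirmed anywhere in this thread; `num_vision_tokens` is a hypothetical helper, not part of the VisualGLM code.

```python
# Assumption: ViT-style encoder, non-overlapping square patches, one class token.
# PATCH_SIZE = 14 is illustrative only, not taken from the VisualGLM source.
PATCH_SIZE = 14


def num_vision_tokens(image_size: int, patch_size: int = PATCH_SIZE) -> int:
    """Number of tokens the encoder produces for a square image."""
    grid = image_size // patch_size  # a strided patch projection drops the remainder
    return grid * grid + 1           # +1 for the class token


# The pretrained position-embedding table has one row per token at 224 px.
pretrained = num_vision_tokens(224)

for size in (224, 384, 540):
    tokens = num_vision_tokens(size)
    status = "matches" if tokens == pretrained else "mismatch"
    print(f"{size}px: {tokens} tokens ({status} with {pretrained} position embeddings)")
```

Under these assumptions, any input size other than 224 yields a different token count than the pretrained position-embedding table holds, which is consistent with the shape error described above.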

zgjiangtoby commented 1 year ago

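> > I have verified that the image size can be changed. I currently have it set to 540, but I have not evaluated the results yet.

> I tried finetuning after changing the size directly in the source code. The tensor sizes no longer match, so the data cannot be passed through the model. It probably cannot be done this way; changing the size may require redoing the alignment training from scratch.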

So do we need to wait for the pre-training code to be released?

cdqncn commented 11 months ago

How can the resolution be increased to 384?

1049451037 commented 11 months ago

You can refer to this issue: https://github.com/THUDM/VisualGLM-6B/issues/296
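
The linked issue is not reproduced here. As a general illustration only, one common technique for running a ViT-style encoder at a new resolution is to interpolate the pretrained grid of position embeddings to the new grid size before finetuning. The sketch below does this bilinearly in plain Python; `resize_pos_grid` is a hypothetical helper, and whether the linked issue or VisualGLM uses this approach is an assumption.

```python
def resize_pos_grid(grid, new_size):
    """Bilinearly resize a square grid of embedding vectors.

    grid: nested lists of shape (old, old, dim), one vector per patch position.
    Returns nested lists of shape (new_size, new_size, dim).
    The class-token embedding, if any, is not part of the grid and is kept as-is.
    """
    old = len(grid)
    # Map the corners of the new grid onto the corners of the old grid.
    scale = (old - 1) / (new_size - 1) if new_size > 1 else 0.0
    out = []
    for i in range(new_size):
        y = i * scale
        y0 = int(y)
        y1 = min(y0 + 1, old - 1)
        wy = y - y0
        row = []
        for j in range(new_size):
            x = j * scale
            x0 = int(x)
            x1 = min(x0 + 1, old - 1)
            wx = x - x0
            # Blend the four neighbouring vectors component by component.
            vec = [
                (1 - wy) * ((1 - wx) * a + wx * b) + wy * ((1 - wx) * c + wx * d)
                for a, b, c, d in zip(
                    grid[y0][x0], grid[y0][x1], grid[y1][x0], grid[y1][x1]
                )
            ]
            row.append(vec)
        out.append(row)
    return out


# Toy example: a 2x2 grid of 1-dimensional embeddings resized to 3x3.
small = [[[0.0], [1.0]], [[2.0], [3.0]]]
resized = resize_pos_grid(small, 3)
print(len(resized), len(resized[0]))  # grid is now 3 x 3
```

With tensors, the same idea is usually expressed with `torch.nn.functional.interpolate` on the reshaped embedding grid. The resized embeddings are only a starting point: the thread does not report how well a model finetuned this way performs.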