OpenBMB / MiniCPM-V

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
Apache License 2.0

Regarding Input Resolution #353

Closed: yjhdhr closed this issue 1 month ago

yjhdhr commented 1 month ago

Is the vision module taken from siglip-so400m-14-980-flash-attn2-navit? The original SigLIP supports a maximum resolution of 980, so why does MiniCPM-V 2.5 only support 448 for a single block?

yjhdhr commented 1 month ago

What are the negative effects of using a single block of up to 980 for training or inference?

qyc-98 commented 1 month ago

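That would be inconsistent with our pretraining, so you would see some out-of-domain behavior, and the results may be worse than simply using 448.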
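
To give a rough sense of how different the two settings are, here is a minimal sketch of the patch-grid arithmetic. It assumes a patch size of 14 pixels (read off the "14" in the checkpoint name) and square inputs. `patch_grid` is a hypothetical helper written for this illustration, not part of MiniCPM-V or SigLIP, and it says nothing about how the model actually preprocesses images.

```python
def patch_grid(side_px: int, patch_px: int = 14) -> tuple[int, int]:
    """Return (patches per side, total patches) for a square image.

    patch_px=14 is an assumption taken from the checkpoint name
    "siglip-so400m-14-980-..."; it is not confirmed in the thread.
    """
    if side_px % patch_px != 0:
        raise ValueError(f"{side_px} is not a multiple of patch size {patch_px}")
    per_side = side_px // patch_px
    return per_side, per_side * per_side


# Compare the block size used in pretraining (448) with the encoder's maximum (980).
for side in (448, 980):
    per_side, total = patch_grid(side)
    print(f"{side}px -> {per_side}x{per_side} = {total} patches")
```

Under these assumptions a 980-pixel block yields a 70x70 grid (4900 patches) against 32x32 (1024 patches) at 448, so the encoder would see sequences and patch positions that the 448-only pretraining described above never covered.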