InternViT-300M-448px

OpenGVLab / InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. A commercially usable open-source multimodal dialogue model with performance approaching GPT-4o.

https://internvl.readthedocs.io/en/latest/

MIT License

4.47k stars, 344 forks

InternViT-300M-448px #253

Open · phellonchen opened this issue 1 month ago

phellonchen commented 1 month ago

When fine-tuning with LLaVA, the results are worse than with SigLIP, which is unexpected. What's more, the model does not gain any Chinese OCR ability, even when trained on Chinese TextVQA data. Why is that?

gd2016229035 commented 1 month ago

In my experiment (the same data as LLaVA-1.5, the same dynamic image cut method as InternVL, but a different LLM), the scores on TextVQA and MME surpassed SigLIP, but the model underperformed on GQA, MMBench-CN, and MMStar.
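For readers unfamiliar with the "dynamic image cut" mentioned above: InternVL's dynamic high-resolution preprocessing picks a grid of 448×448 tiles whose aspect ratio is closest to the input image, resizes the image to that grid, crops it into tiles, and optionally appends a downscaled thumbnail of the whole image. The Python sketch below only computes the chosen grid and the crop boxes. It is modeled on the tiling logic in InternVL's published inference examples, but the function names (`candidate_grids`, `pick_grid`, `tile_boxes`) and the defaults (1 to 12 tiles, thumbnail enabled) are assumptions for illustration, not the repository's exact API.

```python
from typing import List, Tuple

Grid = Tuple[int, int]            # (cols, rows)
Box = Tuple[int, int, int, int]   # (left, top, right, bottom)


def candidate_grids(min_tiles: int, max_tiles: int) -> List[Grid]:
    """All (cols, rows) grids whose tile count lies in [min_tiles, max_tiles]."""
    grids = {
        (cols, rows)
        for cols in range(1, max_tiles + 1)
        for rows in range(1, max_tiles + 1)
        if min_tiles <= cols * rows <= max_tiles
    }
    # Visit smaller grids first; the tuple key just makes the order deterministic.
    return sorted(grids, key=lambda g: (g[0] * g[1], g))


def pick_grid(width: int, height: int, tile_size: int = 448,
              min_tiles: int = 1, max_tiles: int = 12) -> Grid:
    """Choose the grid whose aspect ratio best matches the image (hypothetical helper)."""
    aspect = width / height
    area = width * height
    best, best_diff = (1, 1), float("inf")
    for cols, rows in candidate_grids(min_tiles, max_tiles):
        diff = abs(aspect - cols / rows)
        if diff < best_diff:
            best, best_diff = (cols, rows), diff
        elif diff == best_diff and area > 0.5 * tile_size * tile_size * cols * rows:
            # Same aspect ratio: switch to the larger grid only if the source image
            # has more than half as many pixels as that grid would hold.
            best = (cols, rows)
    return best


def tile_boxes(width: int, height: int, tile_size: int = 448,
               min_tiles: int = 1, max_tiles: int = 12,
               use_thumbnail: bool = True) -> Tuple[Grid, List[Box], bool]:
    """Return the grid, the crop boxes, and whether a thumbnail tile is appended.

    Boxes are in the coordinates of the image *after* it has been resized to
    (cols * tile_size, rows * tile_size); they are listed row by row.
    """
    cols, rows = pick_grid(width, height, tile_size, min_tiles, max_tiles)
    boxes = [
        (c * tile_size, r * tile_size, (c + 1) * tile_size, (r + 1) * tile_size)
        for r in range(rows)
        for c in range(cols)
    ]
    # A thumbnail only adds information when the image was split into several tiles.
    return (cols, rows), boxes, use_thumbnail and len(boxes) > 1


if __name__ == "__main__":
    for w, h in [(896, 448), (448, 448), (1792, 1792)]:
        grid, boxes, thumb = tile_boxes(w, h)
        print(w, h, grid, len(boxes), thumb)
```

With these assumed defaults, an 896×448 image maps to a 2×1 grid (two tiles plus a thumbnail), a 448×448 image stays a single tile with no thumbnail, and a 1792×1792 image is split into a 3×3 grid. Because the number of visual tokens grows with the tile count, the same encoder can behave quite differently on text-heavy benchmarks such as TextVQA than on benchmarks like GQA, which may be one factor worth checking when comparing against a fixed-resolution SigLIP setup.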