OpenBMB / MiniCPM-V

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
Apache License 2.0
12.39k stars, 869 forks

💡 [REQUEST] - <Does VLMEvalKit support MiniCPM-V 2.6?> #395

Closed Alxemade closed 2 months ago

Alxemade commented 2 months ago

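Start Date

No response

Implementation PR

No response

Reference Issues

No response

Summary

Does VLMEvalKit support MiniCPM-V 2.6? I tried to reproduce the results myself but could not reach the scores reported on the official site. I am not sure whether the prompts for some of the datasets are written incorrectly.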

Basic Example

None yet

Drawbacks

None yet

Unresolved questions

No response

Cuiunbo commented 2 months ago

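Hello, and thank you for your interest in our model. We ran our evaluation with VLMEvalKit and have already submitted a PR to the official repository. We will share updates as soon as we have more information.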

Cuiunbo commented 2 months ago

https://github.com/open-compass/VLMEvalKit/pull/368

Alxemade commented 2 months ago

Thank you for the quick reply.

Cuiunbo commented 2 months ago

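https://github.com/open-compass/VLMEvalKit The PR has been merged into the main repository. However, I found that the MME prompt still differs slightly; if you need to run inference on MME, please wait for a follow-up PR.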

Alxemade commented 2 months ago

OK. I tried OCRBench and got the following scores:

{
    "Text Recognition": 239,
    "Scene Text-centric VQA": 148,
    "Doc-oriented VQA": 143,
    "Key Information Extraction": 126,
    "Handwritten Mathematical Expression Recognition": 30,
    "Final Score": 686,
    "Final Score Norm": 68.6
}

MME:

"perception","reasoning","OCR","artwork","celebrity","code_reasoning","color","commonsense_reasoning","count","existence","landmark","numerical_calculation","position","posters","scene","text_translation"
"1643.3386354541817","538.2142857142858","170.0","146.5","154.41176470588238","125.0","173.33333333333334","145.71428571428572","160.0","195.0","176.0","90.0","138.33333333333331","175.51020408163265","154.25","177.5"

HallusionBench:

"split","aAcc","fAcc","qAcc"
"Overall","47.3186119873817","20.23121387283237","23.296703296703296"

There still seems to be a gap.

zhudongwork commented 2 months ago

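"https://github.com/open-compass/VLMEvalKit The PR has been merged into the main repository. However, I found that the MME prompt still differs slightly; if you need to run inference on MME, please wait for a follow-up PR."

When evaluating MME, the MiniCPM-V 2.6 implementation in VLMEvalKit does not use CoT, yet the MME score reported on the official site is marked as using CoT. So was CoT actually used or not? I manually modified VLMEvalKit to enable CoT for MME, but the score came out much lower than without CoT:

"perception","reasoning","OCR","artwork","celebrity","code_reasoning","color","commonsense_reasoning","count","existence","landmark","numerical_calculation","position","posters","scene","text_translation"
"1415.736294517807","629.2857142857142","155.0","116.75","110.29411764705881","167.5","155.0","139.28571428571428","126.66666666666666","185.0","126.5","137.5","128.33333333333334","155.4421768707483","156.75","185.0"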
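
To make the comparison between the two MME runs in this thread easier to read, here is a minimal Python sketch that parses both result rows and prints the totals and per-category differences. It assumes the overall MME score is the sum of the "perception" and "reasoning" columns, which is a common convention but is not stated in this thread. The helper names (`parse_row`, `per_category_delta`) are hypothetical and are not part of VLMEvalKit.

```python
import csv
import io

# Header and result rows copied verbatim from the two MME runs posted in this thread.
HEADER = ('"perception","reasoning","OCR","artwork","celebrity","code_reasoning",'
          '"color","commonsense_reasoning","count","existence","landmark",'
          '"numerical_calculation","position","posters","scene","text_translation"')
NO_COT = ('"1643.3386354541817","538.2142857142858","170.0","146.5",'
          '"154.41176470588238","125.0","173.33333333333334","145.71428571428572",'
          '"160.0","195.0","176.0","90.0","138.33333333333331","175.51020408163265",'
          '"154.25","177.5"')
WITH_COT = ('"1415.736294517807","629.2857142857142","155.0","116.75",'
            '"110.29411764705881","167.5","155.0","139.28571428571428",'
            '"126.66666666666666","185.0","126.5","137.5","128.33333333333334",'
            '"155.4421768707483","156.75","185.0"')


def parse_row(header: str, row: str) -> dict:
    """Parse one quoted CSV header/row pair into {category: score}."""
    names, values = csv.reader(io.StringIO(header + "\n" + row))
    return {name: float(value) for name, value in zip(names, values)}


def per_category_delta(base: dict, other: dict) -> dict:
    """Return other - base for every category present in base."""
    return {name: other[name] - base[name] for name in base}


no_cot = parse_row(HEADER, NO_COT)
with_cot = parse_row(HEADER, WITH_COT)

# Assumption: overall MME score = perception + reasoning.
total_no_cot = no_cot["perception"] + no_cot["reasoning"]
total_with_cot = with_cot["perception"] + with_cot["reasoning"]
delta = per_category_delta(no_cot, with_cot)

print(f"total without CoT: {total_no_cot:.2f}")
print(f"total with CoT:    {total_with_cot:.2f}")
for name, diff in sorted(delta.items(), key=lambda item: item[1]):
    print(f"{name:>24}: {diff:+.2f}")
```

Sorting by difference puts the categories that dropped the most with CoT at the top of the list and those that improved at the bottom.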