Alxemade closed this issue 2 months ago
Hi, thanks for your interest in our model. We ran the evaluation with VLMEvalKit and have already submitted a PR to the official repository; we will share further details as soon as they are available.
Thank you for the quick reply.
https://github.com/open-compass/VLMEvalKit The PR has been merged into the main repository, but I noticed the MME prompt still differs slightly. If you want to run MME inference, please wait for the follow-up PR.
OK. I tried OCRBench and got the following scores:
{
"Text Recognition": 239,
"Scene Text-centric VQA": 148,
"Doc-oriented VQA": 143,
"Key Information Extraction": 126,
"Handwritten Mathematical Expression Recognition": 30,
"Final Score": 686,
"Final Score Norm": 68.6
}
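The breakdown above can be sanity-checked: the final score is the sum of the five category scores, and the normalized score divides by 10 (assuming OCRBench's usual 1000-question total). A minimal check:

```python
# OCRBench category scores as reported above.
scores = {
    "Text Recognition": 239,
    "Scene Text-centric VQA": 148,
    "Doc-oriented VQA": 143,
    "Key Information Extraction": 126,
    "Handwritten Mathematical Expression Recognition": 30,
}

# Final score is the sum of the five categories; the normalized score
# divides by 10 (assuming the usual 1000-question OCRBench total).
final_score = sum(scores.values())
final_score_norm = final_score / 10

print(final_score)       # 686
print(final_score_norm)  # 68.6
```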
MME:
"perception","reasoning","OCR","artwork","celebrity","code_reasoning","color","commonsense_reasoning","count","existence","landmark","numerical_calculation","position","posters","scene","text_translation"
"1643.3386354541817","538.2142857142858","170.0","146.5","154.41176470588238","125.0","173.33333333333334","145.71428571428572","160.0","195.0","176.0","90.0","138.33333333333331","175.51020408163265","154.25","177.5"
HallusionBench:
"split","aAcc","fAcc","qAcc"
"Overall","47.3186119873817","20.23121387283237","23.296703296703296"
There still seems to be a bit of a gap compared with the reported numbers.
The minicpm-v 2.6 entry in VLMEvalKit does not use CoT when evaluating MME, but the official report marks the MME score as obtained with CoT. So was CoT actually used or not? I manually modified VLMEvalKit to enable CoT for MME, but the score came out much lower than without CoT:
"perception","reasoning","OCR","artwork","celebrity","code_reasoning","color","commonsense_reasoning","count","existence","landmark","numerical_calculation","position","posters","scene","text_translation" "1415.736294517807","629.2857142857142","155.0","116.75","110.29411764705881","167.5","155.0","139.28571428571428","126.66666666666666","185.0","126.5","137.5","128.33333333333334","155.4421768707483","156.75","185.0"
Start Date
No response
Implementation PR
No response
Reference Issues
No response
Summary
Does VLMEvalKit support minicpm-v 2.6? I tried to reproduce the results myself but could not match the official numbers, and I am not sure whether the prompts for some datasets are set up correctly.
Basic Example
None yet
Drawbacks
None yet
Unresolved questions
No response
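For reference, a typical VLMEvalKit invocation looks roughly like the sketch below. The exact model and dataset identifiers are assumptions and may differ across VLMEvalKit versions; check the names registered in `vlmeval/config.py` of your checkout.

```shell
# Sketch only: identifiers below are assumptions, verify against
# vlmeval/config.py in your VLMEvalKit version.
git clone https://github.com/open-compass/VLMEvalKit.git
cd VLMEvalKit
pip install -e .

# Run the benchmarks discussed in this thread for MiniCPM-V 2.6.
python run.py --data MME OCRBench HallusionBench --model MiniCPM-V-2_6 --verbose
```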