Yuliang-Liu / Monkey

【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models

Why is there such a large gap in the metrics? #129

Closed luyao-cv closed 1 month ago

luyao-cv commented 2 months ago


Hello, I tried to reproduce the evaluation metrics with the MultimodalOCR tool, and the scores I measured (second row) differ a lot from the numbers in your paper. The model code I used is your chat_internl.

| | STVQA acc | POIE acc | FUNSD acc | SROIE acc |
|---|---|---|---|---|
| paper | | 69.9 | 42.9 | 70.3 |
| reproduced | 64.2 | 47.8 | 40.6 | 63.4 |
mxin262 commented 2 months ago

Hi~, we have uploaded the evaluation file to MultimodalOCR.

luyao-cv commented 2 months ago

OK, I'll try it. Also, why do these metrics still not match? The first row is the data from your paper, the second row is what I measured with your code.

| | docvqa-val | chartqa | infovqa-val | textvqa | ocrbench |
|---|---|---|---|---|---|
| paper | 87.4 (test) | 76.5 | 60.1 (test) | 75.7 | 802 |
| reproduced | 86.9 | 74.3 | 57.1 | 73.5 | 800 |
luyao-cv commented 2 months ago

> Hi~, we have uploaded the evaluation file to MultimodalOCR.

Hello, I used the officially released weights and ran the MultimodalOCR test with `python ./scripts/MiniMonkey.py --image_folder ./OCRBench_Images/data --OCRBench_file ./OCRBench/FullTest.json --num_workers 1`. The FUNSD score I get is 0.419953596287703, which differs from the 42.9 reported in the paper. What environment did you use? Mine is CUDA 11.8, Python 3.10, transformers 4.39.3, flash-attn 3.6.3 (compiled from source).

mxin262 commented 2 months ago

We use flash-attn 2.5.8, PyTorch 2.2.2, transformers 4.40.1, CUDA 11.8, and Python 3.10.14.

With this setup, FUNSD reaches 0.43155452436194897.
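If it helps to diff environments, a small generic snippet like the following (not part of the repo) prints the locally installed versions of these packages so they can be compared against the ones listed above:

```python
import importlib
import sys

print("python:", sys.version.split()[0])
for pkg in ["torch", "transformers", "flash_attn"]:
    try:
        mod = importlib.import_module(pkg)
        print(f"{pkg}: {getattr(mod, '__version__', 'unknown')}")
        if pkg == "torch":
            # CUDA version the installed PyTorch wheel was built against
            print("cuda (torch build):", mod.version.cuda)
    except ImportError:
        print(f"{pkg}: not installed")
```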

For VLMEvalKit, you can replace VLMEvalKit/vlmeval/vlm/internvl_chat.py with this file. It achieves 806 on OCRBench, 76.0 on TextVQA_VAL, and 60.2 on InfoVQA_VAL.

For docvqa-test and chartqa, we evaluate them following these steps.

luyao-cv commented 2 months ago

For docvqa-test and chartqa, I tried adding the dynamic_preprocess2 processing to the eval code in the internvl2 codebase, using the weights from https://www.wisemodel.cn/models/HUST-VLRLab/Mini-Monkey/file. The best docvqa-test score I get, even when submitting to the official server, is only 86.96 (87.4 in the paper), and the best chartqa score is only 76.2 (76.5 in the paper). Could you release the code and parameters that reproduce these two numbers exactly? Thanks.
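For context, below is a rough sketch of where a tiling function like dynamic_preprocess2 typically slots into an InternVL-style eval pipeline. This is not Mini-Monkey's actual evaluation code: the keyword arguments (min_num, max_num, image_size, use_thumbnail) are assumed from InternVL's dynamic_preprocess, and the tiling function is passed in rather than imported, so the real signature should be checked against the repo.

```python
# Illustrative only: how dynamic tiling is usually wired into InternVL-style
# evaluation. `dynamic_preprocess2` and its keyword arguments are assumptions
# based on InternVL's `dynamic_preprocess`; check Mini-Monkey's code for the
# real API.
import torch
from PIL import Image
from torchvision import transforms

IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def build_transform(input_size=448):
    return transforms.Compose([
        transforms.Resize((input_size, input_size),
                          interpolation=transforms.InterpolationMode.BICUBIC),
        transforms.ToTensor(),
        transforms.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD),
    ])

def load_image_for_eval(image_path, dynamic_preprocess2, input_size=448, max_num=12):
    image = Image.open(image_path).convert("RGB")
    # Split the image into tiles before the vision encoder; this is the step
    # the comment above refers to adding into the internvl2 eval code.
    tiles = dynamic_preprocess2(image, min_num=1, max_num=max_num,
                                image_size=input_size, use_thumbnail=True)
    transform = build_transform(input_size)
    pixel_values = torch.stack([transform(t) for t in tiles])
    return pixel_values.to(torch.bfloat16)
```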

mxin262 commented 2 months ago

Hi~, we have updated the evaluation file.

LiWentomng commented 2 months ago

@luyao-cv Hello, I re-trained Mini-Monkey following the provided details and used VLMEvalKit for evaluation. The results are listed below:

| | MME | SEED-IMG | TextVQA-val | OCRBench |
|---|---|---|---|---|
| paper | 1881.9 | 71.3 | 75.7 | 802 |
| re-train | 1842 | 71.1 | 71.9 | 632 |

Here, paper denotes the results reported in the paper, and re-train denotes my test results from the retrained model.

Can you provide some suggestions? Thanks.

mxin262 commented 2 months ago

Hi~, can you provide the training log? @LiWentomng

LiWentomng commented 2 months ago

@mxin262 This is my training log: training_log.txt. Thanks for your time.

mxin262 commented 2 months ago

@LiWentomng Hello, you might need to adjust the learning rate to 4e-9. I was able to reproduce the results using this learning rate on 4 GPUs.
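In case it helps others retraining, a hedged sketch of what that change looks like if the finetune entry point is driven by transformers.TrainingArguments (as InternVL-style scripts are); every value except the learning rate is a placeholder, not the authors' actual configuration:

```python
# Sketch only: adjusting the learning rate for a retraining run, assuming the
# finetune script builds transformers.TrainingArguments. Values other than
# learning_rate are placeholders.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./work_dirs/mini_monkey_retrain",  # placeholder path
    learning_rate=4e-9,               # value suggested above for 4 GPUs
    per_device_train_batch_size=4,    # placeholder
    gradient_accumulation_steps=2,    # placeholder
    num_train_epochs=1,               # placeholder
)
```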

luyao-cv commented 2 months ago

Thanks a lot for the help. I'll give it another try on my end.