Yuliang-Liu / Monkey

【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models

Why is there such a large gap in the metrics? #129

Closed luyao-cv closed 1 month ago

luyao-cv commented 2 months ago


Hello, I tried to reproduce the evaluation metrics with the MultimodalOCR tool, and the scores I measured (second row) differ a lot from the numbers in your paper. The model code I used is your chat_internl.

| | STVQA acc | POIE acc | FUNSD acc | SROIE acc |
|---|---|---|---|---|
| paper | | 69.9 | 42.9 | 70.3 |
| reproduced | 64.2 | 47.8 | 40.6 | 63.4 |
mxin262 commented 2 months ago

Hi~, we have uploaded the evaluation file to MultimodalOCR.

luyao-cv commented 2 months ago

OK, I'll try it. Also, why do these metrics still not match? The first row is the data from your paper, the second row is what I measured with your code.

| | docvqa-val | chartqa | infovqa-val | textvqa | ocrbench |
|---|---|---|---|---|---|
| paper | 87.4 (test) | 76.5 | 60.1 (test) | 75.7 | 802 |
| reproduced | 86.9 | 74.3 | 57.1 | 73.5 | 800 |
luyao-cv commented 2 months ago

> Hi~, we have uploaded the evaluation file to MultimodalOCR.

Hello, I used the officially released weights and ran the MultimodalOCR test with `python ./scripts/MiniMonkey.py --image_folder ./OCRBench_Images/data --OCRBench_file ./OCRBench/FullTest.json --num_workers 1`. The FUNSD score I get is 0.419953596287703, which differs from the 42.9 reported in the paper. What environment did you use? Mine is CUDA 11.8, Python 3.10, transformers 4.39.3, flash-attn 3.6.3 (compiled from source).

mxin262 commented 2 months ago

We use flash-attn 2.5.8, PyTorch 2.2.2, transformers 4.40.1, CUDA 11.8, and Python 3.10.14.

With this setup, FUNSD reaches 0.43155452436194897.
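If it helps to diff environments, a small generic snippet like the following (not part of the repo) prints the locally installed versions of these packages so they can be compared against the ones listed above:

```python
import importlib
import sys

print("python:", sys.version.split()[0])
for pkg in ["torch", "transformers", "flash_attn"]:
    try:
        mod = importlib.import_module(pkg)
        print(f"{pkg}: {getattr(mod, '__version__', 'unknown')}")
        if pkg == "torch":
            # CUDA version the installed PyTorch wheel was built against
            print("cuda (torch build):", mod.version.cuda)
    except ImportError:
        print(f"{pkg}: not installed")
```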

For VLMEvalKit, you can replace VLMEvalKit/vlmeval/vlm/internvl_chat.py with this file. It achieves 806 on OCRBench, 76.0 on TextVQA_VAL, and 60.2 on InfoVQA_VAL.

For docvqa-test and chartqa, we evaluate them following these steps.

luyao-cv commented 2 months ago

For docvqa-test and chartqa, I tried adding the dynamic_preprocess2 processing to the eval code in the internvl2 codebase, using the weights from https://www.wisemodel.cn/models/HUST-VLRLab/Mini-Monkey/file. The best docvqa-test score I get, even when submitting to the official server, is only 86.96 (87.4 in the paper), and the best chartqa score is only 76.2 (76.5 in the paper). Could you release the code and parameters that reproduce these two numbers exactly? Thanks.
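For context, below is a rough sketch of where a tiling function like dynamic_preprocess2 typically slots into an InternVL-style eval pipeline. This is not Mini-Monkey's actual evaluation code: the keyword arguments (min_num, max_num, image_size, use_thumbnail) are assumed from InternVL's dynamic_preprocess, and the tiling function is passed in rather than imported, so the real signature should be checked against the repo.

```python
# Illustrative only: how dynamic tiling is usually wired into InternVL-style
# evaluation. `dynamic_preprocess2` and its keyword arguments are assumptions
# based on InternVL's `dynamic_preprocess`; check Mini-Monkey's code for the
# real API.
import torch
from PIL import Image
from torchvision import transforms

IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def build_transform(input_size=448):
    return transforms.Compose([
        transforms.Resize((input_size, input_size),
                          interpolation=transforms.InterpolationMode.BICUBIC),
        transforms.ToTensor(),
        transforms.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD),
    ])

def load_image_for_eval(image_path, dynamic_preprocess2, input_size=448, max_num=12):
    image = Image.open(image_path).convert("RGB")
    # Split the image into tiles before the vision encoder; this is the step
    # the comment above refers to adding into the internvl2 eval code.
    tiles = dynamic_preprocess2(image, min_num=1, max_num=max_num,
                                image_size=input_size, use_thumbnail=True)
    transform = build_transform(input_size)
    pixel_values = torch.stack([transform(t) for t in tiles])
    return pixel_values.to(torch.bfloat16)
```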

mxin262 commented 2 months ago

Hi~, we have updated the evaluation file.

LiWentomng commented 2 months ago

@luyao-cv Hello, I re-trained Mini-Monkey following the provided details and used VLMEvalKit for evaluation. The results are listed below:

| | MME | SEED-IMG | TextVQA-val | OCRBench |
|---|---|---|---|---|
| paper | 1881.9 | 71.3 | 75.7 | 802 |
| re-train | 1842 | 71.1 | 71.9 | 632 |

Here, paper denotes the results reported in the paper, and re-train denotes my test results from the retrained model.

Can you provide some suggestions? Thanks.

mxin262 commented 2 months ago

Hi~, can you provide the training log? @LiWentomng

LiWentomng commented 2 months ago

@mxin262 This is my training log: training_log.txt. Thanks for your time.

mxin262 commented 2 months ago

@LiWentomng Hello, you might need to adjust the learning rate to 4e-9. I was able to reproduce the results using this learning rate on 4 GPUs.
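In case it helps others retraining, a hedged sketch of what that change looks like if the finetune entry point is driven by transformers.TrainingArguments (as InternVL-style scripts are); every value except the learning rate is a placeholder, not the authors' actual configuration:

```python
# Sketch only: adjusting the learning rate for a retraining run, assuming the
# finetune script builds transformers.TrainingArguments. Values other than
# learning_rate are placeholders.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./work_dirs/mini_monkey_retrain",  # placeholder path
    learning_rate=4e-9,               # value suggested above for 4 GPUs
    per_device_train_batch_size=4,    # placeholder
    gradient_accumulation_steps=2,    # placeholder
    num_train_epochs=1,               # placeholder
)
```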

luyao-cv commented 2 months ago

Thanks a lot for the help. I'll give it another try on my end.