Hi~, we uploaded the evaluation file to the MultimodalOCR repo.
OK, I'll try that. One more thing: why do these metrics still not line up? The first row is the numbers from your paper; the second row is what I measured with your code.
| | docvqa-val | chartqa | infovqa-val | textvqa | ocrbench |
|---|---|---|---|---|---|
| paper | test: 87.4 | 76.5 | test: 60.1 | 75.7 | 802 |
| reproduced | 86.9 | 74.3 | 57.1 | 73.5 | 800 |
> Hi~, we uploaded the evaluation file to the MultimodalOCR repo.
Hello, I used the officially provided weights and tested with MultimodalOCR via `python ./scripts/MiniMonkey.py --image_folder ./OCRBench_Images/data --OCRBench_file ./OCRBench/FullTest.json --num_workers 1`. The FUNSD score I get is 0.419953596287703, which differs from the 42.9 reported in the paper. What environment are you using? Mine is cuda 11.8, python 3.10, transformers 4.39.3, flash-attn 3.6.3 (compiled from source).
We use flash-attn 2.5.8, pytorch 2.2.2, transformers 4.40.1, cuda 11.8, python 3.10.14.
With this environment, FUNSD reaches 0.43155452436194897.
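If it helps, here is a quick check that the versions loaded at runtime match the ones above (a minimal sketch; it assumes the packages are importable under the names shown):

```python
# Print the versions that matter for reproducing the FUNSD number above.
import torch
import transformers
import flash_attn

print("torch:", torch.__version__)                # expect 2.2.2
print("transformers:", transformers.__version__)  # expect 4.40.1
print("flash-attn:", flash_attn.__version__)      # expect 2.5.8
print("cuda:", torch.version.cuda)                # expect 11.8
```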
For VLMEvalKit, you can replace `VLMEvalKit/vlmeval/vlm/internvl_chat.py` with this file. It achieves 806 on OCRBench, 76.0 on TextVQA_VAL, and 60.2 on InfoVQA_VAL.
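For a quick smoke test before the full runs, something like the following should work with VLMEvalKit's Python API; the registry key `'Mini-Monkey'` and the demo image are assumptions here, so use whatever name the patched `internvl_chat.py` actually registers in `vlmeval/config.py`:

```python
# Instantiate the patched model through VLMEvalKit's model registry.
from vlmeval.config import supported_VLM

model = supported_VLM['Mini-Monkey']()  # hypothetical key; check vlmeval/config.py
answer = model.generate(['demo.jpg', 'What text appears in this image?'])
print(answer)

# The benchmark numbers above would then come from the CLI, e.g.:
#   python run.py --data OCRBench TextVQA_VAL InfoVQA_VAL --model Mini-Monkey
```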
For docvqa-test and chartqa, we evaluate them following these steps.
Regarding docvqa-test and chartqa: I tried adding the dynamic_preprocess2 processing to the eval code in the InternVL2 repo, using the weights from https://www.wisemodel.cn/models/HUST-VLRLab/Mini-Monkey/file. The best docvqa-test score I got from the official submission site is only 86.96 (87.4 in the paper), and the best chartqa score is only 76.2 (76.5 in the paper). Could you publish the code and parameters that reproduce these two numbers exactly? Thanks.
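For anyone comparing notes, here is a rough sketch of how the second cropping pass could be wired into an InternVL2-style eval loader. The `dynamic_preprocess` / `dynamic_preprocess2` helpers are the ones from the Mini-Monkey repo; their signatures (the returned `target_aspect_ratio` and the `prior_aspect_ratio` argument) are inferred from its chat code, and the import path is hypothetical, so treat this as an approximation rather than the authors' exact eval path:

```python
import torch
from PIL import Image

# Helpers from the Mini-Monkey repo; this import path is hypothetical.
from minimonkey_preprocess import dynamic_preprocess, dynamic_preprocess2

def load_pixel_values(image_path, transform, image_size=448, max_num=12):
    image = Image.open(image_path).convert('RGB')
    # Pass 1: standard InternVL-style adaptive cropping; assumed here to
    # also return the aspect ratio it settled on.
    tiles, target_aspect_ratio = dynamic_preprocess(
        image, image_size=image_size, max_num=max_num, use_thumbnail=True)
    # Pass 2: Mini-Monkey's complementary crops, conditioned on that ratio.
    tiles2 = dynamic_preprocess2(
        image, image_size=image_size, max_num=max_num,
        use_thumbnail=True, prior_aspect_ratio=target_aspect_ratio)
    # Stack both passes into a single batch of tiles for the vision encoder.
    pixel_values = torch.stack([transform(t) for t in tiles + tiles2])
    return pixel_values.to(torch.bfloat16)
```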
@luyao-cv Hello, I re-trained Mini-Monkey following the provided details and used VLMEvalKit to run the evaluation. The results are listed below:
| | MME | SEED-IMG | Textvqa-val | OCR-Bench |
|---|---|---|---|---|
| paper | 1881.9 | 71.3 | 75.7 | 802 |
| re-train | 1842 | 71.1 | 71.9 | 632 |
`paper` denotes the results reported in the paper; `re-train` denotes my test results with the retrained model.
Can you provide some suggestions? Thanks.
Hi~, can you provide the training log? @LiWentomng
@mxin262 This is my training log. Thanks for your time. training_log.txt
@LiWentomng Hello, you might need to adjust the learning rate to 4e-9. I was able to reproduce the results using this learning rate on 4 GPUs.
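To make that concrete, a minimal sketch of where the value would go, assuming the training entry point is driven by HuggingFace `TrainingArguments` as in InternVL-style finetune scripts (every field except `learning_rate` is an illustrative placeholder, not from the repo):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./work_dirs/mini_monkey_retrain",  # placeholder path
    learning_rate=4e-9,             # the value suggested above for 4 GPUs
    per_device_train_batch_size=4,  # placeholder; scale with your GPU count
    num_train_epochs=1,             # placeholder
    bf16=True,                      # placeholder, matching bfloat16 inference
)
```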
Thank you very much for the help; I'll give it another try on my end.
Hello, I tried to reproduce the evaluation metrics using the MultimodalOCR tool, and the scores I measured (second row) differ significantly from the numbers in your paper. For the model code I used your chat_internl.