clovaai / donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
https://arxiv.org/abs/2111.15664
MIT License
5.75k stars 466 forks

CORD-v2 accuracy much lower than the paper's results #254

Closed by kevinmeetooa 1 year ago

kevinmeetooa commented 1 year ago

Hello,

I get surprisingly low scores when running the test script on CORD-v2 to reproduce the paper's results.

I run the following command:

    python3 test.py --pretrained_model_name_or_path "naver-clova-ix/donut-base-finetuned-cord-v2" --dataset "naver-clova-ix/cord-v2" --split "test"

and get these results:

Total number of samples: 100, Tree Edit Distance (TED) based accuracy score: 0.17636126902335467, F1 accuracy score: 0.1259655377302436

This is far from the expected 90% TED accuracy and 84% F1 score.

I haven't changed anything in the code, and I run the tests on a single V100 GPU. Have I missed something?

Thanks in advance

kevinmeetooa commented 1 year ago

It looks like it was a requirements issue. I solved it by installing the requirements from the test Colab notebook: https://colab.research.google.com/drive/1NMSqoIZ_l39wyRD7yVjw2FIuU2aglzJi?usp=sharing#scrollTo=hsPb55wLT0ci
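
For anyone hitting the same symptom, a quick way to spot a requirements mismatch is to print the locally installed versions and compare them by hand with the ones the Colab notebook installs. The snippet below is only a minimal sketch: the package names in `PACKAGES` are an assumption (the thread does not say which dependency was at fault), and `installed_versions` is a hypothetical helper, not part of the Donut code base.

```python
from importlib import metadata

# Assumed list of packages worth checking; the issue does not name the
# dependency that caused the low scores, so adjust this list as needed.
PACKAGES = ["torch", "transformers", "timm", "pytorch-lightning", "donut-python"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Print one line per package for a manual comparison with the notebook.
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this in both the local environment and the Colab runtime and diffing the two outputs should show which packages differ.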