CORD-v2 accuracy much lower than the paper's results

clovaai / donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022

MIT License


Hello,

I get surprisingly low scores when running the test script on CORD-v2 to reproduce the paper's results.

I use the following command:

    python3 test.py --pretrained_model_name_or_path "naver-clova-ix/donut-base-finetuned-cord-v2" --dataset "naver-clova-ix/cord-v2" --split "test"

and I get the following results:

Total number of samples: 100, Tree Edit Distance (TED) based accuracy score: 0.17636126902335467, F1 accuracy score: 0.1259655377302436

This is far from the expected 90% TED accuracy and 84% F1 score.

I haven't changed anything in the code, and I am running the tests on a single V100 GPU. Have I missed something?
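To sanity-check scores like these on a handful of samples, a field-level F1 can be computed by hand from predicted and ground-truth JSON. The following is only a minimal sketch of that idea, not the repository's evaluator: `flatten` and `field_f1` are hypothetical helper names, and the exact normalization used by `test.py` may differ.

```python
from collections import Counter


def flatten(data, prefix=""):
    """Flatten nested dicts/lists into a list of (path, value) pairs."""
    pairs = []
    if isinstance(data, dict):
        for key, value in data.items():
            child = f"{prefix}.{key}" if prefix else key
            pairs.extend(flatten(value, child))
    elif isinstance(data, list):
        # List items share the parent's path; their order is ignored.
        for item in data:
            pairs.extend(flatten(item, prefix))
    else:
        pairs.append((prefix, str(data)))
    return pairs


def field_f1(preds, answers):
    """Micro-averaged F1 over (path, value) fields across all samples."""
    tp = fp = fn = 0
    for pred, answer in zip(preds, answers):
        pred_fields = Counter(flatten(pred))
        gold_fields = Counter(flatten(answer))
        # Multiset intersection counts fields present in both.
        matched = sum((pred_fields & gold_fields).values())
        tp += matched
        fp += sum(pred_fields.values()) - matched
        fn += sum(gold_fields.values()) - matched
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Made-up receipt fields, for illustration only.
    pred = {"menu": [{"nm": "Latte", "price": "4.00"}], "total": {"total_price": "4.00"}}
    gold = {"menu": [{"nm": "Latte", "price": "4.50"}], "total": {"total_price": "4.00"}}
    print(field_f1([pred], [gold]))
```

If a hand-computed score on a few samples looks reasonable while the script's aggregate does not, the mismatch is more likely in how predictions or ground truths are parsed than in the model weights.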

Thanks in advance

CORD-v2 accuracy much lower than the paper's results #254