clovaai / donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
https://arxiv.org/abs/2111.15664
MIT License

I conducted the experiment outlined in your paper and obtained results that do not match the ones you reported. #142

Open liuchaohu opened 1 year ago

liuchaohu commented 1 year ago

For the CORD dataset, we used donut-base-finetuned-cord-v2 to evaluate the test set, but the result is "ted_accuracy": 0.9050784595020707, "f1_accuracy": 0.8300857365549493, which is lower than the reported 91.6/93.5. For the Ticket dataset, the vocabulary of the published model lacks many Chinese characters, so it is clearly impossible to reproduce the results in the paper with it.
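For reference, the "f1_accuracy" above is a field-level F1 over extracted key–value pairs. Below is a minimal, stdlib-only sketch of that idea; it is an illustration, not the repo's actual evaluator (Donut's `JSONParseEvaluator` additionally flattens nested JSON and computes the tree-edit-distance-based "ted_accuracy", which this sketch omits).

```python
# Hedged sketch: micro-averaged field-level F1 over (key, value) pairs,
# in the spirit of (but not identical to) Donut's JSONParseEvaluator.
from collections import Counter

def field_f1(pred_fields, gold_fields):
    """Micro-averaged F1 over (key, value) pairs, counting duplicates."""
    pred, gold = Counter(pred_fields), Counter(gold_fields)
    tp = sum((pred & gold).values())  # pairs matched on both sides
    if tp == 0:
        return 0.0
    precision = tp / sum(pred.values())
    recall = tp / sum(gold.values())
    return 2 * precision * recall / (precision + recall)

# Hypothetical CORD-style fields, for illustration only.
gold = [("menu.nm", "Latte"), ("menu.price", "4,500"), ("total", "4,500")]
pred = [("menu.nm", "Latte"), ("menu.price", "4,500"), ("total", "4,000")]
print(round(field_f1(pred, gold), 4))  # 2 of 3 pairs match -> 0.6667
```

A gap like 0.83 vs. the reported 0.935 therefore means roughly one in ten predicted/gold field pairs failing to match exactly, which string-level differences (tokenization, normalization, vocabulary coverage) can easily cause.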

SleepEarlyLiveLong commented 1 year ago

Hi, I encountered the same problem. I fine-tuned on the Ticket dataset using the same hyperparameters provided in the code and got ted_accuracy=0.9454, f1_accuracy=0.8686. Did you solve the problem?