For the CORD dataset, we use donut-base-finetuned-cord-v2 to evaluate the test set, but the result is "ted_accuracy": 0.9050784595020707, "f1_accuracy": 0.8300857365549493, which is lower than the reported 91.6/93.5. For the Ticket dataset, the vocabulary of the published model lacks many Chinese characters, so it is obviously impossible to reproduce the results in the paper.
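The vocabulary gap can be verified directly: load the published model's tokenizer, take its vocabulary, and list which characters of a Chinese ticket string are missing (they would all collapse to the unknown token). The sketch below shows the check itself with a toy stand-in vocabulary; in practice you would replace `toy_vocab` with the real tokenizer vocabulary, e.g. `DonutProcessor.from_pretrained(...).tokenizer.get_vocab()`.

```python
# Minimal sketch of a vocabulary-coverage check.
# `toy_vocab` is a hypothetical stand-in; with the published model you would
# use the actual tokenizer vocabulary instead.

def unknown_chars(text, vocab):
    """Return the characters of `text` that are absent from `vocab`
    (i.e., characters the tokenizer could only map to <unk>)."""
    return [ch for ch in text if ch not in vocab]

toy_vocab = {"火", "车", "票"}  # hypothetical vocabulary entries
sample = "火车票北京南"        # example ticket text
missing = unknown_chars(sample, toy_vocab)
print(missing)  # characters not covered by the vocabulary
```

If `missing` is non-empty for common ticket characters, the model cannot emit them at decoding time, which would explain a ceiling on the Ticket-dataset scores.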
Hi, I encountered the same problem. I fine-tuned on the Ticket dataset using the same hyperparameters provided in the code and got ted_accuracy=0.9454, f1_accuracy=0.8686. Did you solve the problem?