clovaai / donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
https://arxiv.org/abs/2111.15664
MIT License
5.52k stars 443 forks

Prediction and Answer differ by dataset-specific tag #282

Closed: ftkeys closed this issue 5 months ago

ftkeys commented 5 months ago

Describe the bug

I am trying to fine-tune the pre-trained Donut models on a small subset of the SROIE dataset. The task is document parsing, so each ground-truth record has the following form:

{"file_name": "414.jpg", "ground_truth": "{\"gt_parse\": {\"company\": \"KEDAI UHAT DAN RUNCIT CHONG HWA\", \"date\": \"OCT 3, 2016\", \"address\": \"3, JALAN PERDANA 5, TAMAN INDAH PERDANA, KEPONG, 52100 KL.\", \"total\": \"RM33.90\"}}"}
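For readers preparing similar data, here is a minimal sketch of how a record like the one above can be produced. It only illustrates the nesting visible in the example (the ground_truth value is itself a JSON-encoded string wrapping gt_parse); the helper name make_record_line is hypothetical and not part of the Donut repository.

```python
import json


def make_record_line(file_name: str, fields: dict) -> str:
    """Build one JSON Lines record (hypothetical helper, not from the repo).

    The inner dict is serialized first, so "ground_truth" ends up as a
    JSON string rather than a nested object, matching the example record.
    """
    ground_truth = json.dumps({"gt_parse": fields})
    return json.dumps({"file_name": file_name, "ground_truth": ground_truth})


line = make_record_line(
    "414.jpg",
    {
        "company": "KEDAI UHAT DAN RUNCIT CHONG HWA",
        "date": "OCT 3, 2016",
        "address": "3, JALAN PERDANA 5, TAMAN INDAH PERDANA, KEPONG, 52100 KL.",
        "total": "RM33.90",
    },
)

# Reading it back takes two decoding steps, one per layer of JSON.
record = json.loads(line)
parsed = json.loads(record["ground_truth"])["gt_parse"]
print(parsed["total"])
```

Decoding twice when reading the record back is a quick way to confirm that the escaping in a hand-written file is correct.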

I cloned https://github.com/clovaai/donut and installed it with "pip install ." from the repository root. I then ran the provided train.py script with a copy of the train_cord.yaml config file in which I changed only the dataset entry.

The validation debug output (epoch 29) has the following form:

Prediction: <s_sroie><s_company>MRIES SDN BHD</s_company><s_date>05 JUN 18 APR 2018</s_date><s_address>USJ SLN TSJ2,TMN SUBANG JASA, 40000 SHAH ALAM, SEL.</s_address><s_total>14.10</s_total>

Answer: <s_company>TOKYO KITCHEN (CITTA MALL) TOKYO KITCHEN SDN BHD</s_company><s_date>23-04-2017</s_date><s_address>G-26, GRD FLOOR,CITTA MALL, NO 1, JALAN PJU, 1A/4B, JLN PJU 1,ARA DAMANSARA, 47301 PETALING JAYA, SELANGOR.</s_address><s_total>113.80</s_total>
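The Answer string above looks like the flat gt_parse dictionary written out as one <s_key>value</s_key> pair per field. The sketch below reproduces that shape for flat dictionaries only. It is a simplified illustration under that assumption, not the repository's own conversion code, and the function name to_tag_sequence is hypothetical.

```python
def to_tag_sequence(fields: dict) -> str:
    """Flatten a flat dict into <s_key>value</s_key> pairs, in dict order.

    Hypothetical helper for illustration; nested values are not handled.
    """
    return "".join(f"<s_{key}>{value}</s_{key}>" for key, value in fields.items())


sequence = to_tag_sequence({"date": "23-04-2017", "total": "113.80"})
print(sequence)
```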

I am wondering why the prediction always contains the extra <s_sroie> tag (sroie is the dataset folder name), while the answer (ground truth) does not. Is this intended? If not, is there a way to fix it?
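One way to check whether the extra tag is the only difference between the two strings is to normalize it away before comparing. The sketch below is a diagnostic aid only, assuming the tag has the form <s_NAME> with NAME equal to the dataset folder name, as in the output above. The helper strip_task_tag is hypothetical and does not change how the model is trained or evaluated.

```python
import re


def strip_task_tag(sequence: str, task_name: str) -> str:
    """Remove a single leading <s_{task_name}> tag if one is present."""
    pattern = rf"^\s*<s_{re.escape(task_name)}>"
    return re.sub(pattern, "", sequence, count=1)


prediction = "<s_sroie><s_total>14.10</s_total>"
answer = "<s_total>113.80</s_total>"

# After normalization, any remaining difference lies in the field values.
print(strip_task_tag(prediction, "sroie"))
print(strip_task_tag(prediction, "sroie") == answer)
```

In the output quoted in this issue, the field values still differ after the tag is removed, so the tag alone does not account for the mismatch between prediction and answer.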

In any case, the results after 30 epochs of training are very poor: the model predicts essentially the same, completely wrong output for every input. I wondered whether the extra tag could be the cause.

I appreciate any help or information! Thanks in advance.

Environment info

Google Colab
Python 3.10.12
Transformers 4.35.2
pytorch-lightning 2.1.3
timm 0.9.12

ftkeys commented 5 months ago

Downgrading to the following dependency versions resolved the problem:

transformers==4.25.1
pytorch-lightning==1.8.5 # 1.6.4
timm==0.5.4
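
To confirm that an environment actually matches the versions listed above, a small check along these lines may help. It is a sketch using only the standard library; the PINNED table copies the three pins from this comment (using 1.8.5 for pytorch-lightning), and the function name find_mismatches is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the comment above.
PINNED = {
    "transformers": "4.25.1",
    "pytorch-lightning": "1.8.5",
    "timm": "0.5.4",
}


def find_mismatches(pinned: dict, lookup=version) -> dict:
    """Return {package: (installed, wanted)} for every package that differs.

    A package that is not installed is reported with installed=None.
    """
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (installed, wanted)
    return mismatches


for name, (installed, wanted) in find_mismatches(PINNED).items():
    print(f"{name}: installed {installed}, expected {wanted}")
```

The lookup parameter exists only so the check can be exercised without touching the real environment.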