clovaai / donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
https://arxiv.org/abs/2111.15664
MIT License

Random prediction and wrong prediction in repeated characters #270

Open Asha-12502 opened 8 months ago

Asha-12502 commented 8 months ago

Hello,

I have fine-tuned the Donut base model on our custom dataset, which consists of a total of 12,480 images, using the default parameters.

During my analysis of the predictions, I observed certain patterns in the JSON output. Specifically, when similar keys appear close to each other, the model tends to make the following types of errors:

- It predicts extra characters (e.g., "Paneer cheese paratha with butter" is predicted as "Paneer Paneer cheese paratha with butter").
- It misses some characters (e.g., "199.00" is predicted as "19.00").
- It predicts incorrect characters (e.g., "119.00" is predicted as "159.00").

Additionally, I noticed that the model often predicts characters such as "5", "7", and "1", even though these characters are not present in the images (a decoding-side mitigation sketch follows the first example below).

Ground Truth:

{ "table": [ { "key": "Paneer paratha with butter", "value": "199.00" }, { "key": "Paneer cheese paratha with butter", "value": "119.00" } ] }

Prediction:

{ "table": [ { "key": "Paneer paratha with butter", "value": "19.00" }, { "key": "Paneer Paneer cheese paratha with butter", "value": "159.00" } ] }

In the JSON below, the model misses characters in the middle of a value, predicts something other than the ground truth, or adds extra characters that are not present in the image or the JSON. The image is clean enough for the model to read correctly, yet it still produces the kinds of errors described above.

From my analysis, the model makes more mistakes in the numeric values than in the alphabetic keys; one possible reason is data imbalance (a character-frequency check is sketched after the example below).

Ground Truth:

{ "table": [ { "key": "Accessible Amount", "value": "9123.23" }, { "key": "Car parts due :", "value": "2,09,233.19" }, { "key": "Paint brushes :", "value": "200.00" } ] }

Predicted:

{ "table": [ { "key": "Accesible Amount", "value": "9123.33" }, { "key": "Car parts due :", "value": "9,1,233.19" }, { "key": "Paint brushes :", "value": "200.000" } ] }

In the JSON provided below, despite the clarity of the image, the model consistently exhibits several issues:

- Missing characters: the model frequently fails to recognize certain characters.
- Duplicate keys: it tends to predict the same type of key multiple times, resulting in an extra key such as "Oil fluid", which is a combination of two adjacent keys.
- Missing colon (:) at the end of keys: the model omits the colon character at the end of keys.
- Missing plus sign (+) in values: it also overlooks the plus sign in values (a tokenizer round-trip check is sketched after this example).

Ground Truth :

{ "table": [ { "key": "Delivery charges :", "value": "(+)470.00" }, { "key": "Oil charge:", "value": "3,120.00" }, { "key": "Washer fluid :", "value": "3,120.00" } ] }

Predicted:

{ "table": [ { "key": "Delivery charges", "value": "( )470.00" }, { "key": "Oil charge:", "value": "3,120.00" }, { "key": "Oil fluid :", "value": "157.00" }, { "key": "Washer fluid :", "value": "3,120.00" } ] }

In the JSON below, I found the same pattern: when a character appears twice in a row in the image (e.g., '@ @' or ': :'), the model predicts it only once. It also predicts the same key several times (a post-processing deduplication sketch follows this example).

Ground Truth:

{ "table": [ { "key": "Transport charges::", "value": "144.00" }, { "key": "Freight charges", "value": "" }, { "key": "Washer fluid @ @ 18 %", "value": "3,120.00" } ] }

Prediction:

{ "table": [ { "key": "Transport charges:", "value": "144.00" }, { "key": "Freight charges:", "value": "" }, { "key": "Freight charges:", "value": "" }, { "key": "Freight charges:", "value": "" }, { "key": "Washer fluid @ 18 %", "value": "3,120.00" } ] }

Chzhiyuan512 commented 6 months ago

I also have the same problem, have you solved it?

Asha-12502 commented 6 months ago

I haven't found a solution yet. @Chzhiyuan512

Mohtadrao commented 5 months ago

@Asha-12502 / @Chzhiyuan512 this is still a big problem that needs to be resolved. Have you found a solution? I am facing the same problem. [image attached]

kaushal2012 commented 3 months ago

@Asha-12502 can you share your config file? My team is also trying to train Donut on a custom dataset, but we are not getting much accuracy because of the complex ground truth: our dataset consists of scanned documents, and we have to extract all the text data from whole PDFs.
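For the whole-page text-extraction setup described here, a minimal sketch of the ground-truth layout may help, assuming the metadata.jsonl format from this repo's README with a plain {"text_sequence": ...} parse; the file names and page texts below are placeholders.

```python
# Sketch: write a metadata.jsonl for a whole-page text-reading target, assuming
# the {"file_name", "ground_truth"} JSON-lines format described in this repo's
# README, with gt_parse = {"text_sequence": ...}. Paths and texts are placeholders.
import json
from pathlib import Path

page_texts = {
    "page_0001.png": "word1 word2 word3",  # placeholder page text
    "page_0002.png": "word4 word5 word6",
}

out = Path("dataset/train/metadata.jsonl")
out.parent.mkdir(parents=True, exist_ok=True)
with out.open("w", encoding="utf-8") as f:
    for file_name, text in page_texts.items():
        ground_truth = json.dumps({"gt_parse": {"text_sequence": text}}, ensure_ascii=False)
        f.write(json.dumps({"file_name": file_name, "ground_truth": ground_truth},
                           ensure_ascii=False) + "\n")
```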