clovaai / donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
https://arxiv.org/abs/2111.15664
MIT License
5.52k stars 443 forks

Multipage docs #239

Open abaranovskis-redsamurai opened 10 months ago

abaranovskis-redsamurai commented 10 months ago

Hello,

The examples for Donut are based on single-page docs (invoices, receipts, etc.). How well would it work with multipage docs? For instance, if the number of invoice items is large and the rest of the invoice continues on a second page, would extracting data from the second page work out of the box?

Thanks.

SNavgale commented 7 months ago

Hi Andrej, have you got an answer to your question above? By the way, your YouTube video on Donut is really good.

Thanks, Sanjay

abaranovskis-redsamurai commented 7 months ago

Hey Sanjay. Nope, there was no answer. What I did was convert the two-page PDF into a single image. That way it worked.

Thanks for your feedback about the Donut-related video :)

Andrej

SNavgale commented 7 months ago

Thanks for the quick reply. Do you mean you create one long image?

abaranovskis-redsamurai commented 7 months ago

Yes, correct.
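For readers who want to try the "one long image" approach confirmed above, here is a minimal sketch. It assumes the PDF pages have already been rendered to image files with a tool of your choice, and that Pillow is installed. The helper names `stacked_layout` and `stack_pages` are made up for illustration and are not part of Donut.

```python
from typing import List, Sequence, Tuple


def stacked_layout(sizes: Sequence[Tuple[int, int]]) -> Tuple[int, int, List[int]]:
    """Return (canvas_width, canvas_height, y_offsets) for pages stacked top to bottom.

    `sizes` holds one (width, height) pair per page, in reading order.
    """
    if not sizes:
        raise ValueError("need at least one page")
    width = max(w for w, _ in sizes)  # canvas is as wide as the widest page
    offsets: List[int] = []
    y = 0
    for _, h in sizes:
        offsets.append(y)  # this page starts where the previous one ended
        y += h
    return width, y, offsets


def stack_pages(page_paths: Sequence[str], out_path: str) -> None:
    """Paste the page images onto one tall white canvas and save the result."""
    # Pillow is imported lazily so the layout helper above has no dependencies.
    from PIL import Image

    pages = [Image.open(p).convert("RGB") for p in page_paths]
    width, height, offsets = stacked_layout([p.size for p in pages])
    canvas = Image.new("RGB", (width, height), "white")
    for page, y in zip(pages, offsets):
        canvas.paste(page, (0, y))  # left-aligned; narrower pages keep a white margin
    canvas.save(out_path)
```

Something like `stack_pages(["page1.png", "page2.png"], "invoice_long.png")` would then produce the merged image. Keep in mind that the model resizes the merged image to its configured input size, so every extra page reduces the effective resolution of each page.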

kaushal2012 commented 3 months ago

@abaranovskis-redsamurai Suppose I have a 10-page PDF that I need to parse as one continuous document, preserving the hierarchy of headings and the points/sub-points that continue onto the following pages. How should config file parameters such as max_length and input_size be changed?

Thanks in advance!!

SNavgale commented 3 months ago

I have trained successfully on documents of up to 10 pages with the default value of max_length. You can calculate max_length from the number of keys you have. The input size depends on your document size, so change it accordingly.
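One possible reading of "calculate max_length from the number of keys" is sketched below, as a rough estimate only. It assumes each key becomes an opening and a closing tag in the target sequence, and that list items are joined by a separator tag, which is the tagged format Donut trains on. The function names are hypothetical. The default whitespace word count is a stand-in that underestimates subword tokenization, so pass a counter backed by the model's tokenizer for a realistic figure.

```python
import re
from typing import Any, Callable

# Matches the structural tags produced by to_token_sequence below.
TAG = re.compile(r"(</?s_[^>]+>|<sep/>)")


def to_token_sequence(obj: Any) -> str:
    """Flatten nested JSON into a simplified Donut-style tagged string."""
    if isinstance(obj, dict):
        return "".join(f"<s_{k}>{to_token_sequence(v)}</s_{k}>" for k, v in obj.items())
    if isinstance(obj, list):
        return "<sep/>".join(to_token_sequence(v) for v in obj)
    return str(obj)


def estimate_length(
    obj: Any,
    count_text_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> int:
    """Count one token per tag plus count_text_tokens(...) for each text span."""
    total = 0
    for piece in TAG.split(to_token_sequence(obj)):
        if not piece:
            continue  # empty strings appear between adjacent tags
        total += 1 if TAG.fullmatch(piece) else count_text_tokens(piece)
    return total


sample = {
    "items": [{"name": "Pen", "qty": 2}, {"name": "Blue ink", "qty": 1}],
    "total": "7.50",
}
print(estimate_length(sample))  # 13 tags + 6 whitespace-separated words = 19
```

Taking the maximum of this estimate over all annotated documents, plus some margin for start and end tokens, gives a candidate max_length. A long multipage document with many repeated line items can exceed the default even if a short one fits.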

kaushal2012 commented 2 months ago

> I have trained successfully on documents of up to 10 pages with the default value of max_length. You can calculate max_length from the number of keys you have. The input size depends on your document size, so change it accordingly.

Do you mean 10 pages merged into a single image? How did you annotate it? I tried annotating it with Label Studio, but it throws an unresponsive error because the image is too long, and my ML backend does not work properly because of it.