key information extraction with Donut on handwritten documents?

DiTo97 commented 1 year ago

Hi everyone,

Has anyone tried fine-tuning Donut for key information extraction on a corpus that is partly digital and partly handwritten? Specifically, I am wondering whether anyone has evidence of how it performs on handwritten text, given that the existing suggestions for generating a synthetic pre-training dataset with SynthDoG all focus on selecting appropriate fonts for the digital text.

I have a private corpus of invoices similar in nature to CORD (with slightly more variability in shape, size, and format), but some of them occasionally contain sections of handwritten text in addition to, or in place of, digital text.
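
To make "evidence on handwritten text" concrete, one option would be to score predictions separately for printed and handwritten samples. Below is a minimal sketch, assuming each evaluation sample carries a manually assigned `text_type` tag and that exact string match per field is an acceptable metric. The dictionary keys, the function name, and the toy data are all hypothetical and not part of Donut or CORD.

```python
from collections import defaultdict


def field_accuracy_by_type(samples):
    """Exact-match field accuracy, grouped by a per-sample text-type tag.

    Each sample is a dict with hypothetical keys:
      "text_type": e.g. "printed" or "handwritten" (assigned by hand)
      "truth":     dict of field name -> ground-truth string
      "pred":      dict of field name -> predicted string
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for sample in samples:
        text_type = sample["text_type"]
        for field, expected in sample["truth"].items():
            total[text_type] += 1
            # A field missing from the prediction counts as wrong.
            predicted = sample["pred"].get(field)
            if predicted is not None and predicted.strip() == expected.strip():
                correct[text_type] += 1
    return {t: correct[t] / total[t] for t in total}


# Made-up toy data, only to show the expected input shape.
samples = [
    {
        "text_type": "printed",
        "truth": {"total": "12.50", "date": "2023-01-04"},
        "pred": {"total": "12.50", "date": "2023-01-04"},
    },
    {
        "text_type": "handwritten",
        "truth": {"total": "8.00", "date": "2023-02-11"},
        "pred": {"total": "8.00", "date": "2023-02-17"},
    },
]

print(field_accuracy_by_type(samples))
# {'printed': 1.0, 'handwritten': 0.5}
```

A large gap between the two groups on a held-out set would suggest that handwritten sections need more representation in the fine-tuning data.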

Toon-nooT commented 1 year ago

I can confirm that it also picks up handwritten information.

DiTo97 commented 1 year ago

> I can confirm that it also picks up handwritten information.

Thank you @Toon-nooT,

Could you share an example document with handwritten text that you tested Donut on?

No stress if it's not possible.

clovaai / donut #188