clovaai / donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
https://arxiv.org/abs/2111.15664
MIT License
5.75k stars 466 forks

Texts in "swenglish"? #247

Closed (henkish closed this issue 1 year ago)

henkish commented 1 year ago

Hello!

I have trained a model for parsing invoices on a private dataset. The model understands the structure of the documents very well!

But I have an issue with the text. The model seems to find the correct text in the document structure, but it changes some words into Swedish-English or Swedish-German hybrids, or into made-up Swedish words.

The problem with the invoices is that the text can also contain product abbreviations, technical terms, etc., and not only regular Swedish sentences.

Is there some way to get more accurate texts?

Thanks in advance,
Henrik

henkish commented 1 year ago

Training on additional documents and for more epochs seemed to improve the results, but we still get some strange text sometimes.
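
To tell whether more data or more epochs actually reduces the garbled words, it can help to put a number on it. Below is a minimal sketch, not part of Donut's own code, that computes a character error rate between a predicted field value and its ground truth using plain Levenshtein distance. The function names and the sample strings are hypothetical and only for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions, or substitutions
    needed to turn string a into string b."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,                      # deletion
                    current[j - 1] + 1,                   # insertion
                    previous[j - 1] + substitution_cost,  # substitution
                )
            )
        previous = current
    return previous[-1]


def char_error_rate(predicted: str, reference: str) -> float:
    """Edit distance divided by the reference length (0.0 means identical)."""
    if not reference:
        return 0.0 if not predicted else 1.0
    return levenshtein(predicted, reference) / len(reference)


# Hypothetical example: one wrong character in a seven-character word.
print(char_error_rate("fakturo", "faktura"))
```

Averaging this value per field over a held-out set of invoices, before and after each training change, should show whether the strange words are getting rarer or only moving around.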

Mohtadrao commented 8 months ago

I get strange results all the time. Could you tell me what to do? @henkish