clovaai / donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
https://arxiv.org/abs/2111.15664
MIT License
5.52k stars, 443 forks

Integrate a customized internal OCR engine to Donut #285

Open Altimis opened 5 months ago

Altimis commented 5 months ago

Hello guys. Thank you so much for this brilliant model. I'm aware that Donut is an OCR-free model that does not rely on OCR input. When I ran some tests (fine-tuning the model), I found that its internal OCR engine does not perform as well as Google Cloud Vision OCR. Is it possible to replace the internal OCR engine with Google Cloud Vision OCR? Thank you!

felixvor commented 5 months ago

Donut is not made to compete with OCR engines. It is pre-trained to generate OCR-style text output, which gives it a general understanding of characters and language that can be leveraged in fine-tuning tasks such as extracting specific information from an input image. If you want good OCR, I would recommend sticking to Tesseract or cloud solutions like the one you suggested.
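
To illustrate what "extracting specific information" looks like in practice: a fine-tuned Donut decoder emits a flat sequence of field tags of the form `<s_key>...</s_key>`, which is then converted into nested JSON. The snippet below is only a minimal sketch of that conversion under simplifying assumptions (no repeated list items, no identically named nested keys). It is not the repository's own conversion code; the function name `tags_to_dict` and the sample sequence are made up for illustration.

```python
import json
import re

# Matches an opening field tag such as <s_menu> and captures the key name.
OPEN_TAG = re.compile(r"<s_(\w+)>")


def tags_to_dict(seq: str) -> dict:
    """Convert '<s_key>value</s_key>' markup into a nested dict.

    Hypothetical, simplified helper: it ignores list separators and
    assumes a key is never nested inside a field with the same key.
    """
    result = {}
    pos = 0
    while True:
        match = OPEN_TAG.search(seq, pos)
        if match is None:
            break  # no more fields at this level
        key = match.group(1)
        close_tag = f"</s_{key}>"
        end = seq.find(close_tag, match.end())
        if end == -1:
            break  # unclosed field: stop instead of guessing
        inner = seq[match.end():end]
        # Recurse when the field contains further tags, otherwise keep the text.
        result[key] = tags_to_dict(inner) if "<s_" in inner else inner.strip()
        pos = end + len(close_tag)
    return result


# Made-up sequence in the style of a receipt-parsing task.
example = (
    "<s_menu><s_nm>Latte</s_nm><s_price>4.50</s_price></s_menu>"
    "<s_total><s_total_price>4.50</s_total_price></s_total>"
)
parsed = tags_to_dict(example)
print(json.dumps(parsed, indent=2))
```

The nesting comes entirely from the tag structure the model was fine-tuned to produce, which is why the output depends on the fine-tuning task rather than on a separate OCR step.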

GiovanniNova commented 2 days ago

> Donut is not made to compete with OCR engines. It is pre-trained to generate OCR-style text output, which gives it a general understanding of characters and language that can be leveraged in fine-tuning tasks such as extracting specific information from an input image. If you want good OCR, I would recommend sticking to Tesseract or cloud solutions like the one you suggested.

I believe he would like to know how to feed text to Donut, instead of images.

I have trained an OCR model with 99% accuracy myself, and I'd like to feed its text output to Donut so I can get structured, nested information back. Do you think that would be possible?