clovaai / donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
https://arxiv.org/abs/2111.15664
MIT License
5.74k stars 466 forks

details is not ideal #258

Open chopin1998 opened 11 months ago

chopin1998 commented 11 months ago

[Screenshot: 2023-10-17_08-35]

I prepared over 300 photos similar to this one and labeled them, with around 10 key-value pairs per photo, including names, gender, various timestamps, and all the individual check results.

The training results surprised me, as almost all the key information could be matched with the corresponding values.

However, I noticed that for fine details the recognition was not as accurate as traditional OCR methods (such as PaddleOCR). For instance, even on clear images, a name would be matched to the right field, but individual characters in it might be wrong.

How can I improve this further? Thank you.
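One way to put a number on "individual characters might have errors" is a per-field character error rate (CER): the edit distance between the predicted and the labeled value, divided by the label length. The sketch below is only an illustration; the function names are made up here and are not part of Donut or PaddleOCR.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_error_rate(predicted: str, reference: str) -> float:
    """Edit distance normalized by the reference length."""
    if not reference:
        # Convention chosen for this sketch: an empty label matches only an empty prediction.
        return 0.0 if not predicted else 1.0
    return edit_distance(predicted, reference) / len(reference)
```

Computing this per field for both models on the same held-out photos would show whether the gap is concentrated in particular fields (for example names) or spread evenly.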

bugface commented 10 months ago

Could it be that the image resolution is not high enough? The base model was trained at 1920x2580 (I might be wrong here, but something like this). If your image resolution is too low, that might cause the problem.

chopin1998 commented 10 months ago

The processor resolution is 2560 by 1920, but my CUDA device only has 22 GB of VRAM, so I scale the images to 1440 by 1080.
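To see how much detail that downscaling removes, the arithmetic can be sketched as below. The two sizes are the ones quoted in this thread; the helper name `downscale_stats` is hypothetical.

```python
def downscale_stats(full_size, reduced_size):
    """Return (scale along first side, scale along second side, fraction of pixels kept)."""
    full_a, full_b = full_size
    red_a, red_b = reduced_size
    scale_a = red_a / full_a
    scale_b = red_b / full_b
    pixel_fraction = (red_a * red_b) / (full_a * full_b)
    return scale_a, scale_b, pixel_fraction


# Sizes as stated in the thread: processor resolution vs. the reduced training size.
scale_a, scale_b, pixel_fraction = downscale_stats((2560, 1920), (1440, 1080))

# A glyph that is 20 pixels tall at full resolution (an illustrative figure, not a
# measurement from the thread) shrinks by the same linear factor.
glyph_height_reduced = 20 * scale_a
```

Both sides shrink by the same factor, 0.5625, so the aspect ratio is preserved, but only about 32% of the pixels remain. Small characters therefore get a little over half their original height, which is consistent with the suggestion that resolution may be behind the character-level errors.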

huangding1535 commented 7 months ago

May I ask which annotation platform you used to label the data?

chopin1998 commented 7 months ago

I hacked together a semi-automatic annotation tool myself and did the labeling by hand: https://github.com/chopin1998/label_it

huangding1535 commented 7 months ago

May I ask what training data format you used for your task, and what model configuration parameters you set? Thanks~
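For reference, the Donut README describes a dataset layout in which each split folder holds a `metadata.jsonl` file, one JSON object per line, with a `file_name` key and a `ground_truth` key whose value is itself a JSON string containing `gt_parse`. A minimal sketch of building one such line follows; the file name and the field names (`name`, `gender`, `check_time`) are hypothetical and are not the ones used by the original poster.

```python
import json


def make_metadata_line(file_name: str, fields: dict) -> str:
    """Build one metadata.jsonl line in the layout described in the Donut README."""
    # ground_truth is stored as a JSON-encoded string, not as a nested object.
    ground_truth = json.dumps({"gt_parse": fields}, ensure_ascii=False)
    record = {"file_name": file_name, "ground_truth": ground_truth}
    return json.dumps(record, ensure_ascii=False)


# Hypothetical example record; the values are placeholders.
line = make_metadata_line(
    "report_0001.jpg",
    {"name": "Jane Doe", "gender": "F", "check_time": "2023-10-17 08:35"},
)
```

Writing one such line per labeled photo gives a file that can be checked by parsing each line twice: once for the record and once for the `ground_truth` string.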