PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0

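When recognizing long text lines, a single character is occasionally recognized as two identical characters (for example, a single "一" is output twice). How can this be improved? #12506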

Closed nissansz closed 3 months ago

nissansz commented 3 months ago

Problem Description

Runtime Environment

Reproduction Code

Complete Error Message

Possible solutions

Appendix

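When recognizing long text lines, a single character is occasionally recognized as two identical characters. For example, the image contains only one "一", but the output contains two. How can this be improved?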

UserWangZz commented 3 months ago

This is an inherent limitation of how CTC decoding works. You can switch to a different algorithm or train for more epochs.
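To illustrate why CTC decoding can produce this kind of duplicate, here is a minimal sketch of greedy CTC decoding in plain Python. It is not PaddleOCR's actual decoder; the blank index, the tiny `charset`, and the per-frame predictions are all assumptions made up for the example. The decoder collapses consecutive repeated labels and then drops blanks, so one stray blank frame in the middle of a wide character splits it into two outputs.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank token


def ctc_greedy_decode(frame_ids, charset, blank=BLANK):
    """Collapse consecutive repeated labels, then drop blank labels."""
    collapsed = [label for label, _ in groupby(frame_ids)]
    return "".join(charset[i] for i in collapsed if i != blank)


# Hypothetical character set; index 0 is reserved for the blank token.
charset = ["<blank>", "一", "二"]

# Per-frame argmax labels for a line containing "一二".
# The wide "一" covers three consecutive frames and collapses to one character.
clean = [0, 1, 1, 1, 0, 2, 0]

# Same line, but the middle frame of "一" is predicted as blank.
# The two remaining "一" frames are no longer adjacent, so both survive.
noisy = [0, 1, 0, 1, 0, 2, 0]

print(ctc_greedy_decode(clean, charset))  # 一二
print(ctc_greedy_decode(noisy, charset))  # 一一二
```

Wide strokes such as "一" span several frames, which gives the model more chances to emit a blank between them. That is consistent with the suggestion above: more training can make the per-frame predictions more stable, and a non-CTC decoder avoids the collapse rule altogether.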

nissansz commented 3 months ago

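Which algorithm could I use instead? If each character were detected individually and the single-character recognition results were merged at the end, would that give better results?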

UserWangZz commented 3 months ago

You could look into CRNN. What you describe is basically the idea behind CTC.

nissansz commented 3 months ago

Which algorithm is better? Is there one that mitigates this problem?

nissansz commented 3 months ago

Would SVTR improve this?