
chineseocr / trocr-chinese

transformers ocr for chinese


The biggest problem with trocr right now is the 384*384 preprocessing #3

Closed archwolf118 closed 2 years ago

archwolf118 commented 2 years ago

wenlihaoyu commented 2 years ago

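Why is it unreasonable? It is fine as long as your text isn't too long (for example, wider than 2048 pixels). Wouldn't padding just be wasteful? And why not choose a different image-side backbone?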

archwolf118 commented 2 years ago

Resizing to a fixed size inevitably distorts the text.

(Sent by email in reply to lywen's message of 9 April 2022, 21:19. Subject: Re: [chineseocr/trocr-chinese] The biggest problem with trocr right now is the 384*384 preprocessing (Issue #3))

746891300 commented 2 years ago

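My understanding is that this relies on pretraining with a large amount of data: distorted text can be treated as just another font, and once the model has learned it, it can predict accurately.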

wenlihaoyu commented 2 years ago

Quoting archwolf118: "Resizing to a fixed size inevitably distorts the text."

What does the distortion matter? Pretraining lets the model adapt to this kind of change; in effect, the model learns a spatial mapping.
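To put rough numbers on the two positions in this exchange, here is a minimal sketch that compares a plain stretch to 384*384 with an aspect-preserving resize plus padding. It is arithmetic only, not the project's preprocessing code; the function names and the 2048x64 example line are assumptions chosen for illustration.

```python
TARGET = 384  # input side length discussed in this thread


def stretch_scales(width, height, target=TARGET):
    """Scale factors (sx, sy) when the image is stretched to target x target."""
    return target / width, target / height


def letterbox(width, height, target=TARGET):
    """Aspect-preserving resize followed by padding to target x target.

    Returns (new_width, new_height, padded_fraction), where padded_fraction
    is the share of the final canvas that holds padding rather than image.
    """
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    padded_fraction = 1 - (new_w * new_h) / (target * target)
    return new_w, new_h, padded_fraction


# Hypothetical text line: 2048 px wide, 64 px high.
sx, sy = stretch_scales(2048, 64)
print(f"stretch: sx={sx}, sy={sy}, anisotropy={sy / sx}")
print("letterbox:", letterbox(2048, 64))
```

For this example line, stretching shrinks the width while enlarging the height, so the glyphs are squeezed 32 times more horizontally than vertically. Padding keeps the glyph shapes intact, but the text then occupies only a 384x12 strip and about 97% of the input is padding.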

wenlihaoyu commented 2 years ago

No algorithm can do everything. If you feel this method isn't good, you can choose another algorithm; don't let this project make you unhappy.

wenlihaoyu commented 2 years ago

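Very long text can actually be recognized too. RoBERTa supports up to 510 characters (not counting <s> and </s>); it's just that the seq2seq approach becomes extremely slow. (If your scenario consists entirely of images wider than 2048 pixels, a CTC approach would also need a very large GPU to train well.) What this project explores is using transformer-based methods for end-to-end recognition of, for example, curved text, irregular text, and multi-line text.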
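The 510 figure is consistent with a 512-position sequence minus the two special tokens. A minimal sketch of that budget follows, assuming a 512-token limit; `wrap_label` and the token ids 0 and 2 are hypothetical placeholders, not values taken from this project's vocabulary.

```python
MAX_POSITIONS = 512  # assumed sequence limit of the text decoder
NUM_SPECIAL = 2      # <s> and </s>


def wrap_label(token_ids, bos_id=0, eos_id=2, max_positions=MAX_POSITIONS):
    """Truncate a label to the content budget, then add <s> and </s>.

    bos_id and eos_id are placeholder ids for illustration only.
    """
    budget = max_positions - NUM_SPECIAL  # 510 content tokens
    return [bos_id] + list(token_ids[:budget]) + [eos_id]


print(len(wrap_label(range(1000))))  # length after truncation and wrapping
```

Because a seq2seq decoder emits one token per step, a label near this limit needs roughly 510 decoding steps, which is why long lines are slow rather than impossible.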