BOOKXNOTE / BookxNote-pro

BookxNote Pro issues and feature requests
http://www.bookxnote.com

Line-break handling in text recognized for translation #168

Open Link-Li opened 2 years ago

Link-Li commented 2 years ago

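When I translate text selected in a PDF, the translation is often poor because the spacing in the recognized text is wrong. If the selection spans a line break, the last word of one line and the first word of the next are recognized with no space between them and fuse into a single word.
For example: there has been growing interest in parameter-efficient methods to apply these models to downstream tasks.
Recognized as: there has been growing interest in parameter-efficient methods to applythese models to downstream tasks.
It should be: there has been growing interest in parameter-efficient methods to apply these models to downstream tasks.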

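In other cases a single word is split across two lines, and the recognized text keeps the line-break hyphen instead of removing it.
For example: As pre-trained language models have got- ten larger
Recognized as: As pre-trained language models have got-ten larger
It should be: As pre-trained language models have gotten larger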
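
To make the expected behavior for both cases concrete, here is a minimal sketch in Python. It assumes the selection is available as a list of per-line strings. I do not know how BookxNote Pro handles selections internally, so the function name `join_selected_lines` and the joining rules are hypothetical illustrations, not the application's actual code.

```python
import re


def join_selected_lines(lines):
    """Join the lines of a multi-line text selection into one string.

    Hypothetical helper for illustration only, not BookxNote Pro's API.
    """
    result = ""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip empty lines in the selection
        if not result:
            result = line
        elif re.search(r"[A-Za-z]-$", result) and line[0].islower():
            # Heuristic: a trailing hyphen followed by a lowercase letter
            # is treated as one word split across lines, so the hyphen is
            # dropped and the two halves are joined directly.
            result = result[:-1] + line
        else:
            # Ordinary line break: separate the two words with a space.
            result = result + " " + line
    return result


print(join_selected_lines([
    "parameter-efficient methods to apply",
    "these models to downstream tasks.",
]))
print(join_selected_lines([
    "As pre-trained language models have got-",
    "ten larger",
]))
```

The hyphen rule is only a heuristic: a genuine compound such as "parameter-efficient" that happens to break right after its hyphen would be joined incorrectly, so a real fix would probably need a dictionary check or something similar.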

Could the author please fix this text recognition when you see this? It has a very large impact on the final translation result. Thanks!