dmMaze / BallonsTranslator

A deep-learning-assisted comic translation tool supporting one-click machine translation and simple image/text editing | Yet another computer-aided comic/manga translation tool powered by deep learning
GNU General Public License v3.0
2.74k stars 182 forks

[Feature Request] Spell checking for OCR result #591

Open heinrichI opened 1 month ago

heinrichI commented 1 month ago

Can you add a dictionary check for OCR results? It helps to find many errors quickly. This feature works well for subtitle recognition in Subtitle Edit, where it is done via LibreOffice dictionaries (screenshot attached).

bropines commented 1 month ago

We discussed this once. Qt doesn't seem to have a proper spell-checking tool. In theory, there may be something out there among the many libraries... But maybe dmMaze can tell you more.

heinrichI commented 1 month ago

Why would this need support from Qt? Scan the OCR results against a dictionary, filter out the names, and show the remaining non-dictionary words in a separate panel.
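
To make the idea concrete, here is a minimal sketch of such a scan in plain Python with no GUI dependency. The function name `find_unknown_words`, the tokenizing regex, and the in-memory word sets are all assumptions for illustration; none of this is existing BallonsTranslator code, and a real word list would have to be loaded from somewhere such as a LibreOffice dictionary file.

```python
import re

# Hypothetical tokenizer: runs of letters/digits, optionally joined by apostrophes.
# Digits are kept inside tokens so that OCR slips like "sa1d" stay in one piece.
WORD_RE = re.compile(r"[^\W_]+(?:'[^\W_]+)*")


def find_unknown_words(ocr_text, dictionary, names=frozenset()):
    """Return words found in neither `dictionary` nor `names`, in first-seen order."""
    known = {w.lower() for w in dictionary}
    known_names = {n.lower() for n in names}
    unknown = []
    seen = set()
    for match in WORD_RE.finditer(ocr_text):
        word = match.group(0)
        key = word.lower()
        if word.isdigit():
            continue  # plain numbers are not spelling candidates
        if key in known or key in known_names or key in seen:
            continue
        seen.add(key)
        unknown.append(word)
    return unknown


if __name__ == "__main__":
    # Toy data; the words left over are what the separate panel would list.
    print(find_unknown_words("Naruto sa1d hello wrold", {"hello", "said"}, {"Naruto"}))
```

The returned list is what the separate panel would display; everything else is assumed to be correct and can be skipped during proofreading.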

bropines commented 1 month ago

So you are suggesting that we add a dictionary check after OCR, even though the dictionary obviously won't recognize every word accurately?

heinrichI commented 1 month ago

Yes, add a dictionary. Of course it will not know every word, but OCR is not 100% accurate either. This just simplifies proofreading, so that you don't have to reread everything. If a word is not in the dictionary, it is one of three things: an unknown word, which we add to the user dictionary; a name, which we add to the temporary list of names; or an OCR error, in which case the text needs editing. The screenshot from Subtitle Edit shows the buttons for adding to the dictionaries, the editing window, and suggested corrections or auto-replacement.
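
The triage described above could look roughly like the sketch below. It uses `difflib.get_close_matches` from the Python standard library as a stand-in for a real spell-checking backend, so the suggestions are only string-similarity guesses. The function name `triage_word` and the three word sets are hypothetical, and all sets are assumed to hold lowercase entries.

```python
import difflib


def triage_word(word, dictionary, user_dictionary, names, max_suggestions=3):
    """Classify one OCR word and propose corrections for unknown ones.

    Returns a (status, suggestions) tuple where status is "known", "name",
    or "unknown". All three sets are assumed to contain lowercase words.
    """
    key = word.lower()
    if key in dictionary or key in user_dictionary:
        return ("known", [])
    if key in names:
        return ("name", [])
    # Not found anywhere: either a new word, a new name, or an OCR error.
    # Offer the closest dictionary entries as possible corrections.
    candidates = sorted(dictionary | user_dictionary)
    suggestions = difflib.get_close_matches(
        key, candidates, n=max_suggestions, cutoff=0.6
    )
    return ("unknown", suggestions)


if __name__ == "__main__":
    dictionary = {"world", "hello"}
    user_dictionary = set()  # words the user has approved
    names = {"naruto"}       # temporary per-project name list
    for w in ["hello", "Naruto", "wrold"]:
        print(w, triage_word(w, dictionary, user_dictionary, names))
```

For an "unknown" result the user would then choose one of the three actions: add the word to `user_dictionary`, add it to `names`, or edit the text, possibly by picking one of the suggestions.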

vanderalex commented 1 month ago

I also think a dictionary would be very useful; please consider implementing this feature. In my translation work, there are usually 3-5 errors per page after OCR, and if the comic has handwritten text... you understand :) And these mistakes are sometimes very hard to locate.