VoxelCubes / PanelCleaner

An AI-powered tool to clean manga panels.
GNU General Public License v3.0

Additional text added outside of text inside of images. #24

EagleEye17 closed this issue 11 months ago

EagleEye17 commented 11 months ago

Panel Cleaner's OCR adds text that does not exist in the original image file.

It seems to append the extra text at the tail end of the recognized sentences.

This is Panel Cleaner's output, straight from the text file:

102-2.png: まとめてーぶっ飛ばせー! こんななんでもありバトルロイヤルビーチフラッグだとは聞いてませんよ!? あはは!だから面白いんじゃない♪ ちなみに賞品はなんです? なんと~!指揮官とのデート券!! そうなんですか!チーム戦だから3枚? 残念~!1枚!お宝は早い者勝ちだかんね! そんな殺生な!? 優勝チーム内で更にバトルです...? ありがとうございました。はいはい。ですが、そのためにこの時期があります。

This should be the expected output:

102-2.png: まとめてーぶっ飛ばせー! こんななんでもありバトルロイヤルビーチフラッグだとは聞いてませんよ!? あはは!だから面白いんじゃない♪ ちなみに賞品はなんです? なんと~!指揮官とのデート券!! そうなんですか!チーム戦だから3枚? 残念~!1枚!お宝は早い者勝ちだかんね! そんな殺生な!? 優勝チーム内で更にバトルです...?

That last sentence shouldn't be there at all. I verified the text count using the isolated text function: it shows nine text boxes, but the output has a tenth sentence appended.

VoxelCubes commented 11 months ago

Thanks for the extra details you sent over email. I've now fixed the issue in release 2.1.1.

The problem boiled down to the OCR model trying to OCR English text, which it isn't trained on, resulting in gibberish. That now gets ignored properly, with configurable strictness.
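To illustrate the idea, here is a minimal sketch of filtering text boxes by language before they reach the OCR model. It is not Panel Cleaner's actual code: the `TextBox` class, its `language` label (assumed to come from the text detector), the `filter_boxes` function, and the `strict` flag are all hypothetical names made up for this example.

```python
from dataclasses import dataclass


@dataclass
class TextBox:
    """A detected text box; `language` is a hypothetical detector label."""
    box_id: int
    language: str  # assumed values: "ja", "en", or "unknown"


def filter_boxes(boxes: list[TextBox], strict: bool = True) -> list[TextBox]:
    """Keep only the boxes that a Japanese-only OCR model should read.

    strict=True  keeps boxes labeled "ja" only.
    strict=False also keeps boxes whose language is "unknown".
    """
    allowed = {"ja"} if strict else {"ja", "unknown"}
    return [box for box in boxes if box.language in allowed]


boxes = [
    TextBox(1, "ja"),
    TextBox(2, "unknown"),
    TextBox(3, "en"),  # e.g. an English credit line, skipped in both modes
]
strict_result = filter_boxes(boxes, strict=True)
lenient_result = filter_boxes(boxes, strict=False)
```

In this sketch the English box is never passed to OCR, so it cannot produce a spurious extra sentence. The strictness setting only decides what happens to boxes whose language could not be determined.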

Thanks for bringing this to my attention!