ecdye / macSubtitleOCR

Convert bitmap subtitles into SubRip format using the macOS OCR engine
MIT License

Another slight interlacing issue #21

Open timj opened 3 hours ago

timj commented 3 hours ago

Trying a different disk from the one in #20: the image data looks close, but there seems to be a slight blurriness to it that is messing up the OCR.

For example:

[image: subtitle_11]

with SubTools looking like:

[image]

and VLC:

[image]

and the macSubtitleOCR subtitle comes out as:

11
00:00:47,719 --> 00:00:54,884
and Im crewing Welverhe slashing
piles with his adamantine claws.

vs SubTools with Tesseract:

and I'm drlwlng Wolverlne slashlng
prices with hls adamantlne claws.

That's a "good" example. Some other ones are much less usable. SubTools is much better though.

Here are the VOBSUB files:

fuzz.tgz

timj commented 3 hours ago

Oh, and if I take that VLC or SubTools image and ask Apple Preview to copy and paste the text I get:

and I'm drawing Wolverine slashing prices with his adamantine claws.

which is perfect.
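
To put a rough number on how far each OCR result is from that text, one option is a character error rate (edit distance divided by reference length). This is only a minimal sketch: it assumes the Preview result is the ground truth, and the helper names `levenshtein` and `char_error_rate` are hypothetical and not part of macSubtitleOCR or SubTools.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalize(text: str) -> str:
    """Collapse line breaks and repeated spaces so subtitle wrapping is ignored."""
    return " ".join(text.split())


def char_error_rate(hypothesis: str, reference: str) -> float:
    """Edit distance between the normalized strings, relative to reference length."""
    ref = normalize(reference)
    return levenshtein(normalize(hypothesis), ref) / len(ref)


# Assumption: the Preview copy-and-paste result is treated as ground truth.
REFERENCE = "and I'm drawing Wolverine slashing prices with his adamantine claws."

# The two OCR results quoted earlier in this thread.
OUTPUTS = {
    "macSubtitleOCR": "and Im crewing Welverhe slashing\npiles with his adamantine claws.",
    "SubTools + Tesseract": "and I'm drlwlng Wolverlne slashlng\nprices with hls adamantlne claws.",
}

if __name__ == "__main__":
    for name, text in OUTPUTS.items():
        print(f"{name}: CER = {char_error_rate(text, REFERENCE):.3f}")
```

Running the same comparison over every cue in the file would show whether a change to the decoder actually moves the results closer to the reference, rather than judging from one subtitle.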

timj commented 3 hours ago

I should have tried this before, but if I copy and paste the text of the slightly fuzzy image from macSubtitleOCR that also returns a perfect version of the text...

ecdye commented 3 hours ago

> I should have tried this before, but if I copy and paste the text of the slightly fuzzy image from macSubtitleOCR that also returns a perfect version of the text...

OK, that bit is very interesting. I wonder how Preview has its settings tuned, because presumably it uses the same API.

ecdye commented 3 hours ago

I'll try to investigate this one tonight; hopefully it's a simple fix.

timj commented 1 hour ago

Definitely interesting that you are getting a different answer from the same image using the same API. It's clear the image is fuzzier but there is hope!

ecdye commented 1 hour ago

Hope indeed. I'm wondering if I should give up on implementing my own decoders and use an external library like FFmpeg instead, but I really don't want to take on any external dependencies.