Open timj opened 3 hours ago
Oh, and if I take that VLC or SubTools image and ask Apple Preview to copy and paste the text I get:
and I'm drawing Wolverine slashing prices with his adamantine claws.
which is perfect.
I should have tried this before, but if I copy and paste the text of the slightly fuzzy image from macSubtitleOCR, that also returns a perfect version of the text...
OK, that bit is very interesting. I wonder how they have their settings tuned, because presumably Preview uses the same API.
I'll try to investigate this one tonight. Hopefully it's a simple fix.
Definitely interesting that you are getting a different answer from the same image using the same API. It's clear the image is fuzzier but there is hope!
Hope indeed. I'm wondering if I should just give up on the idea of implementing my own decoders and use an external library like FFmpeg instead, but I really don't want to have any external dependencies.
Trying a different disk from #20, and whilst the image data looks close, there seems to be a slight blurriness to it that is messing up the OCR.
e.g.
with SubTools looking like:
and VLC:
and the macSubtitleOCR subtitle comes out as:
vs SubTools with Tesseract:
That's a "good" example. Some other ones are much less usable. SubTools is much better though.
Here are the VOBSUB files:
fuzz.tgz