ecdye / macSubtitleOCR

Convert bitmap subtitles into SubRip format using the macOS Vision framework
MIT License

Another VOBSUB crash #19

Closed (timj closed 1 month ago)

timj commented 1 month ago

New crash with VOBSUB:

$ macSubtitleOCR crash.idx .
Swift/Repeat.swift:40: Fatal error: Repetition count should be non-negative
zsh: trace trap  macSubtitleOCR crash.idx .

crash2.tar.gz
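
For context, this message is the precondition raised by the Swift standard library's `repeatElement(_:count:)` when `count` is negative. The thread does not state the root cause, so the following is only a minimal sketch of how a run-length decoder could guard against it. The `appendRun` helper and its parameters are hypothetical names, not taken from macSubtitleOCR's source, and the treatment of a zero run length as "fill to the end of the line" is an assumption about the decoder's design:

```swift
/// Hypothetical helper: appends `length` copies of `color` to one decoded row
/// without ever writing past `width`. All names are illustrative only.
func appendRun(_ color: UInt8, length: Int, to row: inout [UInt8], width: Int) {
    // Space left on this line; negative if the row already overran `width`.
    let remaining = width - row.count
    // Assumption: a run length of 0 means "fill to the end of the line".
    let requested = length == 0 ? remaining : length
    // Clamp to 0...remaining so a corrupt or misaligned stream cannot produce
    // a negative count, which would trap inside repeatElement.
    let count = max(0, min(requested, remaining))
    row.append(contentsOf: repeatElement(color, count: count))
}
```

With a guard like this, an overlong or misaligned run is truncated or skipped instead of aborting the whole conversion.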

ecdye commented 1 month ago

Alright, I'll take a look. You know, it's funny: I would have thought there would be more issues with decoding PGS than VobSub, since there are more examples of VobSub, but I guess that's not the case.

timj commented 1 month ago

Thanks. That does fix the crash, but I'm not sure what's going on with the OCR because the output is unusable.

1
00:00:06,006 --> 00:00:09,666
NAF RATOR

2
00:00:07,807 --> 00:00:13,056
the people are represented
by two esperata

3
00:00:11,744 --> 00:00:14,769
the palle
who live tig 9 crime

4
00:00:13,647 --> 00:00:17,821
and the dil that afterueya
who proceante the allendara

Tesseract manages to get the opening 4 subtitles almost perfectly, but hardly any of this output makes sense.

It's meant to be:

NARRATOR:
In the criminal justice system

the people are represented
by two separate
yet equally important groups,

the police
who investigate crime

and the district attorneys
who prosecute the offenders.

Does it work for you?

timj commented 1 month ago

It looks like there is an issue with the image rendering. Saving the images (that's a great debug feature) gives me something like:

[image: subtitle_4]

whereas the SubTools preview gives me:

[image]

ecdye commented 1 month ago

Oh fun, I guess there's another bug to hunt down. And yes, the images output doubles as both an OCR sanity check and a wonderful debugging tool. That's half the reason I included it, because if I hadn't, I would never have been able to figure out the decoding in the first place.

Thanks for your help in testing this, it's been super helpful in pushing the project forward and making it more complete!

timj commented 1 month ago

I don't know anything about VOBSUB, but is it possible that these files are interlaced and the deinterlacing isn't working properly? It looks like there's a vertical shift between two frames.

ecdye commented 1 month ago

Yeah, it's an interlacing issue; my current decoder doesn't handle it very well. I'm working on a solution, but it still has a couple of minor issues to resolve.
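
VobSub (DVD subpicture) bitmaps store the even and odd scan lines as two separate run-length-encoded fields, so a decoder has to weave them back together. Rows placed at the wrong index would produce the kind of vertical shift described above. A rough sketch of that weaving step follows; `interleaveFields` and its parameters are hypothetical names, not the project's actual code:

```swift
/// Weaves two decoded fields into one frame: rows from `evenField` land on
/// lines 0, 2, 4, ... and rows from `oddField` on lines 1, 3, 5, ...
/// Hypothetical sketch; each row is an array of palette indices.
func interleaveFields(evenField: [[UInt8]], oddField: [[UInt8]], height: Int) -> [[UInt8]] {
    let rows = max(0, height)
    var frame: [[UInt8]] = []
    frame.reserveCapacity(rows)
    for y in 0..<rows {
        // Pick the field by line parity, then the row within that field.
        let field = y % 2 == 0 ? evenField : oddField
        let index = y / 2
        // Tolerate a short field by emitting an empty row instead of crashing.
        frame.append(index < field.count ? field[index] : [])
    }
    return frame
}
```

For example, `interleaveFields(evenField: [[1], [3]], oddField: [[2], [4]], height: 4)` yields the rows in display order 1, 2, 3, 4. Swapping the two fields or offsetting one by a row is an easy mistake to make here, and the saved debug images are a quick way to spot it.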