ecdye / macSubtitleOCR

Convert bitmap subtitles into SubRip format using the macOS OCR engine
MIT License

Add initial support for VobSub subtitles #11

Closed ecdye closed 3 days ago

ecdye commented 1 week ago

For the moment, this only adds support for decoding VobSub streams. In the future, support will be added for extracting the stream from an input .mkv file, as has already been done for PGS subtitle streams.

For now this focuses on simply adding functionality for the basic VobSub format without any extra bells or whistles.

ecdye commented 4 days ago

@timj If you don't mind, before I merge, would you try running this on a handful of your VobSub subtitle files? This PR won't support reading them from Matroska files, but I will add that in a future PR. I just want to see how accurate this is on more than the handful of files I have thrown at it.

github-advanced-security[bot] commented 4 days ago

This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation.

timj commented 4 days ago

I will be happy to take a look this evening.

timj commented 4 days ago

First attempt:

$ macSubtitleOCR subs.idx . 
macSubtitleOCR/VobSubParser.swift:134: Fatal error: Unexpectedly found nil while unwrapping an Optional value
zsh: trace trap  macSubtitleOCR subs.idx .

with subs.tar.gz
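
For context, "Unexpectedly found nil while unwrapping an Optional value" is the trap Swift raises when a force unwrap (`!`) meets `nil`. The code at VobSubParser.swift:134 is not shown in this thread, so the following is only a minimal sketch of the general pattern, assuming the parser reads `timestamp: hh:mm:ss:ms, filepos: <hex>` lines from the .idx file. `IdxEntry` and `parseIdxLine` are hypothetical names for illustration, not identifiers from the project.

```swift
import Foundation

/// One entry from a VobSub .idx file (hypothetical type for this sketch).
struct IdxEntry: Equatable {
    let timestampMs: Int   // presentation time in milliseconds
    let filePosition: Int  // byte offset into the companion .sub file
}

/// Parses a line such as "timestamp: 00:01:02:345, filepos: 0000001a2".
/// Returns nil for comments, settings, or malformed lines instead of trapping.
func parseIdxLine(_ line: String) -> IdxEntry? {
    let trimmed = line.trimmingCharacters(in: .whitespaces)
    guard trimmed.hasPrefix("timestamp:") else { return nil }

    // Split into "timestamp: hh:mm:ss:ms" and "filepos: <hex>".
    let fields = trimmed.components(separatedBy: ",")
    guard fields.count == 2 else { return nil }

    let timeText = fields[0].dropFirst("timestamp:".count)
        .trimmingCharacters(in: .whitespaces)
    let posField = fields[1].trimmingCharacters(in: .whitespaces)
    guard posField.hasPrefix("filepos:") else { return nil }
    let posText = posField.dropFirst("filepos:".count)
        .trimmingCharacters(in: .whitespaces)

    // Each conversion can fail, so bind with guard-let rather than using `!`.
    let parts = timeText.components(separatedBy: ":").map { Int($0) }
    guard parts.count == 4,
          let h = parts[0], let m = parts[1], let s = parts[2], let ms = parts[3],
          let offset = Int(posText, radix: 16)
    else { return nil }

    return IdxEntry(timestampMs: ((h * 60 + m) * 60 + s) * 1000 + ms,
                    filePosition: offset)
}
```

With this shape, a caller can skip or report a line that fails to parse rather than aborting the whole run, for example `let entries = lines.compactMap(parseIdxLine)`.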

ecdye commented 3 days ago

Whelp, this is why I'm glad I waited to merge. I'll try to look into this again this evening and see if I can figure out what I need to do to fix it.

ecdye commented 3 days ago

Alright, I think I fixed that file at the very least. Give it a try now, @timj. I do have to say, I appreciate your taste in movies based off of that subtitle file.

timj commented 3 days ago

Thanks. That works for me. Much better quality than I get out of Subtools with Tesseract so that's a bonus. I can try some more examples over the weekend.

ecdye commented 3 days ago

Glad to hear it! That's actually what drew me to want to do this originally, the fact that the macOS OCR API seems to be really good.