emk opened 6 months ago
The main concern I would have with either (1) or (2) is that a lot of videos legitimately have overlapping subtitles, because multiple speakers are talking at the same time (e.g. a TV broadcaster in the background while another character is speaking). In some cases the 'secondary' subtitle has distinctive formatting that could be used to identify it and treat it essentially as a separate track, but that would need to be detected per file (or we'd need to implement a bunch of common patterns). For example, in Japanese, Netflix will generally display one subtitle on the bottom (as usual) and a secondary subtitle on the right (vertically). In English I've seen italics used for the secondary subtitle, or even different colors (in .ass subtitles).
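To make that concrete, here's a rough sketch of what "detect the secondary subtitle by formatting and treat it as a separate track" could look like. The `Sub` type is made up for this example, and it assumes the simple case where the secondary speaker's lines are fully italicized in the SRT:

```rust
// Sketch: split one subtitle track into "primary" and "secondary" tracks
// based on formatting. `Sub` is a made-up type for this example.
#[derive(Debug)]
struct Sub {
    text: String,
}

/// Treat a subtitle as "secondary" if its whole text is wrapped in <i>…</i>,
/// one common SRT convention for background/off-screen speakers.
fn is_secondary(sub: &Sub) -> bool {
    let t = sub.text.trim();
    t.starts_with("<i>") && t.ends_with("</i>")
}

/// Partition a single track into (primary, secondary) by formatting.
fn split_tracks(subs: Vec<Sub>) -> (Vec<Sub>, Vec<Sub>) {
    subs.into_iter().partition(|s| !is_secondary(s))
}

fn main() {
    let subs = vec![
        Sub { text: "Normal dialogue line".into() },
        Sub { text: "<i>TV broadcaster droning on in the background</i>".into() },
    ];
    let (primary, secondary) = split_tracks(subs);
    println!("primary: {:?}", primary);
    println!("secondary: {:?}", secondary);
}
```

Vertical Japanese subtitles or .ass color overrides would need different checks, so in practice this would just be one pattern among several.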
I haven't looked at your alignment algorithm, and I haven't actually tried to implement one myself yet. I was going to start with something pretty simple: iterating over the native (base) subtitle items and aligning each reference subtitle that overlaps a native item by more than some percentage (maybe 90%+) by default, falling back to a more relaxed match when there aren't any overlapping subtitles between the two tracks. This is probably naive though 😅
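Something roughly like this, where `Sub` is a made-up type and overlap is measured as a fraction of the reference subtitle's duration (just one way to define it):

```rust
// Sketch of the overlap-ratio idea described above.
struct Sub {
    start: f64, // seconds
    end: f64,   // seconds
    text: String,
}

/// Seconds of overlap between two subtitles' time ranges.
fn overlap_secs(a: &Sub, b: &Sub) -> f64 {
    (a.end.min(b.end) - a.start.max(b.start)).max(0.0)
}

/// For each native (base) subtitle, collect the indices of reference
/// subtitles whose overlap covers at least `min_ratio` of their duration.
fn align(native: &[Sub], reference: &[Sub], min_ratio: f64) -> Vec<(usize, Vec<usize>)> {
    native
        .iter()
        .enumerate()
        .map(|(i, n)| {
            let matches: Vec<usize> = reference
                .iter()
                .enumerate()
                .filter_map(|(j, r)| {
                    let dur = (r.end - r.start).max(f64::EPSILON);
                    if overlap_secs(n, r) / dur >= min_ratio {
                        Some(j)
                    } else {
                        None
                    }
                })
                .collect();
            (i, matches)
        })
        .collect()
}

fn main() {
    let native = vec![Sub { start: 1.0, end: 4.0, text: "native line".into() }];
    let reference = vec![
        Sub { start: 1.1, end: 3.9, text: "mostly inside the native item".into() },
        Sub { start: 3.8, end: 6.0, text: "barely touches it".into() },
    ];
    for (i, js) in align(&native, &reference, 0.9) {
        let matched: Vec<&str> = js.iter().map(|&j| reference[j].text.as_str()).collect();
        println!("native #{} ({}) -> {:?}", i, native[i].text, matched);
    }
}
```

The relaxed fallback isn't shown; it could just re-run the same filter with a lower threshold for native items that matched nothing.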
The core substudy algorithms are all designed around non-overlapping subtitles. There's a built-in "cleaning" layer that will fix small overlaps as best it can. But a few SRT files use partially overlapping subs to convey semantic and timing information, and other SRT files contain lots of garbage data.
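For context, the kind of fix that layer applies looks roughly like this; this is a toy sketch (made-up `Sub` type), not the actual substudy code:

```rust
// Sketch of a "fix small overlaps" pass: if two consecutive subtitles overlap
// by less than `max_fixable` seconds, pull the earlier one's end time back.
#[derive(Debug)]
struct Sub {
    start: f64, // seconds
    end: f64,   // seconds
}

fn fix_small_overlaps(subs: &mut [Sub], max_fixable: f64) {
    for i in 1..subs.len() {
        let overlap = subs[i - 1].end - subs[i].start;
        if overlap > 0.0 && overlap <= max_fixable {
            // Small overlap: trim the earlier subtitle so the two no longer overlap.
            subs[i - 1].end = subs[i].start;
        }
        // Larger overlaps are left alone; they may be intentional (or garbage)
        // and need a smarter decision than simple trimming.
    }
}

fn main() {
    let mut subs = vec![
        Sub { start: 0.0, end: 2.1 }, // overlaps the next sub by 0.1s
        Sub { start: 2.0, end: 4.0 },
    ];
    fix_small_overlaps(&mut subs, 0.5);
    println!("{:?}", subs);
}
```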
What should we do here? Major options include:
I'm honestly not too interested in pursuing (3) if I can possibly get good results (for most use cases) without it. But (1) vs (2) is a harder tradeoff, and I'd love feedback on what people are encountering in their SRT files.
CC @aaron-meyers