Closed oep42 closed 5 months ago
Thank you.
The current implementation, which combines very short subtitle lines with the previous subtitle line, creates textually better subtitles, but I'm afraid that the synchronization between the audio and the subtitles gets worse. Each of the four examples below shows, from top to bottom:
The first three examples below are about combining a very short subtitle line with the previous subtitle line. In general, the deterioration in synchronization here (when comparing the output of beta 92 with the output of beta 151) seems to come down to this:
In addition, I noticed that the subtitle after the very short subtitle starts later in SE's output than in Whisper's output, which makes the synchronization worse there. (Sometimes this start is only slightly later, sometimes clearly later.)
Also, I have the impression that the synchronization sometimes gets worse in SE's post-processing in places where I see no reason for such a deterioration. In the fourth example below, the content of two subtitles remains textually exactly the same before and after SE's post-processing, but the synchronization nevertheless deteriorates.
(CPP engine. "Auto adjust timings" and "Use post-processing" are both checked. Max. chars per subtitle line = None. The results may differ with a different engine or with a different value for "Max. chars per subtitle line".)
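To make this kind of comparison measurable instead of visual, something like the following rough Python sketch could be used. It is not part of SE or Whisper: the function names (`to_seconds`, `parse_srt`, `timing_shifts`) are made up for illustration, and it assumes both outputs are available as plain SRT text. It reports, for every subtitle whose text is identical in both files, how far the start and end times moved.

```python
import re

# Matches an SRT timestamp such as 00:01:02,345 (a dot is accepted too).
TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")

def to_seconds(stamp):
    """Convert an SRT timestamp to seconds."""
    h, m, s, ms = map(int, TIME.fullmatch(stamp.strip()).groups())
    return h * 3600 + m * 60 + s + ms / 1000

def parse_srt(text):
    """Return a list of (start, end, text) tuples from SRT content."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip malformed blocks
        start, end = lines[1].split("-->")
        cues.append((to_seconds(start), to_seconds(end),
                     " ".join(lines[2:]).strip()))
    return cues

def timing_shifts(before, after):
    """For cues with identical text in both lists, return
    (text, start shift, end shift) in seconds (after minus before).
    Limitation: if the same text occurs more than once, only its last
    occurrence in `after` is used."""
    after_by_text = {text: (start, end) for start, end, text in after}
    shifts = []
    for start, end, text in before:
        if text in after_by_text:
            a_start, a_end = after_by_text[text]
            shifts.append((text, round(a_start - start, 3),
                           round(a_end - end, 3)))
    return shifts
```

Calling `timing_shifts(parse_srt(whisper_text), parse_srt(se_text))` on the temporary Whisper SRT file and on SE's post-processed result would list exactly the textually unchanged subtitles together with their timing changes. A positive start shift means the subtitle starts later after post-processing.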
1
2
3
4
Additional thoughts about this suggestion:
I think this suggestion for additional post-processing only makes sense if the combined length of the very short subtitle line and the previous subtitle line is less than the maximum total length of a subtitle line (as it is known in SE).
Also, more generally, I think it's Whisper's responsibility during the transcription phase to create balanced subtitle lines, so that it doesn't produce such short subtitle lines that "belong" to the previous subtitle line.
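As a rough illustration of that condition, here is a minimal Python sketch. It is not SE's actual code: the `Cue` class, the function name and the thresholds `short_len` and `max_len` are all assumptions chosen for the example. A very short line is merged into the previous one only when the previous line does not already end a sentence and the combined text still fits within the maximum length. The merged subtitle keeps the start time of the previous line and the end time of the short line, so no new timestamps are invented.

```python
from dataclasses import dataclass

@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str

def merge_short_cues(cues, short_len=10, max_len=86):
    """Merge a very short cue into the previous cue when the combined
    text fits within max_len. Both thresholds are hypothetical values."""
    merged = []
    for cue in cues:
        prev = merged[-1] if merged else None
        combined = f"{prev.text} {cue.text}" if prev else ""
        if (prev is not None
                and len(cue.text) <= short_len
                # the previous line must still be an unfinished sentence
                and not prev.text.rstrip().endswith((".", "?", "!"))
                and len(combined) <= max_len):
            # reuse the existing timestamps: start of prev, end of short cue
            merged[-1] = Cue(prev.start, cue.end, combined)
        else:
            merged.append(Cue(cue.start, cue.end, cue.text))
    return merged
```

With a small `max_len`, the same input is left untouched, which is the behaviour described above: the merge only happens when the result is still a valid subtitle line.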
In general, if the timestamps created during the transcription phase are good, it is not a good idea to change or create timestamps during the automatic post-processing phase. For example, SE can reliably merge short subtitle lines, but cannot reliably split long subtitle lines. So, it is generally not a good idea to use the "Split long lines" option of SE's "Batch convert", as this will usually cause timestamp errors in the resulting split.
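To illustrate why splitting is less reliable than merging, consider the following Python sketch. It is only an assumption about how a split could be timed when no word-level timestamps are available, not a description of what SE's "Split long lines" actually does; the function name and parameters are hypothetical. The new boundary time has to be estimated, here in proportion to the number of characters on each side, whereas a merge can simply reuse two timestamps that already exist.

```python
def split_by_characters(start, end, text, split_at):
    """Split one cue into two at character index split_at.
    The boundary time is only an estimate based on text length;
    it does not know when the words were actually spoken."""
    left = text[:split_at].strip()
    right = text[split_at:].strip()
    ratio = len(left) / (len(left) + len(right))
    middle = start + (end - start) * ratio
    return (start, middle, left), (middle, end, right)
```

If the speaker pauses, or speaks one half faster than the other, the estimated boundary will not match the audio, and that mismatch shows up as a timestamp error in the split.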
In the latest version of WhisperX, it is possible to specify the maximum line length. If this option is used, long lines will be split during the transcription phase, and no timestamp errors will be introduced when such splits are created.
SE's post-processing of a Whisper transcription is great, but there is something that can be improved.
See the two examples below. The version on the left in the image is taken from a temporary srt-file that contains the Whisper output. The version on the right is the result of SE's post-processing. (These examples are taken from a test run at the end of Issue #6839.)
Sometimes a Whisper transcription contains a very short subtitle line that is the end of a sentence, and this is kept that way in SE's post-processing. Such short subtitle lines are best combined with the previous subtitle line. I suggest adding a step to SE's post-processing that combines such short subtitle lines with the previous subtitle line.