SubtitleEdit / subtitleedit

the subtitle editor :)
http://www.nikse.dk/SubtitleEdit/Help
GNU General Public License v3.0

Suggestion to prevent very short subtitle lines in post-processing of Whisper transcription #6856

Closed by oep42 5 months ago

oep42 commented 1 year ago

SE's post-processing of a Whisper transcription is great, but there is something that can be improved.

See the two examples below. The version on the left in the image is taken from a temporary srt-file that contains the Whisper output. The version on the right is the result of SE's post-processing. (These examples are taken from a test run at the end of Issue #6839.)

[Image: Examples of too short subtitle lines]

Sometimes a Whisper transcription contains a very short subtitle line that is the end of a sentence, and SE's post-processing keeps it that way. Such short subtitle lines are best combined with the previous subtitle line. I suggest adding a step to SE's post-processing that combines such short subtitle lines with the previous subtitle line.
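To make the suggestion concrete, here is a minimal sketch in Python (not SE's actual code). The `Cue` type, the `merge_short_cues` function and the `max_short_chars` threshold are hypothetical names chosen for illustration, and the rule "merge only if the previous line does not already end a sentence" is an assumption about what a sensible heuristic could look like.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start_ms: int
    end_ms: int
    text: str


def merge_short_cues(cues, max_short_chars=10):
    """Append a very short cue to the previous cue (assumed heuristic).

    A cue is merged only when it is short AND the previous cue does not
    already end with sentence-final punctuation.
    """
    merged = []
    for cue in cues:
        prev = merged[-1] if merged else None
        is_short = len(cue.text.strip()) <= max_short_chars
        prev_is_open = (
            prev is not None
            and not prev.text.rstrip().endswith((".", "?", "!"))
        )
        if is_short and prev_is_open:
            # Keep the previous start time and extend to the short cue's end.
            merged[-1] = Cue(
                prev.start_ms,
                cue.end_ms,
                f"{prev.text.rstrip()} {cue.text.strip()}",
            )
        else:
            merged.append(cue)
    return merged
```

With the cues "I think we should go to the" and "store.", this would produce one cue that starts where the first began and ends where the second ended.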

niksedk commented 1 year ago

OK, how is this? https://github.com/SubtitleEdit/subtitleedit/releases/download/3.6.12/SubtitleEditBeta.zip

oep42 commented 1 year ago

Thank you.

The current implementation, which combines very short subtitle lines with the previous subtitle line, creates textually better subtitles, but I'm afraid that the synchronization between the audio and the subtitles gets worse. Each of the four examples below shows, from top to bottom:

  1. The Whisper output (from a temporary srt-file).
  2. The result of SE's post-processing with "SubtitleEdit 3.6.12 NEXT, beta 92".
  3. The result of SE's post-processing with "SubtitleEdit 3.6.12 NEXT, beta 151".

The first three examples below are about combining a very short subtitle line with the previous subtitle line. In general, the deterioration in synchronization (when comparing the output of beta 92 with the output of beta 151) seems to come down to this:

  1. What used to be a very short subtitle keeps the same duration but gets filled with much more text.
  2. The two subtitles that preceded the original very short subtitle are combined into one subtitle.

In addition, I noticed that the subtitle after the very short subtitle, starts later in SE's output than in Whisper's output, and this makes the synchronization worse there. (Sometimes this later start is only a little bit later, sometimes clearly more.)

Also, I have the impression that the synchronization sometimes gets worse in SE's post-processing in places where I do not see a reason for such a deterioration. In the fourth example below, the content of two subtitles remains textually exactly the same before and after SE's post-processing, but there is, nevertheless, a deterioration in synchronization.
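One way to put a number on the timing shifts described above is to compare the start time of the same subtitle in Whisper's srt-file and in SE's output. The sketch below is only an illustration; `to_ms` and `start_drift_ms` are hypothetical helper names, and the timestamps in the usage note are made-up values, not taken from the examples.

```python
import re

# SRT timestamps look like HH:MM:SS,mmm (a dot is tolerated as separator).
_TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def to_ms(timestamp):
    """Convert an SRT timestamp string to milliseconds."""
    match = _TIMESTAMP.fullmatch(timestamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {timestamp!r}")
    hours, minutes, seconds, millis = map(int, match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def start_drift_ms(whisper_start, se_start):
    """Positive result: the subtitle starts later in SE's output."""
    return to_ms(se_start) - to_ms(whisper_start)
```

For instance, `start_drift_ms("00:00:10,000", "00:00:10,250")` gives a drift of 250 ms, meaning the subtitle starts a quarter of a second later after post-processing.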

(CPP engine. Checkmark at "Auto adjust timings" and at "Use post-processing". Max. chars per subtitle line = None. Perhaps the results will be different when using a different engine and when "Max. chars per subtitle line" has a different value.)

[Image: Example 1]

[Image: Example 2]

[Image: Example 3]

[Image: Example 4]

oep42 commented 1 year ago

Additional thoughts about this suggestion:

I think this suggestion for additional post-processing only makes sense if the length of the very short subtitle line plus the length of the previous subtitle line is less than the maximum total length of a subtitle line (as it is known in SE).

Also, more generally, I think it's Whisper's responsibility during the transcription phase to create balanced subtitle lines, so that it doesn't produce such short subtitle lines that "belong" to the previous subtitle line.
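As a sketch of that condition (again in Python, not SE's code): `fits_after_merge` and `max_total_chars` are hypothetical names, and counting one joining space and allowing a result exactly at the limit are assumptions on my part.

```python
def fits_after_merge(prev_text, short_text, max_total_chars):
    """Return True if both texts, joined by one space, stay within the limit.

    Assumption: a combined length exactly equal to the limit is still allowed.
    """
    combined_length = len(prev_text.strip()) + 1 + len(short_text.strip())
    return combined_length <= max_total_chars
```

A post-processing step would call this check first and leave the short line alone when it returns False.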

In general, if the timestamps created during the transcription phase are good, it is not a good idea to change or create timestamps during the automatic post-processing phase. For example, SE can reliably merge short subtitle lines, but cannot reliably split long subtitle lines. So, it is generally not a good idea to use the "Split long lines" option of SE's "Batch convert", as this will usually cause timestamp errors in the resulting split.
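The difference can be sketched as follows. When merging, both boundaries of the result are timestamps that already exist. When splitting, the time of the split point is not in the data and has to be estimated. The proportional estimate below is only an illustrative assumption, not necessarily what SE's "Split long lines" does, and the function names are hypothetical.

```python
def merge_times(first, second):
    """Merge two (start_ms, end_ms) pairs.

    Both boundaries of the result come straight from the transcription.
    """
    return (first[0], second[1])


def split_times_guess(start_ms, end_ms, text, split_at):
    """Split one subtitle at character index split_at.

    No timestamp exists for the split point, so it is guessed by assuming
    the speech is spread evenly over the characters (a naive assumption).
    """
    ratio = split_at / len(text)
    guessed_ms = start_ms + round((end_ms - start_ms) * ratio)
    return (start_ms, guessed_ms), (guessed_ms, end_ms)
```

A pause or a slowly spoken word inside the subtitle makes the guessed split time wrong, while the merged times stay as accurate as the transcription was.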

In the latest version of WhisperX, it is possible to specify the maximum line length. If this option is used, long lines will be split during the transcription phase, and no timestamp errors will be introduced when such splits are created.