deniscerri / ytdlnis

Android Video/Audio Downloader app using yt-dlp
GNU General Public License v3.0

[FEATURE REQUEST] Fix YouTube's autogenerated subtitles doubling #433

Open ershovev opened 3 months ago

ershovev commented 3 months ago

Is your feature request available in yt-dlp? Please describe. Not available.

When you download automatic subtitles from YouTube, the result is a rolling subtitle: every time a new line is added, the previous one moves up a line, and if there are more than two lines, the first one disappears. Think Star Wars intro, but with only two lines.

A subtitle converted from VTT to SRT by yt-dlp would look something like this:

00:00 --> 00:03 
This is the first line

00:03 --> 00:10 
This is the first line
This is what happens when another line is added

00:10
This is what happens when another line is added
If a third one is added, the first one disappears and the second one shoots up

The problem with this is that it's really hard to read, since you expect both lines to change, and it becomes really distracting.
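To make the desired result concrete, here is a minimal Python sketch of the "un-rolling" idea: each cue keeps only the lines that were not already visible in the previous cue. The `Cue` tuple, the `unroll_cues` name, and the sample timings are made up for illustration and are not part of yt-dlp or YTDLnis. It also assumes that a line repeated in consecutive cues is always a rolling duplicate, so a speaker genuinely saying the same sentence twice in a row would be dropped.

```python
from typing import List, Tuple

# Hypothetical cue representation: (start, end, text lines shown on screen)
Cue = Tuple[str, str, List[str]]


def unroll_cues(cues: List[Cue]) -> List[Cue]:
    """Keep only the lines of each cue that the previous cue did not show."""
    result: List[Cue] = []
    previous_lines: List[str] = []
    for start, end, lines in cues:
        # Lines carried over from the previous cue are the "rolled up" ones.
        fresh = [line for line in lines if line not in previous_lines]
        if fresh:
            result.append((start, end, fresh))
        previous_lines = lines
    return result


# Made-up sample data following the rolling pattern described above.
rolling = [
    ("00:00", "00:03", ["first line"]),
    ("00:03", "00:10", ["first line", "second line"]),
    ("00:10", "00:15", ["second line", "third line"]),
]

for start, end, lines in unroll_cues(rolling):
    print(f"{start} --> {end}")
    print("\n".join(lines))
    print()
```

With this approach every cue shows exactly one new line, so nothing appears to jump up the screen while you read.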

Describe the solution you'd like Maybe a "Fix YouTube autogenerated subtitles doubling" option in settings?

Users on GitHub and superuser.com suggest some fixes for ytdl:

1)


import webvtt  # third-party package: pip install webvtt-py


def fix_youtube_vtt(vtt_file_path) -> str:
    """Fixes YouTube's autogenerated VTT subtitles and returns an SRT-formatted string"""

    pretty_subtitle = ''
    previous_caption_text = ''
    previous_caption_start = ''  # initialised so it is always defined before use
    i = 1
    for caption in webvtt.read(vtt_file_path):
        text = caption.text.strip()

        if previous_caption_text == text:
            # if previous and current captions are identical, emit one cue with the
            # start time from the previous and the end time from the current.
            # SRT timestamps use a comma before the milliseconds, VTT uses a dot.
            start = previous_caption_start.replace('.', ',')
            end = caption.end.replace('.', ',')
            pretty_subtitle += f"{i}\n{start} --> {end}\n{previous_caption_text}\n\n"
            i += 1

        elif previous_caption_text == text.split("\n")[0]:
            # if the current caption is multiline, and the previous caption is equal to
            # the current's first line, just ignore the first line and move on with the second.
            previous_caption_text = text.split("\n")[1]
            previous_caption_start = caption.start

        else:
            # unrelated caption: remember it and wait for its repeated copy
            previous_caption_text = text
            previous_caption_start = caption.start

    return pretty_subtitle

2)

yt-dlp --embed-subs --merge-output-format mkv -f 'bv+ba' --write-auto-subs --sub-langs 'en' 'https://youtu.be/3_HG33-IYaY' --sub-format ttml --convert-subs srt --exec 'before_dl:fn=$(echo %(_filename)s| sed "s/%(ext)s/en.srt/g") && ffmpeg -fix_sub_duration -i "$fn" -c:s text "$fn".tmp.srt && mv "$fn".tmp.srt "$fn"'

3)

function cleanVttFile($fileName, $outputName) {

    $lines = file($fileName);
    $headers = ['WEBVTT', 'Kind: captions', 'Language: en'];
    $modified_lines = [];
    $prev_line = "";

    foreach ($lines as $line) {
        // Keep header lines unchanged
        if (in_array(trim($line), $headers)) {
            $modified_lines[] = $line;
            continue;
        }

        // Keep timestamp lines and blank lines unchanged
        if (preg_match('/\d{2}:\d{2}:\d{2}\.\d{3} --> \d{2}:\d{2}:\d{2}\.\d{3}.*/', $line) || trim($line) == "") {
            $modified_lines[] = $line;
            continue;
        }

        // Remove time tags
        $stripped_line = preg_replace('/<[^>]*>/', '', $line);

        // Compare with previous line
        if ($stripped_line != $prev_line || $prev_line == "") {
            $modified_lines[] = $line;
        }

        // Update previous line
        $prev_line = $stripped_line;
    }

    file_put_contents($outputName, $modified_lines);
}
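
For comparison with fix 1, the same line-based idea can be sketched in Python. This is a rough port, not a tested drop-in replacement: `clean_vtt_lines` and the sample lines are made up for illustration, it works on a list of lines instead of files, and it skips the explicit header check because header lines pass through unchanged anyway. Like the PHP version, it keeps timestamp lines even when the text below them is dropped, so some cues end up empty.

```python
import re
from typing import Iterable, List

# Same patterns as the PHP snippet: cue timing lines and inline <...> tags.
TIMESTAMP = re.compile(r"\d{2}:\d{2}:\d{2}\.\d{3} --> \d{2}:\d{2}:\d{2}\.\d{3}")
TAG = re.compile(r"<[^>]*>")


def clean_vtt_lines(lines: Iterable[str]) -> List[str]:
    """Drop text lines that repeat the previous text line once tags are removed."""
    cleaned: List[str] = []
    previous = ""
    for line in lines:
        # Timing lines and blank lines pass through unchanged.
        if TIMESTAMP.search(line) or line.strip() == "":
            cleaned.append(line)
            continue
        stripped = TAG.sub("", line)
        # Keep the original line (tags included) only if its text is new.
        if stripped != previous:
            cleaned.append(line)
        previous = stripped
    return cleaned


# Made-up sample in the shape of YouTube's autogenerated VTT.
sample = [
    "WEBVTT\n",
    "\n",
    "00:00:00.000 --> 00:00:02.000\n",
    "hello<00:00:00.500><c> world</c>\n",
    "\n",
    "00:00:02.000 --> 00:00:02.010\n",
    "hello world\n",
    "\n",
]

print("".join(clean_vtt_lines(sample)))
```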

zaednasr commented 3 months ago

@ershovev you need to open this issue in the yt-dlp repository, not here. They will be able to handle this. I don't code the core yt-dlp itself, just the Android app interface for it.

ershovev commented 3 months ago

Got it, sorry

Judging by these issues, it seems that they are not planning to fix it:

https://github.com/yt-dlp/yt-dlp/issues/6274
https://github.com/yt-dlp/yt-dlp/issues/1734