TheFrenchGhosty / TheFrenchGhostys-Ultimate-YouTube-DL-Scripts-Collection

The ultimate collection of scripts for YouTube-DL.
GNU General Public License v3.0

Enforce title/uploader for embedded metadata to avoid Content ID tags overwriting them #75

Closed msikma closed 2 years ago

msikma commented 3 years ago

I recently learned that yt-dlp writes the track name as the metadata title for a video if YouTube found a music track in it. In that case it also uses the artist name instead of the uploader.

This seems useful for people who use yt-dlp to rip music, but for archiving it seems bad to me, especially for video. An hours-long video that happens to have a song in it shouldn't be named after that song; it should retain its video title.

I only noticed this when I found that, in an archiving project I did, some of the videos had a totally incorrect title/artist set in the metadata, but not all. In fairness, it's mentioned in the readme, but it's quite opaque and difficult to grasp exactly what it does (especially the part about the info.json metadata not necessarily being the same as the embedded file metadata).

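To prevent this from happening, add the following arguments:

--parse-metadata "%(title)s:%(meta_title)s"
--parse-metadata "%(uploader)s:%(meta_artist)s"

The title and artist are the only two fields that get overwritten by Content ID tags. Adding these lines ensures that only the video title and the uploader are used as the source for the embedded metadata.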

TheFrenchGhosty commented 2 years ago

@msikma Do you have an example URL where the current scripts make this a problem?

msikma commented 2 years ago

Yes, this video for example. If you download it without my given --parse-metadata argument, the title of the video will be the Content ID song name. If you do add the argument, you should get "TITLE TEST" as expected.

See also this issue I opened on yt-dlp. I had assumed that this was a bug, also because the documentation is a little opaque and hard to figure out, but ultimately it was deemed not something that needs fixing.

It's very strange and I don't agree with their reasoning, but basically it's designed to be helpful to people who download YouTube videos just to get music. If you're downloading for archiving purposes, you don't really want this.

In my case I downloaded an archive of a professional video game tournament, and then discovered that files would, seemingly at random, either have the proper title or be named after one of the songs used to introduce the players, depending on whether Content ID had detected the song.

TheFrenchGhosty commented 2 years ago

@msikma I just tried your example, and it downloads with the correct title, without adding anything.

msikma commented 2 years ago

It should get the correct filename, but not the correct embedded metadata title. The problem only occurs when you have --add-metadata in the command and look at the metadata it added to the file after downloading.
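One way to check a downloaded file for this is to pull a single tag out of ffprobe's human-readable metadata listing. The helper below is only a sketch: `embedded_tag` is a hypothetical name, not part of yt-dlp or of these scripts, and it assumes the `key : value` layout that ffprobe prints in its Metadata block.

```shell
# Hypothetical helper (not part of yt-dlp or ffprobe): print the value of one
# tag from ffprobe's human-readable output, read on stdin.
# The tag name is matched case-insensitively, since the same listing can show
# both "title" and "ARTIST".
embedded_tag() {
  awk -v want="$1" '
    {
      # Split on the first colon only, so values that contain colons survive.
      sep = index($0, ":")
      if (sep == 0) next
      key = substr($0, 1, sep - 1)
      gsub(/^[ \t]+|[ \t]+$/, "", key)
      if (tolower(key) == tolower(want)) {
        val = substr($0, sep + 1)
        gsub(/^[ \t]+|[ \t]+$/, "", val)
        print val
        exit
      }
    }'
}

# Example usage (ffprobe writes this listing to stderr, hence 2>&1):
#   ffprobe -hide_banner "TITLE TEST [M38GIXosXF0].webm" 2>&1 | embedded_tag title
```

If the printed title differs from the video title in the filename, the Content ID track name was most likely embedded instead.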

The relevant part of the ffprobe output:

yt-dlp --add-metadata "https://www.youtube.com/watch?v=M38GIXosXF0"
Input #0, matroska,webm, from 'TITLE TEST [M38GIXosXF0].webm':
  Metadata:
    title           : Sonata No. 2 in A Major, Op. 2, No. 2: III. Scherzo - Allegretto (Remastered)
    COMMENT         : https://www.youtube.com/watch?v=M38GIXosXF0
    HTTP://YOUTUBE.COM/STREAMING/OTF/DURATIONS/112015: Segment-Count: 28 
                    : Segment-Durations-Ms: 5333,5334,5333(r=1),5334,5333(r=1),5334,5333(r=1),5334,5333(r=1),5334,5333(r=1),5334,5333(r=1),5334,5333(r=1),5334,5333(r=1),5334,5333,6000, 
                    :  
                    : 
    ARTIST          : Glenn Gould
    DATE            : 20180608
    DESCRIPTION     : DESCRIPTION TEST
    SYNOPSIS        : DESCRIPTION TEST
    PURL            : https://www.youtube.com/watch?v=M38GIXosXF0
    ENCODER         : Lavf58.76.100
  Duration: 00:02:30.56, start: -0.007000, bitrate: 307 kb/s

With the additional arguments, you get this instead:

yt-dlp --add-metadata --parse-metadata "%(title)s:%(meta_title)s" --parse-metadata "%(uploader)s:%(meta_artist)s" "https://www.youtube.com/watch?v=M38GIXosXF0"
Input #0, matroska,webm, from 'TITLE TEST [M38GIXosXF0].webm':
  Metadata:
    title           : TITLE TEST
    COMMENT         : https://www.youtube.com/watch?v=M38GIXosXF0
    HTTP://YOUTUBE.COM/STREAMING/OTF/DURATIONS/112015: Segment-Count: 28 
                    : Segment-Durations-Ms: 5333,5334,5333(r=1),5334,5333(r=1),5334,5333(r=1),5334,5333(r=1),5334,5333(r=1),5334,5333(r=1),5334,5333(r=1),5334,5333(r=1),5334,5333,6000, 
                    :  
                    : 
    ARTIST          : helloitismetomato
    DATE            : 20180608
    DESCRIPTION     : DESCRIPTION TEST
    SYNOPSIS        : DESCRIPTION TEST
    PURL            : https://www.youtube.com/watch?v=M38GIXosXF0
    ENCODER         : Lavf58.76.100
  Duration: 00:02:30.56, start: -0.007000, bitrate: 307 kb/s

TheFrenchGhosty commented 2 years ago

@msikma Alright, thank you! This is definitely a problem, I'll fix it soon.