azihassan / youtube-d

A fast command-line Youtube downloader
MIT License

Range is now mandatory for adaptive formats #83

Open azihassan opened 4 weeks ago

azihassan commented 4 weeks ago

Initially reported here by bajas. I reproduced it with youtube-d -f 399 https://youtu.be/dQw4w9WgXcQ. The response is "Failed with status 403".

Getting the video URL with youtube-d -f 399 https://youtu.be/dQw4w9WgXcQ -o then opening it in a browser or downloading it with cURL yields similar results.

It's worth noting that while ParallelDownloader adds range headers that bypass this issue, it still fails because it first sends a HEAD request to the video URL to retrieve its full length, from which it calculates the appropriate range values. That HEAD request also fails because it doesn't include a range header. This can be reproduced with curl:

$ curl -LI $(youtube-d -f 399 https://youtu.be/dQw4w9WgXcQ -o | grep '^https')
HTTP/1.1 403 Forbidden
Last-Modified: Wed, 02 May 2007 10:26:10 GMT
Content-Type: text/plain
Content-Length: 0
Connection: close
Vary: Origin
Cross-Origin-Resource-Policy: cross-origin
X-Restrict-Formats-Hint: None
X-Content-Type-Options: nosniff
Date: Sat, 17 Aug 2024 19:52:07 GMT
Server: gvs 1.0

azihassan commented 3 weeks ago

I forgot that the video size is included in the adaptiveFormats JSON object; that's how it's displayed when using the -F flag. Instead of sending a HEAD request to get it, I'll have to pass the video size to the Downloader classes.
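A minimal sketch of how range values could be derived from a size that is already known, without any HEAD request. It is written in Python for illustration only (youtube-d itself is not a Python project), and the names byte_ranges and range_header, the chunk size, and the sample format entry are all hypothetical. It assumes the size arrives as a decimal string in a contentLength field of the adaptiveFormats entry.

```python
from typing import Dict, Iterator, Tuple


def byte_ranges(content_length: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) byte offsets that cover content_length bytes."""
    if content_length < 0 or chunk_size <= 0:
        raise ValueError("content_length must be >= 0 and chunk_size must be > 0")
    start = 0
    while start < content_length:
        # HTTP byte ranges are inclusive, hence the -1 on the end offset.
        end = min(start + chunk_size, content_length) - 1
        yield start, end
        start = end + 1


def range_header(start: int, end: int) -> str:
    """Format one inclusive byte range as the value of an HTTP Range header."""
    return f"bytes={start}-{end}"


# Hypothetical entry shaped like an adaptiveFormats item (values are made up).
fmt: Dict[str, object] = {"itag": 399, "contentLength": "2500"}

for start, end in byte_ranges(int(str(fmt["contentLength"])), chunk_size=1000):
    print(range_header(start, end))
```

Each printed value would go into the Range header of one chunk request, so the only input the downloader needs besides the URL is the size taken from the JSON.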

azihassan commented 3 weeks ago

The YouTube website doesn't seem to be using adaptiveFormats at all. The URL it requests no longer mentions an itag query parameter. The response content type is now "application/vnd.yt-ump".

Example URL :

https://rr1---sn-f5o5-jho6.googlevideo.com/videoplayback?expire=1724395416&ei=ONvHZuPmCYylp-oPoI2l4Qk&ip=105.66.6.5&id=o-AAF6xHbbhqMQz4PivvKCrqekqJxpvv4yPOhC5C7ZzgxR&source=youtube&requiressl=yes&xpc=EgVo2aDSNQ%3D%3D&mh=cZ&mm=31%2C29&mn=sn-f5o5-jho6%2Csn-h5qzen7d&ms=au%2Crdu&mv=m&mvi=1&pl=24&initcwndbps=281250&spc=Mv1m9nRBwBVvaxcx0avTGA74bTsY58w2aDWe3FdwYQQcC0oUXgmwrXVP13D_&svpuc=1&ns=QDPXqHeZ7mc9HGOvPacKq2cQ&sabr=1&rqh=1&mt=1724373168&fvip=2&keepalive=yes&c=WEB&n=8v1P1FtKQzxTfw&sparams=expire%2Cei%2Cip%2Cid%2Csource%2Crequiressl%2Cxpc%2Cspc%2Csvpuc%2Cns%2Csabr%2Crqh&sig=AJfQdSswRQIgae4qYOMQAw8tmnrDJnjxhSQZedUGiLYyaysFG_r0a4oCIQD6jDjHVXpz9Jn6G573mQTVsC6GMTJ_HzKIKrRNGqUCJQ%3D%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=AGtxev0wRgIhANYc6PItRU88qizdK3VgOdwXt1Tu1IwAe0AzY2Ec8DPfAiEA38HbtgXfrGROhfNFlCzeaHrkptFA3SD4IEqdAOwFavk%3D&cpn=wkI2twug9Q_Ddhju&cver=2.20240821.01.00&rn=1

I noticed that there's a new field called "serverAbrStreamingUrl" in "streamingData" that includes a similar URL :

https://rr1---sn-f5o5-jho6.googlevideo.com/videoplayback?expire=1724395579&ei=29vHZvGhI8uIvdIPuc2LyAQ&ip=105.66.6.5&id=o-AKeh3bUw03Y5en6DnNTitoYbkkNVuJr8T79j7feEIwPA&source=youtube&requiressl=yes&xpc=EgVo2aDSNQ%3D%3D&mh=cZ&mm=31%2C29&mn=sn-f5o5-jho6%2Csn-h5q7knes&ms=au%2Crdu&mv=m&mvi=1&pl=24&initcwndbps=287500&spc=Mv1m9gTufCB5bea62majXWG3BkV2lbgeeP0yBck_gTWGwnUN-ZgxO7Qf-Wta&svpuc=1&ns=kw1DbH1GcIKnPQ4DmD9cWg8Q&sabr=1&rqh=1&mt=1724373647&fvip=1&keepalive=yes&c=WEB&n=sw0ivBXGfZIJ_T7X&sparams=expire%2Cei%2Cip%2Cid%2Csource%2Crequiressl%2Cxpc%2Cspc%2Csvpuc%2Cns%2Csabr%2Crqh&sig=AJfQdSswRAIgH3hxz4U9wnewpzeTlOxt6XE009QS3zklHbNfKGWbsdcCIDGt7iV1b1HcKCFbstNGR7CKBmZW2zVYZx4I0Obtzxqh&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=AGtxev0wRAIgf5N3HKeVNx3iq-0pS9CJisFDUpheBLBe00naXtNK8zcCICtEPBFTP9dC0byNYm7dnvh_azhKhT44tq5rJedkiHNd

Here's a side by side comparison :

[Image: side-by-side comparison of the video URL and the serverAbrStreamingUrl]

The video URL has three additional parameters: cver, which is available in the HTML page; rn, which I assume to be the new format identifier (itag); and cpn, whose meaning I don't know beyond the fact that it's referred to as clientPlaybackNonce in base.js.
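The same comparison can be done programmatically by diffing the query parameter names of the two URLs. This is a small Python sketch using only the standard library. The helper names query_keys and extra_params are hypothetical, and the two URLs are shortened stand-ins rather than the real ones quoted above.

```python
from typing import List, Set
from urllib.parse import parse_qs, urlsplit


def query_keys(url: str) -> Set[str]:
    """Return the set of query parameter names present in a URL."""
    return set(parse_qs(urlsplit(url).query, keep_blank_values=True))


def extra_params(url: str, reference: str) -> List[str]:
    """Return the parameter names found in url but not in reference, sorted."""
    return sorted(query_keys(url) - query_keys(reference))


# Shortened, made-up stand-ins for the browser URL and serverAbrStreamingUrl.
browser_url = "https://example.invalid/videoplayback?expire=1&sabr=1&n=abc&cpn=xyz&cver=2.0&rn=1"
abr_url = "https://example.invalid/videoplayback?expire=2&sabr=1&n=def"

print(extra_params(browser_url, abr_url))  # → ['cpn', 'cver', 'rn']
```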

Querying serverAbrStreamingUrl returns 403. After solving the n parameter, it returns a 200 OK response with a binary output that includes "sabr.malformed_config". The same result can be obtained by removing any of the three cver/cpn/rn parameters from the video URL reported by the browser's network explorer.
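To repeat the parameter-removal experiment without editing URLs by hand, a helper along these lines could be used. It is a Python sketch with a hypothetical name (drop_params) and a made-up URL. It filters the raw query string instead of decoding and re-encoding it, on the assumption that the percent-encoding of the remaining parameters should stay byte-for-byte identical.

```python
from typing import Iterable
from urllib.parse import urlsplit, urlunsplit


def drop_params(url: str, names: Iterable[str]) -> str:
    """Return url without the named query parameters, leaving the rest untouched."""
    unwanted = set(names)
    parts = urlsplit(url)
    kept = [
        pair
        for pair in parts.query.split("&")
        # The parameter name is everything before the first "=".
        if pair and pair.split("=", 1)[0] not in unwanted
    ]
    return urlunsplit(parts._replace(query="&".join(kept)))


# Made-up URL; only the parameter names mirror the ones discussed above.
url = "https://example.invalid/videoplayback?mm=31%2C29&cpn=xyz&cver=2.0&rn=1"
for name in ("cver", "cpn", "rn"):
    print(name, "->", drop_params(url, [name]))
```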

azihassan commented 3 weeks ago

I did some digging into yt-dlp and found out that certain user agents return an HLS stream URL in streamingData. That URL returns an m3u8 file containing URLs that point to other m3u8 files, each of which points to video segments.

Here's a 144p example :

#EXTM3U
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-STREAM-INF:BANDWIDTH=147666,CODECS="mp4a.40.5,avc1.4D400C",RESOLUTION=256x144,FRAME-RATE=24,VIDEO-RANGE=SDR,CLOSED-CAPTIONS=NONE
https://manifest.googlevideo.com/api/manifest/hls_playlist/expire/1724485063/ei/ZznJZoC5KKjJ...D%3D/playlist/index.m3u8
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-TARGETDURATION:8
#EXTINF:4.129125,
https://rr1---sn-f5o5-jhod.googlevideo.com/videoplayback/id/454f64110cd30074/itag/91/.../0/file/seg.ts
#EXTINF:6.339666,
https://rr1---sn-f5o5-jhod.googlevideo.com/videoplayback/id/454f64110cd30074/itag/91/.../1/file/seg.ts
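The two playlist levels shown above can be read with a few lines of parsing. The following Python sketch handles only the tags that appear in this example, not the full HLS format. The function names playlist_uris and segments are hypothetical, and the sample playlists use made-up URLs.

```python
from typing import List, Optional, Tuple


def playlist_uris(text: str) -> List[str]:
    """Return the URI lines of a playlist in order, skipping tags and blank lines."""
    lines = (raw.strip() for raw in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def segments(text: str) -> List[Tuple[float, str]]:
    """Return (duration, uri) pairs by matching each #EXTINF tag with the next URI."""
    result: List[Tuple[float, str]] = []
    duration: Optional[float] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#EXTINF:"):
            # The duration is the part between "#EXTINF:" and the first comma.
            duration = float(line[len("#EXTINF:"):].split(",", 1)[0])
        elif line and not line.startswith("#") and duration is not None:
            result.append((duration, line))
            duration = None
    return result


# Made-up playlists that mirror the structure of the example above.
master = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=147666,RESOLUTION=256x144
https://example.invalid/hls_playlist/index.m3u8
"""
media = """#EXTM3U
#EXT-X-TARGETDURATION:8
#EXTINF:4.129125,
https://example.invalid/0/file/seg.ts
#EXTINF:6.339666,
https://example.invalid/1/file/seg.ts
"""

print(playlist_uris(master))
for duration, uri in segments(media):
    print(duration, uri)
```

A downloader would fetch each URI from the first list, run segments on the response, then request the segment URIs in the order returned.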

azihassan commented 1 week ago

The URLs of the HLS stream have two clen attributes. I assumed that they refer to the video and audio content lengths, but downloading the URLs of the m3u8 playlist (in sequence) yields a slightly larger video. I'm not sure why; maybe I'm not downloading them correctly.