azihassan / youtube-d

A fast command-line Youtube downloader
MIT License

Range is now mandatory for adaptive formats #83

Open azihassan opened 3 months ago

azihassan commented 3 months ago

Initially reported here by bajas. I reproduced it with youtube-d -f 399 https://youtu.be/dQw4w9WgXcQ. The response is "Failed with status 403".

Getting the video URL with youtube-d -f 399 https://youtu.be/dQw4w9WgXcQ -o then opening it in a browser or downloading it with cURL yields similar results.

It's worth noting that while ParallelDownloader adds range headers that bypass this issue, it will still fail because it first sends a HEAD request to the video URL to retrieve its full length, from which it calculates the appropriate range values. That HEAD call also fails because it doesn't include a range header. This can be reproduced with curl -LI $(youtube-d -f 399 https://youtu.be/dQw4w9WgXcQ -o | grep '^https')

$ curl -LI $(youtube-d -f 399 https://youtu.be/dQw4w9WgXcQ -o | grep '^https')
HTTP/1.1 403 Forbidden
Last-Modified: Wed, 02 May 2007 10:26:10 GMT
Content-Type: text/plain
Content-Length: 0
Connection: close
Vary: Origin
Cross-Origin-Resource-Policy: cross-origin
X-Restrict-Formats-Hint: None
X-Content-Type-Options: nosniff
Date: Sat, 17 Aug 2024 19:52:07 GMT
Server: gvs 1.0

azihassan commented 3 months ago

I forgot that the video size is included in the adaptiveFormats JSON object; that's how it's displayed when using the -F flag. Instead of sending a HEAD request to that end, I'll have to pass the video size to the Downloader classes.
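
A minimal sketch of that idea, written in Python for illustration: once the size is known from the format metadata, the range values can be computed without any HEAD request. The function names, the chunk size, and the assumption that the size arrives as a decimal string in a field such as contentLength are illustrative and not taken from youtube-d.

```python
from typing import Iterator, Tuple


def byte_ranges(total_length: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) byte offsets covering total_length bytes."""
    if total_length < 0 or chunk_size <= 0:
        raise ValueError("total_length must be >= 0 and chunk_size must be > 0")
    start = 0
    while start < total_length:
        # The last chunk may be shorter than chunk_size.
        end = min(start + chunk_size, total_length) - 1
        yield start, end
        start = end + 1


def range_header(start: int, end: int) -> str:
    """Format an HTTP Range header value for an inclusive byte span."""
    return f"bytes={start}-{end}"


# Hypothetical metadata entry; the field name and values are assumptions.
fmt = {"itag": 399, "contentLength": "2500"}
headers = [
    range_header(start, end)
    for start, end in byte_ranges(int(fmt["contentLength"]), 1000)
]
print(headers)
```

Each value would then be sent as the Range header of a separate GET request, so no request ever goes out without a range.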

azihassan commented 3 months ago

The YouTube website doesn't seem to be using adaptiveFormats at all. The URL it requests no longer includes an itag query parameter, and the response content type is now "application/vnd.yt-ump".

Example URL :

https://rr1---sn-f5o5-jho6.googlevideo.com/videoplayback?expire=1724395416&ei=ONvHZuPmCYylp-oPoI2l4Qk&ip=105.66.6.5&id=o-AAF6xHbbhqMQz4PivvKCrqekqJxpvv4yPOhC5C7ZzgxR&source=youtube&requiressl=yes&xpc=EgVo2aDSNQ%3D%3D&mh=cZ&mm=31%2C29&mn=sn-f5o5-jho6%2Csn-h5qzen7d&ms=au%2Crdu&mv=m&mvi=1&pl=24&initcwndbps=281250&spc=Mv1m9nRBwBVvaxcx0avTGA74bTsY58w2aDWe3FdwYQQcC0oUXgmwrXVP13D_&svpuc=1&ns=QDPXqHeZ7mc9HGOvPacKq2cQ&sabr=1&rqh=1&mt=1724373168&fvip=2&keepalive=yes&c=WEB&n=8v1P1FtKQzxTfw&sparams=expire%2Cei%2Cip%2Cid%2Csource%2Crequiressl%2Cxpc%2Cspc%2Csvpuc%2Cns%2Csabr%2Crqh&sig=AJfQdSswRQIgae4qYOMQAw8tmnrDJnjxhSQZedUGiLYyaysFG_r0a4oCIQD6jDjHVXpz9Jn6G573mQTVsC6GMTJ_HzKIKrRNGqUCJQ%3D%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=AGtxev0wRgIhANYc6PItRU88qizdK3VgOdwXt1Tu1IwAe0AzY2Ec8DPfAiEA38HbtgXfrGROhfNFlCzeaHrkptFA3SD4IEqdAOwFavk%3D&cpn=wkI2twug9Q_Ddhju&cver=2.20240821.01.00&rn=1

I noticed that there's a new field called "serverAbrStreamingUrl" in "streamingData" that includes a similar URL :

https://rr1---sn-f5o5-jho6.googlevideo.com/videoplayback?expire=1724395579&ei=29vHZvGhI8uIvdIPuc2LyAQ&ip=105.66.6.5&id=o-AKeh3bUw03Y5en6DnNTitoYbkkNVuJr8T79j7feEIwPA&source=youtube&requiressl=yes&xpc=EgVo2aDSNQ%3D%3D&mh=cZ&mm=31%2C29&mn=sn-f5o5-jho6%2Csn-h5q7knes&ms=au%2Crdu&mv=m&mvi=1&pl=24&initcwndbps=287500&spc=Mv1m9gTufCB5bea62majXWG3BkV2lbgeeP0yBck_gTWGwnUN-ZgxO7Qf-Wta&svpuc=1&ns=kw1DbH1GcIKnPQ4DmD9cWg8Q&sabr=1&rqh=1&mt=1724373647&fvip=1&keepalive=yes&c=WEB&n=sw0ivBXGfZIJ_T7X&sparams=expire%2Cei%2Cip%2Cid%2Csource%2Crequiressl%2Cxpc%2Cspc%2Csvpuc%2Cns%2Csabr%2Crqh&sig=AJfQdSswRAIgH3hxz4U9wnewpzeTlOxt6XE009QS3zklHbNfKGWbsdcCIDGt7iV1b1HcKCFbstNGR7CKBmZW2zVYZx4I0Obtzxqh&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=AGtxev0wRAIgf5N3HKeVNx3iq-0pS9CJisFDUpheBLBe00naXtNK8zcCICtEPBFTP9dC0byNYm7dnvh_azhKhT44tq5rJedkiHNd

Here's a side by side comparison :

[image: side-by-side comparison of the two URLs]

The video URL has three additional parameters: cpn, cver and rn. cver is available in the HTML page and rn is what I assume to be the new format identifier (itag), but I don't know what cpn means other than the fact that it's referred to as clientPlaybackNonce in base.js.
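
The comparison can also be done programmatically. The following Python sketch diffs the query parameter names of two URLs; the helper names are hypothetical and the URLs are shortened placeholders with made-up values, not real playback links.

```python
from urllib.parse import parse_qs, urlsplit


def query_keys(url: str) -> set:
    """Return the set of query parameter names found in a URL."""
    return set(parse_qs(urlsplit(url).query, keep_blank_values=True))


def extra_params(browser_url: str, api_url: str) -> set:
    """Parameter names present in browser_url but absent from api_url."""
    return query_keys(browser_url) - query_keys(api_url)


# Placeholder URLs: same shape as the ones above, but heavily shortened.
browser = (
    "https://example.invalid/videoplayback"
    "?expire=1&sabr=1&n=abc&cpn=xyz&cver=2.0&rn=1"
)
api = "https://example.invalid/videoplayback?expire=2&sabr=1&n=def"

print(sorted(extra_params(browser, api)))
```

Only parameter names are compared, since values such as expire, sig and n differ between any two requests anyway.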

Querying serverAbrStreamingUrl returns 403. After solving the n parameter, it returns a 200 OK response with a binary output that includes "sabr.malformed_config". The same result can be obtained by removing any of the three cver/cpn/rn parameters from the video URL reported by the browser's network explorer.

azihassan commented 3 months ago

I did some digging into yt-dlp and found out that certain user agents return an HLS stream URL in streamingData. That URL returns an m3u8 file that contains URLs pointing to other m3u8 files, each of which lists the URLs of the video segments.

Here's a 144p example :

#EXTM3U
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-STREAM-INF:BANDWIDTH=147666,CODECS="mp4a.40.5,avc1.4D400C",RESOLUTION=256x144,FRAME-RATE=24,VIDEO-RANGE=SDR,CLOSED-CAPTIONS=NONE
https://manifest.googlevideo.com/api/manifest/hls_playlist/expire/1724485063/ei/ZznJZoC5KKjJ...D%3D/playlist/index.m3u8
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-TARGETDURATION:8
#EXTINF:4.129125,
https://rr1---sn-f5o5-jhod.googlevideo.com/videoplayback/id/454f64110cd30074/itag/91/.../0/file/seg.ts
#EXTINF:6.339666,
https://rr1---sn-f5o5-jhod.googlevideo.com/videoplayback/id/454f64110cd30074/itag/91/.../1/file/seg.ts

azihassan commented 2 months ago

The URLs of the HLS stream have two clen attributes. I assumed that they refer to the video and audio content lengths, but downloading the URLs of the m3u8 playlist (in sequence) yields a slightly larger video. Not sure why, maybe I'm not downloading them correctly.
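
For reference, reading the segment list out of a media playlist in sequence could look roughly like the Python sketch below. It only handles the tags that appear in the example above, the parser name is hypothetical, and the segment URLs are placeholders. The download and concatenation step is left out.

```python
from typing import List, Tuple


def parse_media_playlist(text: str) -> List[Tuple[float, str]]:
    """Return (duration_seconds, uri) pairs in playlist order."""
    segments = []
    duration = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#EXTINF:"):
            # Format is "#EXTINF:<duration>,[<title>]".
            duration = float(line[len("#EXTINF:"):].split(",", 1)[0])
        elif line and not line.startswith("#"):
            # A non-comment line is a URI; pair it with the preceding EXTINF.
            if duration is not None:
                segments.append((duration, line))
                duration = None
    return segments


# Placeholder playlist mirroring the structure of the 144p example.
playlist = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-TARGETDURATION:8
#EXTINF:4.129125,
https://example.invalid/videoplayback/0/file/seg.ts
#EXTINF:6.339666,
https://example.invalid/videoplayback/1/file/seg.ts
"""

segments = parse_media_playlist(playlist)
for duration, uri in segments:
    print(f"{duration:.3f}s {uri}")
```

Writing the bytes of each URI to a single output file, in this order, would give the sequential download described above.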

3052 commented 2 months ago

hey I am working on this as well

I am able to get around the sabr.malformed_config error, but only by using a large ProtoBuf request body similar to the web client. not sure how it's created yet

azihassan commented 1 month ago

Thanks! I sort of gave up on that angle until I get HLS streams to work, but even that isn't without its problems.

azihassan commented 2 weeks ago

Thanks to bajas' iOS user agent tip, I was able to get adaptive formats to work again, even without providing a range.

Edit: range is still needed to bypass rate limiting for large files. It can be provided with the --chunked or --parallel flags.