Kethsar / ytarchive

Garbage Youtube livestream downloader
MIT License

Segment downloads 403 after 30s, requiring frequent re-extraction #221

Open fren-archive opened 1 week ago

fren-archive commented 1 week ago

Several others on Discord and I are seeing frequent 403s while downloading segments, though some users report that they are not affected. Specifically, the 403s start 30s after each page extraction, so each instance of ytarchive has to re-extract the URLs 120 times per hour instead of the single extraction that is intended. The output with --debug set looks like this:

...
2024/09/25 03:45:33 DEBUG: audio3: HTTP Error for fragment 685: 403 Forbidden
2024/09/25 03:45:33 DEBUG: audio: Attempting to retrieve a new download URL
2024/09/25 03:45:33 DEBUG: audio2: HTTP Error for fragment 686: 403 Forbidden
2024/09/25 03:45:33 DEBUG: audio1: HTTP Error for fragment 687: 403 Forbidden
2024/09/25 03:45:34 DEBUG: Retrieving URLs from web DASH manifest
2024/09/25 03:45:34 DEBUG: Retrieving URLs from web adaptive formats
2024/09/25 03:46:03 DEBUG: video3: HTTP Error for fragment 770: 403 Forbidden
2024/09/25 03:46:03 DEBUG: video: Attempting to retrieve a new download URL
2024/09/25 03:46:04 DEBUG: Retrieving URLs from web DASH manifest
2024/09/25 03:46:04 DEBUG: Retrieving URLs from web adaptive formats
2024/09/25 03:46:33 DEBUG: video4: HTTP Error for fragment 1025: 403 Forbidden
2024/09/25 03:46:33 DEBUG: video: Attempting to retrieve a new download URL
2024/09/25 03:46:33 DEBUG: video3: HTTP Error for fragment 1024: 403 Forbidden
2024/09/25 03:46:34 DEBUG: Retrieving URLs from web DASH manifest
2024/09/25 03:46:34 DEBUG: Retrieving URLs from web adaptive formats
2024/09/25 03:47:03 DEBUG: video4: HTTP Error for fragment 1276: 403 Forbidden
2024/09/25 03:47:03 DEBUG: video: Attempting to retrieve a new download URL
2024/09/25 03:47:04 DEBUG: Retrieving URLs from web DASH manifest
2024/09/25 03:47:04 DEBUG: Retrieving URLs from web adaptive formats
...
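To make the 30s figure concrete, here is a minimal sketch that measures the gap between consecutive "Attempting to retrieve a new download URL" lines for one stream. The sample is a subset of the log above; the function names `refreshGaps` and `extractionsPerHour` are made up for this illustration and are not part of ytarchive.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
	"time"
)

// sampleLog holds only the URL-refresh lines from the --debug output above.
const sampleLog = `2024/09/25 03:45:33 DEBUG: audio: Attempting to retrieve a new download URL
2024/09/25 03:46:03 DEBUG: video: Attempting to retrieve a new download URL
2024/09/25 03:46:33 DEBUG: video: Attempting to retrieve a new download URL
2024/09/25 03:47:03 DEBUG: video: Attempting to retrieve a new download URL`

const refreshMsg = "Attempting to retrieve a new download URL"

// refreshGaps returns the time between consecutive URL refreshes
// for one stream ("audio" or "video"). Hypothetical helper.
func refreshGaps(log, stream string) ([]time.Duration, error) {
	var gaps []time.Duration
	var prev time.Time
	marker := "DEBUG: " + stream + ": " + refreshMsg
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		line := sc.Text()
		// The timestamp prefix "2024/09/25 03:45:33" is 19 characters long.
		if !strings.Contains(line, marker) || len(line) < 19 {
			continue
		}
		ts, err := time.Parse("2006/01/02 15:04:05", line[:19])
		if err != nil {
			return nil, err
		}
		if !prev.IsZero() {
			gaps = append(gaps, ts.Sub(prev))
		}
		prev = ts
	}
	return gaps, sc.Err()
}

// extractionsPerHour converts a refresh interval into refreshes per hour.
func extractionsPerHour(gap time.Duration) float64 {
	return float64(time.Hour) / float64(gap)
}

func main() {
	gaps, err := refreshGaps(sampleLog, "video")
	if err != nil {
		panic(err)
	}
	for _, g := range gaps {
		fmt.Printf("gap %v, about %.0f extractions per hour\n", g, extractionsPerHour(g))
	}
}
```

For the video lines in the sample, both gaps come out at 30s, which works out to 120 extractions per hour.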

With just 3-5 instances I was able to trigger bot detection measures, which caused all of the recordings to fail nearly simultaneously. There are other effects too: content that is removed or set to members-only fails almost immediately instead of being able to continue for 6h. Passing cookies did not change the behavior.

This is, as far as I know, tied to the android client and potoken. When potoken is enabled (which it is on the android client in most cases), URLs fail after 30s unless they carry the proper pot value. This is similar to nsig calculation but substantially worse, because it relies on fingerprinting methods rather than a simple JavaScript check and probably cannot reasonably be emulated outside a browser.

I have the same issue with yt-dlp if I try to use the android or web clients, but the ios and web_creator clients are not (yet) affected. web_creator also provides DASH manifest URLs which do not need nsig calculation, so that seems like the easiest path to avoid the issue. Alternatively, an option for the user to offload URL extraction to an external script (presumably yt-dlp) might be more robust long-term.
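As a rough sketch of the "offload extraction to yt-dlp" idea, the following shells out to yt-dlp with a chosen player client and collects the URLs it prints. This is not how ytarchive currently works. It assumes yt-dlp is on PATH and that its `--extractor-args "youtube:player_client=..."`, `-f` and `-g` options behave as documented; the helper names (`ytdlpArgs`, `parseURLs`, `extractURLs`) and the format string are made up for the example.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// ytdlpArgs builds the yt-dlp arguments for printing media URLs
// with a specific YouTube player client (e.g. "web_creator" or "ios").
func ytdlpArgs(videoURL, client, format string) []string {
	return []string{
		"--extractor-args", "youtube:player_client=" + client,
		"-f", format,
		"-g", // short for --get-url: print URLs instead of downloading
		videoURL,
	}
}

// parseURLs splits yt-dlp's output into one URL per non-empty line.
func parseURLs(out string) []string {
	var urls []string
	for _, line := range strings.Split(out, "\n") {
		if line = strings.TrimSpace(line); line != "" {
			urls = append(urls, line)
		}
	}
	return urls
}

// extractURLs runs yt-dlp (assumed to be on PATH) and returns the URLs it prints.
func extractURLs(videoURL, client, format string) ([]string, error) {
	out, err := exec.Command("yt-dlp", ytdlpArgs(videoURL, client, format)...).Output()
	if err != nil {
		return nil, fmt.Errorf("yt-dlp failed: %w", err)
	}
	return parseURLs(string(out)), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: extract <youtube-url>")
		return
	}
	// web_creator is the client reported above as not (yet) affected.
	urls, err := extractURLs(os.Args[1], "web_creator", "bestvideo+bestaudio")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, u := range urls {
		fmt.Println(u)
	}
}
```

Keeping the client name as a parameter means that switching away from web_creator, if it starts getting hit too, would only require changing one argument.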

Kethsar commented 1 week ago

I knew this would start eventually. I don't have the drive the yt-dlp people do to fight it much either. I might try updating which clients are used for requests, but once they inevitably all get hit, I'll probably just stop bothering. Honestly surprised this has worked as long as it has anyway.