JuanBindez / pytubefix

Python3 library for downloading YouTube Videos.
http://pytubefix.rtfd.io/
MIT License

Fix WEB client #195

Closed felipeucelli closed 1 week ago

felipeucelli commented 3 weeks ago

Is your feature request related to a problem? Please describe.

Currently, only stream 18 of the WEB client works; all other streams return a 403 error.

In the past, the WEB client needed the signature cipher and signatureTimeStamp to work. The 'n' throttling parameter was optional: skipping it only limited the transfer rate to 30Kb/s. Recently, however, it became required.

Currently, even after deciphering all of them, the 403 error is still returned, which shows that YouTube changed something else.

Describe the solution you'd like

yt-dlp is working to implement the proof of origin token (PoToken), but it requires external plugins.

I personally couldn't get it to work in pytubefix; maybe I'm doing something wrong.

Describe alternatives you've considered

Analyzing the official WEB client, I was unable to find the PoToken in the API request.

PoToken only appears in WEB_EMBED and WEB_MUSIC, and even after removing the PoToken from these clients' requests, it is still possible to obtain valid streams.

I could be wrong, but all this leads me to think that PoToken may not be responsible for the 403 error.

In the official WEB client, YouTube seems to fetch the streams from the streamingData.serverAbrStreamingUrl URL with the 'n' parameter deciphered. This URL is requested through a POST whose request payload is a configuration converted to a Uint8Array.

I managed to generate this Uint8Array manually, but I couldn't work out how the player selects a specific itag. When I send my configuration to serverAbrStreamingUrl, I get a response similar to the official player's.

Additional context

It seems that the WEB client is getting the streams through a POST sending some configurations (perhaps configurations that have the stream's itag).

As my attempts to use PoToken are failing, I would like the pytubefix community to also test the yt-dlp method and help fix the WEB client.

felipeucelli commented 2 weeks ago

Update

PoToken:

Including the PoToken in the API request fixes YouTube's bot detection error for WEB-based clients, but I have not yet managed to fix the 403 error in the WEB client.

To use PoToken, you need to change the API request to:

  "WEB": {
    "innerTubeContext": {
      "context": {
        "client": {
          "clientName": "WEB",
          "osName": "Windows",
          "osVersion": "10.0",
          "clientVersion": "2.20240823.01.00",
          "platform": "DESKTOP",
          "visitorData": "VISITOR_DATA" # your visitorData here
        }
      },
      "serviceIntegrityDimensions": {
        "poToken": "PO_TOKEN" # your PoToken here
      }
    },
    "header": {
      "User-Agent": "Mozilla/5.0",
      "X-Youtube-Client-Name": "1",
      "X-Youtube-Client-Version": "2.20240823.01.00"
    },
    "apiKey": "AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8",
    "requireJsPlayer": "true"
  },
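As a rough illustration of how the configuration above maps onto an actual request, here is a minimal sketch that builds (but does not send) the POST. The endpoint URL, the `videoId` field, and the function name `build_player_request` are assumptions that do not appear in this thread, and the API key from the configuration is left out for brevity; this is not how pytubefix itself issues the call.

```python
import json
import urllib.request

# Assumption: InnerTube player endpoint (not stated in this thread).
PLAYER_ENDPOINT = "https://www.youtube.com/youtubei/v1/player"


def build_player_request(video_id, visitor_data, po_token,
                         client_version="2.20240823.01.00"):
    """Build, without sending, a POST request carrying visitorData and poToken."""
    body = {
        "context": {
            "client": {
                "clientName": "WEB",
                "osName": "Windows",
                "osVersion": "10.0",
                "clientVersion": client_version,
                "platform": "DESKTOP",
                "visitorData": visitor_data,
            }
        },
        # The PoToken sits next to "context", as in the configuration above.
        "serviceIntegrityDimensions": {"poToken": po_token},
        # Assumption: the video is selected with a top-level videoId field.
        "videoId": video_id,
    }
    headers = {
        "Content-Type": "application/json",
        "User-Agent": "Mozilla/5.0",
        "X-Youtube-Client-Name": "1",
        "X-Youtube-Client-Version": client_version,
    }
    return urllib.request.Request(
        PLAYER_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call; whether YouTube accepts it depends on the visitorData and PoToken being a valid pair.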

To get the PoToken, you can use this Invidious tool. Although the tool does not work very well in a cloud environment, a generated PoToken appears to remain valid for several days.

The PoToken generated by BOTGUARD only works on web-based clients, but strangely YouTube sent me a visitorData that can be used on all clients without the need for the PoToken.

Error 403 in WEB client:

When a request payload is sent to the serverAbrStreamingUrl URL, YouTube returns fragments of the video in the application/vnd.yt-ump format, which ffmpeg cannot recognize due to an error in the moov atom. In web clients that have not yet been affected, simply changing the parameter ump=1 to ump=0 yields decodable fragments.

It seems that YouTube is using a technique similar to the one used in the past for yt_otf streams, which required downloading fragments through a POST. Even with the PoToken implemented, or with my visitorData, it was still not possible to obtain a valid response with the GET method.

felipeucelli commented 1 week ago

yt-dlp was correct

To obtain working adaptive streams, it was enough to add the same PoToken sent in the API request to the stream URL, as the pot query parameter.
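A minimal sketch of that step, using only the standard library. The helper name `add_po_token` and the example URL are hypothetical; only the `pot` parameter name comes from this thread.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_po_token(stream_url, po_token):
    """Return stream_url with its `pot` query parameter set to po_token."""
    parts = urlsplit(stream_url)
    # Drop any existing `pot` value so the parameter is never duplicated.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "pot"
    ]
    query.append(("pot", po_token))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical stream URL, for illustration only.
print(add_po_token("https://example.com/videoplayback?itag=137&n=abc", "PO_TOKEN"))
```

Note that `urlencode` re-encodes the whole query string, so characters in the other parameters may come out percent-encoded differently than in the original URL. If that matters, appending `&pot=...` to the raw URL is a simpler alternative.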

I was using the Invidious tool to obtain the PoToken automatically, but I noticed that it returned incorrectly generated tokens, so I switched to this project: https://github.com/YunzheZJU/youtube-po-token-generator.

The official WEB player does not use PoToken in the stream URL, so it is curious that it uses this request payload instead. Perhaps a future change awaits us.

To obtain the request payload, I isolated some base.js functions available here. Interestingly, the fake PoToken that the Invidious tool returns originates from the isolated functions here.