JuanBindez / pytubefix

Python3 library for downloading YouTube Videos.
http://pytubefix.rtfd.io/
MIT License
676 stars, 95 forks

Can pytube be combined with UI to solve captcha when encountered? #220

Closed sleepingcat4 closed 1 month ago

sleepingcat4 commented 1 month ago

Hey, Juan

Thanks for the nice project. I have been downloading a lot, and what I have seen is that in every 1K batch only 100 videos get a captcha, but in every 10K videos 5K get a captcha. I wanted to know if you could add a feature where I can choose the best YouTube streaming client (e.g., iOS, Android, Web), much like yt-dlp.

In particular, I wanted to know whether pytubefix can be used in a UI where I can build a captcha solver, so that I don't get captcha errors.

JuanBindez commented 1 month ago

Do something like this:

from pytubefix import YouTube

url = ''  # put the video URL here

yt = YouTube(url, client='WEB')
stream = yt.streams.get_highest_resolution()
stream.download()
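
Building on the snippet above, here is a rough sketch of trying several clients in turn when one of them is blocked. Only `client='WEB'` appears in this thread; the other client names (`ANDROID`, `IOS`) and the helper names `download_with_fallback` and `_default_download` are assumptions made for illustration, and this is not a built-in pytubefix feature.

```python
from typing import Callable, Iterable, Optional


def _default_download(url: str, client: str) -> Optional[str]:
    """Download the highest-resolution progressive stream using one client."""
    # Imported lazily so the fallback helper can be exercised without pytubefix.
    from pytubefix import YouTube

    yt = YouTube(url, client=client)
    stream = yt.streams.get_highest_resolution()
    return stream.download()


def download_with_fallback(
    url: str,
    clients: Iterable[str] = ("WEB", "ANDROID", "IOS"),  # assumed client names
    download: Callable[[str, str], Optional[str]] = _default_download,
) -> Optional[str]:
    """Try each client in order and return the first successful result."""
    last_error: Optional[Exception] = None
    for client in clients:
        try:
            return download(url, client)
        except Exception as exc:  # broad on purpose: the reported errors are generic
            last_error = exc
    raise RuntimeError(f"all clients failed for {url}") from last_error
```

Passing the download step in as a function keeps the retry logic separate from pytubefix itself, so the order of clients can be changed or tested without touching the network.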

JuanBindez commented 1 month ago

As for the captcha, I don't know.

JuanBindez commented 1 month ago

[image]

sleepingcat4 commented 1 month ago

@JuanBindez that's fine. I can add a captcha solver or contribute to your project so it can be done. But I wanted to know if it is possible, or could be added as a feature, to combine pytubefix functions with a UI so that I can write the captcha logic myself.

I know some captchas can even be solved from the terminal via an API (like the Anti-Captcha APIs). Is there a way to integrate an anti-captcha function into pytubefix?

felipeucelli commented 1 month ago

Hello @sleepingcat4, maybe I can help you. We recently added support for the Proof of Origin Token (PoToken) to pytubefix in #209. I don't know of any other captcha method that YouTube uses.

Would you mind providing more information about this captcha?

sleepingcat4 commented 1 month ago

@felipeucelli I read a few pytubefix PRs, and they said most errors regarding

  1. login to view
  2. age restricted
  3. Failed to connect tunnel detected
  4. not detected

come from a reCAPTCHA error. I have been downloading at scale for days now using pytubefix, and what I have learned is that this error gets more annoying as I increase the batch size.

For example, in every 1K videos, 100 videos receive this error, but in every 10K videos, 5K videos receive this error.

That's why I was thinking that if I could combine an anti-reCAPTCHA API with pytubefix, it could automatically solve these verification issues, and I could download from the terminal or use a UI (but a UI looks complicated).
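
To make the figures quoted above concrete, this small calculation turns them into failure rates. The function name `failure_rate` is just for illustration; the numbers are the ones reported in this comment.

```python
def failure_rate(failed: int, total: int) -> float:
    """Fraction of videos in a batch that hit the error."""
    return failed / total


small_batch = failure_rate(100, 1_000)     # 1K batch, 100 failures
large_batch = failure_rate(5_000, 10_000)  # 10K batch, 5K failures

print(f"{small_batch:.0%} vs {large_batch:.0%}")  # prints "10% vs 50%"
```

So the error rate is not constant: it rises from 10% to 50% as the batch grows tenfold.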

felipeucelli commented 1 month ago

Were you able to detect any parameter, token, or something similar that we could implement in the pytubefix API?

How does this Anti-recaptcha API work?

sleepingcat4 commented 1 month ago

I was looking at this repo and some other paid services. These paid APIs look cheap. If you could have something native in pytubefix (even if it is paid, that's fine, since those who badly need it can pay for it), it would be fantastic.

https://github.com/anti-captcha/anticaptcha-python

I wasn't able to detect any token or parameter since the error was generic. But I will try to gather information next time.

felipeucelli commented 1 month ago

I tested your problem for several days, and what I noticed is that YouTube seems to block requests when several are made in a row, requiring a short break before continuing.

When using a fast internet connection, I was blocked after about 5 requests. But when using a slower connection, I had no problem.

This is bad, but for now I don't know a way to get around this block.

Note: my tests were done without using po_token and oauth.
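
If the block really is triggered by several requests in a row, one possible workaround is to pause between videos and back off after a failure. The sketch below is only an illustration of that idea, not a confirmed fix: the helper name `download_batch`, the 5-second pause, and the retry count are all assumptions, and `download` stands for whatever function performs a single pytubefix download.

```python
import time
from typing import Callable, Dict, Iterable, List


def download_batch(
    urls: Iterable[str],
    download: Callable[[str], object],
    pause: float = 5.0,      # assumed break between videos, in seconds
    max_retries: int = 3,    # assumed number of retries per video
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, List[str]]:
    """Download URLs one by one, pausing between them and backing off on errors."""
    results: Dict[str, List[str]] = {"ok": [], "failed": []}
    for url in urls:
        delay = pause
        for attempt in range(max_retries + 1):
            try:
                download(url)
                results["ok"].append(url)
                break
            except Exception:
                if attempt == max_retries:
                    results["failed"].append(url)
                else:
                    sleep(delay)  # wait longer after each failed attempt
                    delay *= 2
        sleep(pause)  # short break before the next video
    return results
```

The URLs collected under `"failed"` can be retried later, for example after a longer pause or with po_token or oauth enabled.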