JuanBindez / pytubefix

Python3 library for downloading YouTube Videos.
http://pytubefix.rtfd.io/
MIT License

pytubefix.exceptions.VideoPrivate: lVFj91Z1AfM is a private video, for all videos from Digital Ocean #119

Closed: in4sec-org closed this issue 3 months ago

in4sec-org commented 3 months ago

Describe the bug: Since last night, every video I request from Digital Ocean infrastructure fails with pytubefix.exceptions.VideoPrivate: SOME_ID is a private video. These are regular, publicly available videos, not streams or live videos.

Additional context: I have a small pet project; before this started, I downloaded at most 3-5 videos per day.

potykion commented 3 months ago

Same:

Traceback (most recent call last):
  ...
  File "/function/code/apps/bot_tg_potyk.py", line 112, in download_yt_as_mp3
    yt.streams.filter(only_audio=True).order_by("abr").first().stream_to_buffer(buffer)
    ^^^^^^^^^^
  File "/function/code/pytubefix/__main__.py", line 564, in streams
    self.check_availability()
  File "/function/code/pytubefix/__main__.py", line 324, in check_availability
    raise exceptions.VideoPrivate(video_id=self.video_id)
pytubefix.exceptions.VideoPrivate:  [91mSpH83KzVKDc is a private video [0m

By the way, the original pytube doesn't work either: https://github.com/pytube/pytube/issues/1973

JuanBindez commented 3 months ago

Try this version to see what error appears: pytubefix==6.3.4rc1

potykion commented 3 months ago

try this version to see what error appears pytubefix==6.3.4rc1

Traceback (most recent call last):
  ...
  File "/function/code/apps/bot_tg_potyk.py", line 112, in download_yt_as_mp3
    yt.streams.filter(only_audio=True).order_by("abr").first().stream_to_buffer(buffer)
    ^^^^^^^^^^
  File "/function/code/pytubefix/__main__.py", line 534, in streams
    self.check_availability()
  File "/function/code/pytubefix/__main__.py", line 305, in check_availability
    raise exceptions.LoginRequired(video_id=self.video_id)
pytubefix.exceptions.LoginRequired:  [91mSpH83KzVKDc requires login to view [0m
JuanBindez commented 3 months ago

From my tests, it doesn't seem to be a problem with the library; it seems to be something with the video itself.

potykion commented 3 months ago

Well, it happens with any video, for example https://www.youtube.com/watch?v=K4TOrB7at0Y:

Traceback (most recent call last):
  ...
  File "/function/code/apps/bot_tg_potyk.py", line 112, in download_yt_as_mp3
    yt.streams.filter(only_audio=True).order_by("abr").first().stream_to_buffer(buffer)
    ^^^^^^^^^^
  File "/function/code/pytubefix/__main__.py", line 534, in streams
    self.check_availability()
  File "/function/code/pytubefix/__main__.py", line 305, in check_availability
    raise exceptions.LoginRequired(video_id=self.video_id)
pytubefix.exceptions.LoginRequired:  [91mK4TOrB7at0Y requires login to view [0m
JuanBindez commented 3 months ago

[screenshot]

potykion commented 3 months ago

Note that 91mK4TOrB7at0Y is not the actual video ID; this prefix is the same for various videos.

Never mind: 91m is part of a reserved color code, pytubefix.colors.Color.RED
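The [91m and [0m fragments are ANSI color escape sequences (ESC [91m switches to bright red, ESC [0m resets), which is why the Celery log later in this thread shows the message as '\x1b[91mbXzTXD_OJo0 is a private video\x1b[0m'. If you log or store exception messages, a minimal sketch for stripping such codes with a regular expression (the helper name is hypothetical, not part of pytubefix):

```python
import re

# Matches ANSI SGR (color/style) escape sequences such as "\x1b[91m" and "\x1b[0m".
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def strip_ansi(text: str) -> str:
    """Return text with ANSI color codes removed (hypothetical helper)."""
    return ANSI_SGR.sub("", text)


# The exception message as it appears in the Celery log.
raw = "\x1b[91mbXzTXD_OJo0 is a private video\x1b[0m"
print(strip_ansi(raw))  # bXzTXD_OJo0 is a private video
```

In an exception handler you would apply it as strip_ansi(str(exc)) before writing the message to a database or log.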

JuanBindez commented 3 months ago

Could you provide me with the complete code you are testing?

potykion commented 3 months ago
from io import BytesIO

from pytubefix import YouTube

buffer = BytesIO()
(
    YouTube("https://www.youtube.com/watch?v=K4TOrB7at0Y")
    .streams
    .filter(only_audio=True)
    .order_by("abr")
    .first()
    .stream_to_buffer(buffer)
)
print(buffer.tell())
potykion commented 3 months ago

Well, the code works in the local environment, but it doesn't work in the cloud, such as Digital Ocean or Yandex Cloud

felipeucelli commented 3 months ago

YouTube may be blocking your remote IP; try using use_oauth and let us know the results.

Note: Don't use your main account to authenticate, YouTube may ban it.
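A minimal sketch of this suggestion: try an anonymous request first and retry with OAuth only when pytubefix reports VideoPrivate or LoginRequired. It relies on the behavior visible in the tracebacks above, where check_availability() runs when .streams is accessed. The function name is hypothetical, and the YouTube class and exception types are passed in as parameters so the logic can be exercised without network access:

```python
from typing import Any, Callable, Tuple, Type


def open_with_oauth_fallback(
    url: str,
    youtube_cls: Callable[..., Any],
    retry_errors: Tuple[Type[BaseException], ...],
) -> Any:
    """Return a YouTube-like object whose streams could be listed.

    Hypothetical helper: tries without OAuth first, then retries with
    use_oauth=True and allow_oauth_cache=True when one of retry_errors is raised.
    """
    try:
        yt = youtube_cls(url)
        _ = yt.streams  # accessing .streams runs the availability check
        return yt
    except retry_errors:
        # The first OAuth run prints a device URL/code and waits for Enter on stdin.
        yt = youtube_cls(url, use_oauth=True, allow_oauth_cache=True)
        _ = yt.streams
        return yt
```

With pytubefix installed, you would call it as open_with_oauth_fallback(url, YouTube, (VideoPrivate, LoginRequired)), importing YouTube from pytubefix and both exceptions from pytubefix.exceptions, the module named in the tracebacks.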

JuanBindez commented 3 months ago

Well, the code works in the local environment, but it doesn't work in the cloud, such as Digital Ocean or Yandex Cloud

This may be the problem, as it is a cloud server there is no way of knowing the type of network infrastructure they have, there may be something in them that could cause blocking, due to their network infrastructure

in4sec-org commented 3 months ago

req.txt:

git+https://github.com/JuanBindez/pytubefix

code:

...
from pytubefix import YouTube
...

@app.task(name='transcribe_youtube_link', bind=True)
def transcribe_youtube_link(self, user_id, youtube_link, language='', isSpeakerDetectionEnabled=False, speakerMode='auto', speakerRange=None):
    runpod.api_key = os.getenv('RUNPOD_KEY')
    endpoint = runpod.Endpoint(os.getenv('RUNPOD_ENDPOINT'))
    task_id = str(self.request.id)
    mp4_file = f"static/{task_id}.mp4"
    new_file = f"static/{task_id}.wav"
    local_url = f"{os.getenv('BACKEND_PUBLIC_URL')}/video/{task_id}.mp4"
    # id: str, user_id: str, youtube_url: str, language: str = "English", is_speaker_detection_enabled: bool = False, speaker_mode: str = "auto", speaker_range: dict = {"min": 2, "max": 5}
    crud_create_task(id=task_id, user_id=user_id, youtube_url=local_url, language=language, is_speaker_detection_enabled=isSpeakerDetectionEnabled, speaker_mode=speakerMode, speaker_range=speakerRange)
    self.update_state(state='Creating')

    print(f"Parameters received: language={language}, isSpeakerDetectionEnabled={isSpeakerDetectionEnabled}, speakerMode={speakerMode}, speakerRange={speakerRange}")
    youtube_name = ''
    try:
        yt = YouTube(youtube_link)
        #print(yt)
        #full_seconds = yt.length
        #youtube_name = yt.title
        audio = yt.streams.filter(file_extension='mp4').first()
        if not audio:
            raise Exception("No suitable audio stream found.")

        status = 'Downloading'
        result = ''
        is_premium = crud_is_user_premium(user_id)
        #seconds = full_seconds if is_premium else min(full_seconds, 300)
        # task_id: str, status: str, result: str, seconds: int, youtube_name: str, updated_at: datetime, chunks=None, language: str = "English", is_speaker_detection_enabled: bool = False, speaker_mode: str = "auto", speaker_range: dict = {"min": 2, "max": 5}
        crud_update_task(task_id=task_id, status=status, result=result, seconds=0, youtube_name=youtube_name, updated_at=datetime.utcnow(), chunks={})
        self.update_state(state=status)

        out_file = audio.download()
        full_seconds = ffmpeg_probe_length(out_file)
        seconds = full_seconds if is_premium else min(full_seconds, 300)
        youtube_name = yt.title

        crud_update_task(task_id=task_id, status='Converting', result=result, seconds=seconds, youtube_name=youtube_name, updated_at=datetime.utcnow(), chunks={})
        self.update_state(state='Converting')

        # ffmpeg command to convert to low-quality mp4
        command = [
            'ffmpeg', '-i', out_file,
            '-b:v', '500k', '-s', '640x360', '-preset', 'fast', '-threads', '0'
        ]

        if not is_premium and full_seconds > 300:
            command.extend(['-t', '300'])  # Limit the output duration to 300 seconds

        command.append(mp4_file)

        subprocess.run(command, check=True)
        os.remove(out_file)

        # ffmpeg command to convert mp4 to wav
        command = [
            'ffmpeg', '-i', mp4_file, '-vn', '-acodec', 'pcm_s16le', '-ac', '1', '-ar', '16000', '-threads', '0', new_file
        ]
        subprocess.run(command, check=True)

        file_url = f"{os.getenv('BACKEND_PUBLIC_URL')}/{new_file}"
        crud_update_task(task_id=task_id, status='Transcribing', result=result, seconds=seconds, youtube_name=youtube_name, updated_at=datetime.utcnow(), chunks={})
        self.update_state(state='Transcribing')

        batch_size = 3 if not isSpeakerDetectionEnabled else 3

        run_request = endpoint.run({
            "input": {
                "audio": file_url,
                "batch_size": batch_size,
                "chunk_length": 30,
                "language": language if language else '',
                "diarise_audio": isSpeakerDetectionEnabled,
                "speaker_mode": speakerMode,
                "speaker_range": speakerRange
            }
        })
        result = run_request.output(timeout=60*60*3)

        os.remove(new_file)
        chunks = convert_transcript_format(result)
        chunks = json.dumps(chunks)
        crud_update_task(task_id=task_id, status='Finished', result=result["text"], seconds=seconds, youtube_name=youtube_name, updated_at=datetime.utcnow(), chunks=chunks)
        self.update_state(state='Finished')
        crud_update_user_seconds(user_id=user_id, seconds=seconds, updated_at=datetime.utcnow())
        return result
    except Exception as e:
        #youtube_name = ''
        crud_update_task(task_id=task_id, status='Error', result=str(e), seconds=0, youtube_name=youtube_name, updated_at=datetime.utcnow(), chunks={})
        self.update_state(state='Error')
        raise e

log:

worker_1    | [2024-07-19 13:51:48,071: INFO/MainProcess] Task transcribe_youtube_link[6db8bd6b-8181-4fb4-a472-15f80beed1db] received
backend_1   | INFO:     172.26.0.8:59440 - "POST /api/task HTTP/1.1" 201 Created
worker_1    | [2024-07-19 13:51:48,155: WARNING/ForkPoolWorker-1] Parameters received: language=English, isSpeakerDetectionEnabled=False, speakerMode=auto, speakerRange={'min': 3, 'max': 5}
worker_1    | [2024-07-19 13:51:48,268: ERROR/ForkPoolWorker-1] Task transcribe_youtube_link[6db8bd6b-8181-4fb4-a472-15f80beed1db] raised unexpected: VideoPrivate('\x1b[91mbXzTXD_OJo0 is a private video\x1b[0m')
worker_1    | Traceback (most recent call last):
worker_1    |   File "/usr/local/lib/python3.10/dist-packages/celery/app/trace.py", line 451, in trace_task
worker_1    |     R = retval = fun(*args, **kwargs)
worker_1    |   File "/usr/local/lib/python3.10/dist-packages/celery/app/trace.py", line 734, in __protected_call__
worker_1    |     return self.run(*args, **kwargs)
worker_1    |   File "/usr/src/app/tasks.py", line 190, in transcribe_youtube_link
worker_1    |     raise e
worker_1    |   File "/usr/src/app/tasks.py", line 120, in transcribe_youtube_link
worker_1    |     audio = yt.streams.filter(file_extension='mp4').first()
worker_1    |   File "/usr/local/lib/python3.10/dist-packages/pytubefix/__main__.py", line 564, in streams
worker_1    |     self.check_availability()
worker_1    |   File "/usr/local/lib/python3.10/dist-packages/pytubefix/__main__.py", line 324, in check_availability
worker_1    |     raise exceptions.VideoPrivate(video_id=self.video_id)
worker_1    | pytubefix.exceptions.VideoPrivate: bXzTXD_OJo0 is a private video

I tried with this link https://www.youtube.com/watch?v=bXzTXD_OJo0 just now.

Yesterday, and for a couple of months before that, it worked like clockwork (today I already changed the code so that it no longer reads the title and duration from the library before the download). I also tried moving the server from the UK to the US and Singapore (and other IPs), with the same result; it broke within the last 24 hours. I also tried other links, and none of them work today. The previous release didn't work either.

JuanBindez commented 3 months ago

Well, the code works in the local environment, but it doesn't work in the cloud, such as Digital Ocean or Yandex Cloud

Try doing what @felipeucelli suggested; if that doesn't work, I advise investigating their infrastructure from the inside to understand what is causing it.

in4sec-org commented 3 months ago

I see pytubefix.exceptions.LoginRequired: bXzTXD_OJo0 requires login to view after trying pytubefix==6.3.4rc1. Now I'll try the authorization option, as advised above.

celarain commented 3 months ago

I tried the rc version and it says f14EJhG3X68 requires login to view, but when I check in incognito mode without any account, the video works.

in4sec-org commented 3 months ago

I solved the problem with: yt = YouTube(youtube_link, use_oauth=True, allow_oauth_cache=True).

In my case, I deploy with docker-compose, with Celery running inside it. Before starting, I had to manually enter the relevant container, run this code, and link a spare Gmail account I don't really need (the flow pretends to be a smart TV); only after that did it work.

During linking, the code stops and waits for you to press Enter (in a Celery worker this is a dead end). You need to open the URL from the console or log in any browser (even on another device or in another country), enter the code from the log, select the Google account to link, and only then press Enter. The code saves these credentials, so this step is not needed on later runs.
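The one-time linking step can be sketched as a small function you run from an interactive Python session inside the container (for example ipython, as suggested later in this thread) before starting the worker. This assumes, as described in this comment, that credentials are cached after the first successful OAuth run. The function name is hypothetical, and the YouTube class is passed in as a parameter:

```python
from typing import Any, Callable


def authorize_once(url: str, youtube_cls: Callable[..., Any]) -> str:
    """Run the OAuth device flow once so the tokens get cached (hypothetical helper).

    Must run where stdin is interactive: pytubefix prints a verification
    URL and code, then waits for Enter after you link the account.
    """
    yt = youtube_cls(url, use_oauth=True, allow_oauth_cache=True)
    _ = yt.streams  # forces an authenticated request, prompting on the first run
    return yt.title  # a quick confirmation that the video is reachable
```

Usage, with pytubefix installed: from pytubefix import YouTube, then authorize_once("https://www.youtube.com/watch?v=bXzTXD_OJo0", YouTube). After it succeeds, the worker can use the same use_oauth=True, allow_oauth_cache=True options without prompting.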

Thank you all for your help, I’m ready to tell you my experience in more detail if anyone has similar problems.

potykion commented 3 months ago

@felipeucelli @in4sec-org You were right. The problem was resolved when using oauth credentials. Thanks!

celarain commented 3 months ago

Can someone please show the full code, including the auth part?

potykion commented 3 months ago

@celarain just pass use_oauth=True option like so:

YouTube(
    "https://www.youtube.com/watch?v=K4TOrB7at0Y",
    use_oauth=True,
)

You will be asked to authorize with Google after running the code.

celarain commented 3 months ago

I did that, and a URL appeared to authorize my device. I completed the authorization, and then it gave me this error:

Error downloading video: EOF when reading a line
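"EOF when reading a line" is the message of the EOFError that Python's built-in input() raises when standard input has nothing to read, which is the situation for a systemd service or Celery worker with no attached terminal. The sketch below reproduces it with an empty in-memory stdin and adds a hypothetical guard that checks sys.stdin.isatty() before attempting the interactive OAuth step:

```python
import io
import sys
from typing import Optional, TextIO


def prompt_enter(stdin: TextIO) -> str:
    """Call input() against the given stream and report what happened."""
    saved = sys.stdin
    sys.stdin = stdin  # input() reads from sys.stdin
    try:
        input("Press enter when you have completed this step.")
        return "ok"
    except EOFError as exc:
        return f"EOFError: {exc}"
    finally:
        sys.stdin = saved


def can_prompt(stdin: Optional[TextIO] = None) -> bool:
    """Hypothetical guard: only start the OAuth device flow on a real terminal."""
    stream = sys.stdin if stdin is None else stdin
    return stream is not None and stream.isatty()


no_input = prompt_enter(io.StringIO(""))    # "EOFError: EOF when reading a line"
with_enter = prompt_enter(io.StringIO("\n"))  # "ok"
```

In a service you would check can_prompt() first and, if it returns False, fail with a clear message instead of letting the device-code prompt hit EOF.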

potykion commented 3 months ago

Well, check whether a tokens.json file was created in the pytubefix installation directory and whether it contains actual tokens, e.g. venv/Lib/site-packages/pytubefix/__cache__/tokens.json
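A small sketch for that check, assuming the cache path described here (pytubefix/__cache__/tokens.json inside the installed package) and that the file holds a JSON object with an access_token entry, as pytube's token cache did; verify the key name against your installed version. Both helper names are hypothetical:

```python
import json
from pathlib import Path
from typing import Optional


def default_token_path() -> Optional[Path]:
    """Locate pytubefix/__cache__/tokens.json in the installed package, if importable."""
    try:
        import pytubefix  # imported lazily so the rest of the sketch runs without it
    except ImportError:
        return None
    return Path(pytubefix.__file__).resolve().parent / "__cache__" / "tokens.json"


def has_cached_tokens(path: Path) -> bool:
    """True if the file exists and looks like a token cache (assumed 'access_token' key)."""
    if not path.is_file():
        return False
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return False
    return isinstance(data, dict) and bool(data.get("access_token"))
```

Running path = default_token_path() and then has_cached_tokens(path) inside the same virtualenv as the service tells you whether the OAuth step actually saved anything.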

celarain commented 3 months ago

Sorry, I sent a bad error report; here is the full output:

Please open https://www.google.com/device and input code FPA-HRL-WTR Press enter when you have completed this step.
2024-07-19 15:35:08,294 - app.video_downloader - ERROR - Error downloading video: EOF when reading a line

The problem is that I complete the connection, but if I try again, it asks for authorization again.

I found this folder, but there is no tokens.json in it:

/var/www/html/socialmedia/venv/lib/python3.12/site-packages/pytubefix/__pycache__

And there is no __cache__ folder there.

potykion commented 3 months ago

Maybe your directory is not writable, but it's definitely not a problem with the library.

You can check the library code; it should write the tokens.json file to the pytubefix/__cache__ dir: https://github.com/JuanBindez/pytubefix/blob/7e081b733074a7c030fdc18ea57ae5fb8c04ff17/pytubefix/innertube.py#L380C4-L394C1

JuanBindez commented 3 months ago

Check the permissions of /var/www/ with the ll command; you may have to use chmod to change them.
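Before changing permissions with chmod, it can help to check from Python, running as the same user as the service, whether the cache directory can be created and written. A minimal sketch using the standard os.access check; the function name is hypothetical, and the path you pass would be the pytubefix/__cache__ directory discussed above:

```python
import os
from pathlib import Path


def cache_dir_writable(cache_dir: Path) -> bool:
    """Hypothetical check: can this process create or write cache_dir?

    If the directory does not exist yet, the nearest existing parent
    must be a writable directory so that it can be created.
    """
    target = cache_dir
    while not target.exists():
        if target.parent == target:  # reached the filesystem root
            return False
        target = target.parent
    # W_OK to create entries, X_OK to traverse into the directory.
    return target.is_dir() and os.access(target, os.W_OK | os.X_OK)
```

For example, cache_dir_writable(Path("/var/www/html/socialmedia/venv/lib/python3.12/site-packages/pytubefix/__cache__")) returning False would point to the permission problem suggested above.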

celarain commented 3 months ago

[screenshot]

celarain commented 3 months ago

I am using pytubefix==6.3.4rc1. Which version should I use?

JuanBindez commented 3 months ago

This same one. From what I saw, everyone has write permission.

celarain commented 3 months ago

The console log says to press Enter to continue after logging in, but I am reading this log with sudo journalctl -u socialmedia.service -f -n 30.

And I accept the permissions on an external site; is that the problem? I have no idea how to do all of this on my remote server.

NannoSilver commented 3 months ago

YouTube may be blocking your remote IP, try using use_oauth and let us know the results.

Note: Don't use your main account to authenticate, YouTube may ban it.

If the current IP is banned by YouTube, you can try pytubefix with a proxy to get a different IP.
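pytube's YouTube constructor accepts a proxies mapping in the scheme-to-URL format used by urllib; assuming your pytubefix version keeps that parameter (check its signature before relying on it), here is a minimal sketch that builds the mapping from an environment variable so the proxy can be changed without code edits. The variable name YT_PROXY_URL and the helper are hypothetical:

```python
import os
from typing import Dict, Mapping, Optional


def proxies_from_env(
    env: Optional[Mapping[str, str]] = None,
    var: str = "YT_PROXY_URL",
) -> Optional[Dict[str, str]]:
    """Build a urllib-style proxies dict from one env var (hypothetical helper)."""
    source = os.environ if env is None else env
    url = source.get(var, "").strip()
    if not url:
        return None  # no proxy configured: connect directly
    # Route both schemes through the same proxy, e.g. "http://user:pass@host:3128".
    return {"http": url, "https": url}
```

Under that assumption, usage would be YouTube(url, proxies=proxies_from_env()).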

in4sec-org commented 3 months ago

@celarain, try running the authorization in ipython. If you want to run your code in a container, you need to do it once inside the container. This is a one-time procedure; you just need a script or an interpreter that can wait for Enter. It will ask you to go to the URL (you don't have to do this from the same IP), enter the one-time code from the console there, and select the account to link. After that, your normal code on this IP will work.

celarain commented 3 months ago

Thanks, will try it later!

SuryanshuTiwari commented 2 months ago

Hello All,

I got the same error. The code works fine locally, but when I deployed it on Heroku, it gave this error. I tried using use_oauth as well, but it does not seem to work.

I am using pytubefix==6.8.1. I tried downgrading the version as well as upgrading to the latest one but the error is still not fixed.

Any help is truly appreciated.

iqbalcrat commented 2 months ago

Is this issue fixed?

SubodaDabarera commented 2 months ago

Fixed the login required issue for an AWS EC2 deployed application.

The problem was that it asked for authentication in EC2 after deployment when use_oauth=True was set, as mentioned by @felipeucelli and @potykion. Locally, it worked perfectly.

My local environment:

OS : ubuntu 22.04
Python : 3.10
pytubefix : 6.6.2

The solution as below:

  1. Use the local environment to authenticate with PytubeFix’s required authentication (use the given URL to log into your Google account and press Enter in the terminal). Don't use your main Google account.
  2. Find the cached authentication credentials inside the PytubeFix library. Most likely, it will be located at venv/lib/python3.10/site-packages/pytubefix/__cache__/tokens.json
  3. Open your AWS console, go to EC2 instances, and connect to the EC2 instance via the AWS terminal
  4. Locate your project files and navigate to the venv/lib/python3.10/site-packages/pytubefix directory within your project.
  5. If it doesn’t exist, create a directory named "__cache__" (mkdir __cache__) inside the "pytubefix" directory, and create a file called "tokens.json" (touch tokens.json) inside the "__cache__" directory
  6. Copy the contents of your local "tokens.json" file and paste them into the EC2 "tokens.json" file. Use nano tokens.json to open the file, paste the code, save it, and exit
  7. Restart the server via the AWS console or using supervisorctl
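Steps 5 and 6 can also be scripted instead of pasting into nano. A sketch, assuming you have already copied the local tokens.json to the server (for example with scp) and know the path of the server's pytubefix package directory; the helper name is hypothetical, and it only checks that the file is valid JSON before installing it:

```python
import json
import shutil
from pathlib import Path


def install_tokens(local_tokens: Path, pytubefix_dir: Path) -> Path:
    """Copy a tokens.json into pytubefix_dir/__cache__/ (hypothetical helper)."""
    # Fail early on a truncated or mangled copy; raises json.JSONDecodeError.
    json.loads(local_tokens.read_text(encoding="utf-8"))
    cache_dir = pytubefix_dir / "__cache__"
    cache_dir.mkdir(parents=True, exist_ok=True)  # step 5: create __cache__ if missing
    target = cache_dir / "tokens.json"
    shutil.copyfile(local_tokens, target)  # step 6: write the token file
    return target
```

Run it as the same user as the server process so the copied file is readable, then restart the server as in step 7.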

Now the issue is fixed, and videos are being extracted from YouTube without the “Login required” or “Private video” issues

But the problem is that it needs to update the tokens.json file when the OAuth token expires. Is there an alternative way of doing that?
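To see how close the copied token is to expiring, a sketch that reads the cache, assuming it keeps pytube's layout of an expires field holding a Unix timestamp in seconds. The field name is an assumption not confirmed in this thread, so check your own tokens.json first; the helper name is hypothetical:

```python
import json
import time
from pathlib import Path
from typing import Optional


def seconds_until_expiry(tokens_path: Path, now: Optional[float] = None) -> Optional[float]:
    """Seconds until the cached access token expires (negative if already expired).

    Assumes a JSON object with an 'expires' Unix timestamp; returns None if absent.
    """
    data = json.loads(tokens_path.read_text(encoding="utf-8"))
    expires = data.get("expires") if isinstance(data, dict) else None
    if expires is None:
        return None
    current = time.time() if now is None else now
    return float(expires) - current
```

A periodic job could call this and alert you before the value goes negative, rather than discovering the expiry through failed downloads.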

Link - https://stackoverflow.com/questions/78959061/pytubefix-issue-handling-login-required-on-aws-ec2-deployed-applications

felipeucelli commented 2 months ago

@SubodaDabarera

But the problem is that it needs to update the tokens.json file when the OAuth token expires. Is there an alternative way of doing that?

YouTube is working tirelessly to block third-party apps. Update to the latest version of pytubefix and try:

  1. You can create a custom function that works best in your environment and pass it using oauth_verifier, see #190.

  2. You can also try to pass the PoToken which is valid for several days, see #209.

  3. You can also try using a proxy to change your IP.
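For the first option, a sketch of a non-interactive verifier for a worker without a terminal: it logs the URL and code, then waits for a flag file instead of calling input(). It assumes, based on the default prompt shown earlier in this thread and on #190, that pytubefix calls oauth_verifier with the verification URL and the user code; confirm the expected signature in your installed version. The factory name, logger name, and flag-file approach are hypothetical:

```python
import logging
import time
from pathlib import Path
from typing import Callable, Optional


def make_flag_file_verifier(
    done_flag: Path,
    timeout_s: float = 600.0,
    poll_s: float = 2.0,
    logger: Optional[logging.Logger] = None,
) -> Callable[[str, str], None]:
    """Build a verifier that waits for done_flag to appear instead of reading stdin."""
    log = logger or logging.getLogger("yt_oauth")

    def verifier(verification_url: str, user_code: str) -> None:
        # Surface the URL and code where a service's output is visible (e.g. journalctl).
        log.warning(
            "Open %s, enter code %s, link the account, then create %s",
            verification_url, user_code, done_flag,
        )
        deadline = time.monotonic() + timeout_s
        while not done_flag.exists():
            if time.monotonic() >= deadline:
                raise TimeoutError(f"OAuth not confirmed within {timeout_s} s")
            time.sleep(poll_s)

    return verifier
```

Under those assumptions, you would pass oauth_verifier=make_flag_file_verifier(Path("/tmp/yt-oauth-done")) together with use_oauth=True, and after linking the account run touch /tmp/yt-oauth-done on the server.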