Puyodead1 / udemy-downloader

A Udemy downloader that can download courses, with DRM support.
MIT License
1.31k stars, 307 forks

[BUG] yt-dlp throwing up 403 HTTPError exception #63

Closed vipul-ramachandran closed 2 years ago

vipul-ramachandran commented 2 years ago

Seems like yt-dlp is broken. I am getting the following error message with a status code of 403. I have updated yt-dlp to the latest version too, but the same issue persists. Not sure if it is from the yt-dlp end, or it's a bug in your code, so raising the issue here.

main.py --course-url <my-course-url> --bearer <my-auth-key> --download-captions --download-assets
Login Success
> Fetching course information, this may take a minute...
> Course information retrieved!
> Fetching course content, this may take a minute...
> Course content retrieved!
> Processing course data, this may take a minute.
Processing 1 of 71
Processing 2 of 71
Processing 4 of 71
ERROR: Unable to download webpage: HTTP Error 403: Forbidden (caused by <HTTPError 403: 'Forbidden'>); please report this issue on  https://github.com/yt-dlp/yt-dlp . Make sure you are using the latest version; see  https://github.com/yt-dlp/yt-dlp  on how to update. Be sure to call yt-dlp with the --verbose flag and include its complete output.
Error fetching MPD streams: 'ERROR: Unable to download webpage: HTTP Error 403: Forbidden (caused by <HTTPError 403: 'Forbidden'>); please report this issue on  https://github.com/yt-dlp/yt-dlp . Make sure you are using the latest version; see  https://github.com/yt-dlp/yt-dlp  on how to update. Be sure to call yt-dlp with the --verbose flag and include its complete output.'
Processing 5 of 71
ERROR: Unable to download webpage: HTTP Error 403: Forbidden (caused by <HTTPError 403: 'Forbidden'>); please report this issue on  https://github.com/yt-dlp/yt-dlp . Make sure you are using the latest version; see  https://github.com/yt-dlp/yt-dlp  on how to update. Be sure to call yt-dlp with the --verbose flag and include its complete output.
Error fetching MPD streams: 'ERROR: Unable to download webpage: HTTP Error 403: Forbidden (caused by <HTTPError 403: 'Forbidden'>); please report this issue on  https://github.com/yt-dlp/yt-dlp . Make sure you are using the latest version; see  https://github.com/yt-dlp/yt-dlp  on how to update. Be sure to call yt-dlp with the --verbose flag and include its complete output.'
Puyodead1 commented 2 years ago

You need to modify yt-dlp's utils.py and comment out line 1680: 'User-Agent': random_user_agent(),

mahnotz commented 1 year ago

I have this error too. I have updated yt-dlp to the latest version, but the same issue persists.

My utils.py does not have 'User-Agent': random_user_agent(), at line 1680.

What do I do?

Puyodead1 commented 1 year ago

User-Agent

A simple search of the file clearly shows that it just moved...
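Because the line number shifts between yt-dlp releases, searching for the string is more reliable than looking at line 1680. A minimal sketch of such a search follows; `find_lines` is a hypothetical helper (not part of yt-dlp), and the sample text is illustrative only, since the real file contents differ between releases.

```python
def find_lines(text, needle):
    """Return (line_number, stripped_line) pairs for every line containing needle."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if needle in line
    ]


# Illustrative input only; it is not a copy of the real utils.py.
sample = """std_headers = {
    'User-Agent': random_user_agent(),
    'Accept': 'text/html',
}"""

for number, line in find_lines(sample, "random_user_agent()"):
    print(f"{number}: {line}")
```

To run it against a real installation, read the file into a string first (for example with `pathlib.Path(...).read_text()`); the path depends on how and where yt-dlp was installed.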

silverbret1709 commented 1 year ago

I commented out that line, but I still get: Unable to download webpage: HTTP Error 403

silverbret1709 commented 1 year ago

https://imgur.com/qRdxI8P @Puyodead1 Please help

mahnotz commented 1 year ago

I am getting this error again too.

azec-pdx commented 1 year ago

I can confirm that this error is a new development in how yt-dlp handles Udemy MPD URLs. I have done some testing, and the issue now appears regardless of the yt-dlp version. I tested 2023.03.04 (latest) and 2023.01.06, and in both instances I get this error with the following command:

yt-dlp --cookies cookies-dlp.txt --use-extractors "udemy,udemy:course" https://<REDACTED>.udemy.com/assets/19952124/encrypted-files/out/v1/cb4d6ea3a5014771802df5aa507ba4b4/06c8dc12da2745f1b0b4e7c2c032dfef/842d4b8e2e014fbbb87c640ddc89d036/index.mpd\?token\=<REDACTED>\&provider\=cloudfront\&v\=1

It makes no difference whether I pass --use-extractors or not. Note that the cookies were exported beforehand to cookies-dlp.txt using --cookies-from-browser chrome.
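One thing worth ruling out when testing with an exported cookie file is that the file actually contains cookies for the Udemy domain. A rough sketch using the standard library's `MozillaCookieJar`, which reads Netscape-format cookie files; `cookie_names_for` is a hypothetical helper, and this only checks the file contents, not whether Udemy still accepts the session.

```python
from http.cookiejar import MozillaCookieJar


def cookie_names_for(cookie_file, domain_suffix):
    """List the names of cookies whose domain ends with domain_suffix."""
    jar = MozillaCookieJar()
    # Keep session and expired cookies so the listing reflects the whole file.
    jar.load(cookie_file, ignore_discard=True, ignore_expires=True)
    return sorted(c.name for c in jar if c.domain.endswith(domain_suffix))


# Usage, assuming the export described above:
# print(cookie_names_for("cookies-dlp.txt", "udemy.com"))
```

An empty list would suggest the export did not capture the Udemy session, which is a different problem from the User-Agent handling discussed here.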

In addition, with both versions of yt-dlp mentioned above, commenting out the

'User-Agent': random_user_agent(),

line did not help. I could see that yt-dlp --dump-user-agent returns a different agent on each call, based on a random pick of the Chrome version, but commenting out this call may have no effect on the udemy and udemy:course extractors.

I also tested pinning the User-Agent in yt-dlp/utils.py to match the real one of my browser (in case Udemy is somehow persisting and matching that), like this:

def random_user_agent():

    # _USER_AGENT_TPL = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/%s Safari/537.36'
    # _CHROME_VERSIONS = (
    #     '90.0.4430.212',
    #     '90.0.4430.24',
    #     '90.0.4430.70',
    #     '90.0.4430.72',
    #     '90.0.4430.85',
    #     '90.0.4430.93',
    #     '91.0.4472.101',
    #     '91.0.4472.106',
    #     '91.0.4472.114',
    #     '91.0.4472.124',
    #     '91.0.4472.164',
    #     '91.0.4472.19',
    #     '91.0.4472.77',
    #     '92.0.4515.107',
    #     '92.0.4515.115',
    #     '92.0.4515.131',
    #     '92.0.4515.159',
    #     '92.0.4515.43',
    #     '93.0.4556.0',
    #     '93.0.4577.15',
    #     '93.0.4577.63',
    #     '93.0.4577.82',
    #     '94.0.4606.41',
    #     '94.0.4606.54',
    #     '94.0.4606.61',
    #     '94.0.4606.71',
    #     '94.0.4606.81',
    #     '94.0.4606.85',
    #     '95.0.4638.17',
    #     '95.0.4638.50',
    #     '95.0.4638.54',
    #     '95.0.4638.69',
    #     '95.0.4638.74',
    #     '96.0.4664.18',
    #     '96.0.4664.45',
    #     '96.0.4664.55',
    #     '96.0.4664.93',
    #     '97.0.4692.20',
    # )
    # return _USER_AGENT_TPL % random.choice(_CHROME_VERSIONS)
    return 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36'

but this didn't help either.
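For anyone repeating this experiment, the same User-Agent can be set without editing the installed package, through the `http_headers` option that `yt_dlp.YoutubeDL` accepts. This is only a sketch: `build_ytdl_opts` is a hypothetical helper, the other options are copied from the udemy-downloader snippet in this thread, and as reported above, pinning the User-Agent did not resolve the 403.

```python
def build_ytdl_opts(user_agent):
    """Build a YoutubeDL options dict that sends a fixed User-Agent header."""
    return {
        "quiet": True,
        "no_warnings": True,
        "allow_unplayable_formats": True,
        # Custom headers that yt-dlp sends with its HTTP requests.
        "http_headers": {"User-Agent": user_agent},
    }


opts = build_ytdl_opts(
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
)

# Usage (requires yt-dlp to be installed; mpd_url is a placeholder):
# import yt_dlp
# with yt_dlp.YoutubeDL(opts) as ytdl:
#     info = ytdl.extract_info(mpd_url, download=False)
```

The advantage over patching utils.py is that the change survives yt-dlp upgrades and is easy to revert.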

So after many hours spent on this, I am starting to believe that we will need to wait for an official yt-dlp update to deal with this. BTW, my testing was done on a Udemy For Business account. Everything worked until 2 days ago, and it stopped working in the middle of downloading one of the courses I need to watch offline while traveling.

As far as udemy-downloader is concerned, there is an opportunity for improvement in this part:

def _extract_mpd(self, url):
    """extracts mpd streams"""
    _temp = []
    try:
        ytdl = yt_dlp.YoutubeDL({
            "quiet": True,
            "no_warnings": True,
            "allow_unplayable_formats": True
        })
        print(f"ytdl URL: {url}")
        results = ytdl.extract_info(url,
                                    download=False,
                                    force_generic_extractor=True)
...

From what I can tell, --force-generic-extractor is deprecated (see the yt-dlp docs).
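The yt-dlp documentation lists `--use-extractors generic,default` as the replacement for that flag, and in the Python API the matching option appears to be `allowed_extractors`. A sketch of what the options could look like, assuming a yt-dlp release recent enough to support it; `generic_only_opts` is a hypothetical helper, and this change is a cleanup that is not expected to fix the 403 by itself.

```python
def generic_only_opts():
    """Options restricting yt-dlp to the generic extractor (assumed equivalent)."""
    return {
        "quiet": True,
        "no_warnings": True,
        "allow_unplayable_formats": True,
        # Assumed replacement for force_generic_extractor=True.
        "allowed_extractors": ["generic", "default"],
    }


# Usage (requires yt-dlp to be installed; url is the MPD URL):
# import yt_dlp
# with yt_dlp.YoutubeDL(generic_only_opts()) as ytdl:
#     results = ytdl.extract_info(url, download=False)
```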

azec-pdx commented 1 year ago

A related bug on yt-dlp has been open for a while: https://github.com/yt-dlp/yt-dlp/issues/1164#issuecomment-1550793295

mahnotz commented 1 year ago

@azec-pdx did you find a solution?

mahnotz commented 1 year ago

@Puyodead1 Excuse me, I wanted to ask if you had any issues with Udemy subscription courses? If not, what country's IP address are you using?

Puyodead1 commented 1 year ago

> @Puyodead1 Excuse me, I wanted to ask if you had any issues with Udemy subscription courses? If not, what country's IP address are you using?

I don't use a subscription plan so idk