coursera-dl / edx-dl

A simple tool to download video lectures from edx.org (and other openedx sites)
GNU Lesser General Public License v3.0

HTTP 403 forbidden #631

Open munipr opened 3 years ago

munipr commented 3 years ago

I have been getting the following error for the past 3 days. I have the latest edx-dl and youtube-dl installed in an environment with Python 3.7.

edx_dl version 0.1.13
Password:
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Traceback (most recent call last):
  File "c:\program files (x86)\microsoft visual studio\shared\python37_64\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\program files (x86)\microsoft visual studio\shared\python37_64\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\Scripts\edx-dl.exe\__main__.py", line 9, in <module>
  File "c:\program files (x86)\microsoft visual studio\shared\python37_64\lib\site-packages\edx_dl\edx_dl.py", line 1023, in main
    for selected_course in selected_courses}
  File "c:\program files (x86)\microsoft visual studio\shared\python37_64\lib\site-packages\edx_dl\edx_dl.py", line 1023, in <dictcomp>
    for selected_course in selected_courses}
  File "c:\program files (x86)\microsoft visual studio\shared\python37_64\lib\site-packages\edx_dl\edx_dl.py", line 184, in get_available_sections
    page = get_page_contents(url, headers)
  File "c:\program files (x86)\microsoft visual studio\shared\python37_64\lib\site-packages\edx_dl\utils.py", line 58, in get_page_contents
    result = urlopen(Request(url, None, headers))
  File "c:\program files (x86)\microsoft visual studio\shared\python37_64\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "c:\program files (x86)\microsoft visual studio\shared\python37_64\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "c:\program files (x86)\microsoft visual studio\shared\python37_64\lib\urllib\request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "c:\program files (x86)\microsoft visual studio\shared\python37_64\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "c:\program files (x86)\microsoft visual studio\shared\python37_64\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "c:\program files (x86)\microsoft visual studio\shared\python37_64\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
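To see which status the server returns without the full traceback, the request can be wrapped so that `urllib.error.HTTPError` is caught and reported. This is only a sketch: `fetch` and its `opener` parameter are hypothetical names for illustration and are not part of edx-dl.

```python
import urllib.error
import urllib.request


def fetch(url, headers, opener=urllib.request.urlopen):
    """Return (status, body) on success or (status, reason) on an HTTP error.

    `opener` defaults to urlopen; it is a parameter only so the function
    can be exercised without network access.
    """
    request = urllib.request.Request(url, None, headers)
    try:
        with opener(request) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # err.code is the numeric status (e.g. 403), err.reason the text.
        return err.code, err.reason
```

Calling `fetch` with the same URL and two different header dictionaries is one way to check whether a particular header is what triggers the 403.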

laurion commented 3 years ago

+1

dorianherle commented 3 years ago

Same issue here

THolding commented 3 years ago

I'm not an author of the tool, but you can fix it by changing line 425 of edx_dl.py which specifies the User-Agent attribute of the http request header. Change 'User-Agent': 'edX-downloader/0.01', to 'User-Agent': 'Mozilla/5.0', and it will work.
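The change described above amounts to sending a browser-like User-Agent with each request. A minimal sketch using only the standard library; `build_headers` is a hypothetical helper for illustration, not the actual code at line 425 of edx_dl.py, and no request is sent here.

```python
import urllib.request


def build_headers(user_agent="Mozilla/5.0"):
    """Return a header dictionary with a browser-like User-Agent (hypothetical helper)."""
    return {"User-Agent": user_agent}


# Build (but do not send) a request carrying the replaced User-Agent.
request = urllib.request.Request("https://courses.edx.org/", None, build_headers())

# urllib stores header names capitalized as "User-agent".
print(request.get_header("User-agent"))
```

Whether a given User-Agent string is accepted is up to the server, so the value may need adjusting, as later comments in this thread suggest.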

constantiux commented 3 years ago

Cool! It works @THolding

floviolleau commented 3 years ago

Hi,

It is the same issue as #628.

Should we do a PR with that fix?

Kind regards

Zibetti commented 3 years ago

Hi, the following worked for me:

1) In line 425 of edx_dl.py, change 'User-Agent': 'edX-downloader/0.01', to 'User-Agent': 'Chrome/51.0.2704.103'
2) Then follow the steps at #595

Also, I had opened the course page in the web browser (Chrome). I don't know if these steps have any influence on the process.

chss commented 3 years ago

I have tried all the changes recommended for line 425 in edx_dl.py and in Parser.Py. Still no luck.

DGEs2018 commented 3 years ago

Thank you @THolding, I tried your solution and it partially worked for me as well. But after downloading two modules it broke with the message 'returned non-zero exit status 1.' Any helpful hints or fixes? Not that I'm an expert myself, but @munipr, @laurion, @chss, @floviolleau & @totyped: try putting in the name and version of the browser you have your courses opened with (in my case 'User-Agent': 'Chrome/84.0.4147.105') and it should be fixed?

drdata2018 commented 3 years ago

+1

  File "c:\users\user\appdata\local\programs\python\python38-32\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\user\appdata\local\programs\python\python38-32\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\user\AppData\Local\Programs\Python\Python38-32\Scripts\edx-dl.exe\__main__.py", line 7, in <module>
  File "c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\edx_dl\edx_dl.py", line 1020, in main
    all_selections = {selected_course:
  File "c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\edx_dl\edx_dl.py", line 1021, in <dictcomp>
    get_available_sections(selected_course.url.replace('info', 'course'),
  File "c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\edx_dl\edx_dl.py", line 184, in get_available_sections
    page = get_page_contents(url, headers)
  File "c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\edx_dl\utils.py", line 58, in get_page_contents
    result = urlopen(Request(url, None, headers))
  File "c:\users\user\appdata\local\programs\python\python38-32\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "c:\users\user\appdata\local\programs\python\python38-32\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "c:\users\user\appdata\local\programs\python\python38-32\lib\urllib\request.py", line 640, in http_response
    response = self.parent.error(
  File "c:\users\user\appdata\local\programs\python\python38-32\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "c:\users\user\appdata\local\programs\python\python38-32\lib\urllib\request.py", line 502, in _call_chain
    result = func(*args)
  File "c:\users\user\appdata\local\programs\python\python38-32\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

drdata2018 commented 3 years ago

Thanks it works

prayasbat commented 3 years ago

> I'm not an author of the tool, but you can fix it by changing line 425 of edx_dl.py which specifies the User-Agent attribute of the http request header. Change 'User-Agent': 'edX-downloader/0.01', to 'User-Agent': 'Mozilla/5.0', and it will work.

Hi, I am trying to run edx_dl.py to do as you have mentioned, but when I run edx_dl.py from the command prompt it says this:

Traceback (most recent call last):
  File "edx_dl.py", line 33, in <module>
    from ._version import __version__
ImportError: attempted relative import with no known parent package

DGEs2018 commented 3 years ago

I just looked up the reference and came up with this. The error might have to do with the version of Python you're using. The README.md says:

> We strongly recommend that, if you don't already have a Python interpreter installed, that you install Python >= 3.6, if possible, since it is better in general.

sasidhar22 commented 3 years ago

> I'm not an author of the tool, but you can fix it by changing line 425 of edx_dl.py which specifies the User-Agent attribute of the http request header. Change 'User-Agent': 'edX-downloader/0.01', to 'User-Agent': 'Mozilla/5.0', and it will work.

THANK you so much. It's working fine now😃

drdata2018 commented 3 years ago

Any hope for quizzes or assignments?

nazarialireza commented 3 years ago

Same issue here! Changing 'User-Agent': 'edX-downloader/0.01' is not working. Python 3.8.5

Extracting course information from dashboard.
Traceback (most recent call last):
  File "c:\python38-32\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\python38-32\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Python38-32\Scripts\edx-dl.exe\__main__.py", line 7, in <module>
  File "c:\python38-32\lib\site-packages\edx_dl\edx_dl.py", line 1020, in main
    all_selections = {selected_course:
  File "c:\python38-32\lib\site-packages\edx_dl\edx_dl.py", line 1021, in <dictcomp>
    get_available_sections(selected_course.url.replace('info', 'course'),
  File "c:\python38-32\lib\site-packages\edx_dl\edx_dl.py", line 184, in get_available_sections
    page = get_page_contents(url, headers)
  File "c:\python38-32\lib\site-packages\edx_dl\utils.py", line 58, in get_page_contents
    result = urlopen(Request(url, None, headers))
  File "c:\python38-32\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "c:\python38-32\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "c:\python38-32\lib\urllib\request.py", line 640, in http_response
    response = self.parent.error(
  File "c:\python38-32\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "c:\python38-32\lib\urllib\request.py", line 502, in _call_chain
    result = func(*args)
  File "c:\python38-32\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

dborwankar commented 3 years ago

Where can I find this edx_dl.py file in Linux?

DGEs2018 commented 3 years ago

@YediPublic - I don't use Linux, so I'm not familiar with it, but it should be under /usr/lib/python(installed version)? You might want to see if this link helps. You might also have to update to the latest version using pip: pip install edx-dl

prabhakar9885 commented 3 years ago

@YediPublic - When you got the failure message, you must have seen a few lines like this on your terminal:

Traceback (most recent call last):
  File "/usr/local/bin/edx-dl", line 10, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/site-packages/edx_dl/edx_dl.py", line 1023, in main

In the above case, the path to the file that you want to edit is /usr/local/lib/python3.7/site-packages/edx_dl/edx_dl.py
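The same path can also be looked up from Python instead of being read off a traceback. A minimal sketch using the standard library's `importlib.util.find_spec`; `module_file` is a hypothetical helper name, and the module name `edx_dl.edx_dl` is an assumption based on the paths shown in the tracebacks in this thread.

```python
import importlib.util


def module_file(name):
    """Return the file path an importable module is loaded from, or None."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # Raised when the parent package of a dotted name is not installed.
        return None
    return spec.origin if spec is not None else None


# "json" is a standard-library stand-in; with edx-dl installed, the name
# to try would presumably be "edx_dl.edx_dl".
print(module_file("json"))
```

Run this with the same Python interpreter that edx-dl is installed under, otherwise a different site-packages directory (or nothing) may be reported.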

Luciano-Delaude commented 3 years ago

Is there any solution to this? I tried to download https://courses.edx.org/courses/course-v1:WellesleyX+Italian1x+1T2019/course/ but always get empty folders

DGEs2018 commented 3 years ago

@Luciano-Delaude I just came across this link, where @bi1yeu's solution seems to have fixed the same issue for a couple of others. I had this issue myself but will have to give this a shot later.

Luciano-Delaude commented 3 years ago

> @Luciano-Delaude I just came across this link, where @bi1yeu's solution seems to have fixed the same issue for a couple of others. I had this issue myself but will have to give this a shot later.

I tried to use that solution but it didn't work either; I just get an empty folder with that too. If you can fix it, please let me know.

anantsinha commented 3 years ago

> > @Luciano-Delaude I just came across this link, where @bi1yeu's solution seems to have fixed the same issue for a couple of others. I had this issue myself but will have to give this a shot later.
>
> I tried to use that solution but it didn't work either; I just get an empty folder with that too. If you can fix it, please let me know.

Same here

jmfontana commented 3 years ago

Exactly the same here. I've tried all solutions suggested and still no dice.

AbyssInTheMonad commented 3 years ago

> Is there any solution to this? I tried to download https://courses.edx.org/courses/course-v1:WellesleyX+Italian1x+1T2019/course/ but always get empty folders

Yes, I have the same problem after using this solution.

marianfi commented 3 years ago

Same issue - tried all the solutions suggested here and no luck

JimmyNgUNITEN commented 3 years ago

> Same issue - tried all the solutions suggested here and no luck

Same here. I tried changing to 'User-Agent': 'Chrome/51.0.2704.103' (since I am using Chrome to open edX) and return section_soup.ol #return section_soup.a['href'], but neither worked for me. I would appreciate it if anyone could help me.

skiextreme commented 3 years ago

Hi everyone,

Great information, and thank you to the rock stars who contributed. Quick question, if anyone knows: I'm enrolled in an edX course, but it's the free version (auditing). I've got edx-dl and Python set up, and after running it from the command line, it stated that no downloadable content was found.

Am I correct in thinking that this is set up and running correctly, and that because I'm auditing the course (for free) I won't be able to save any of the content?

berezovskyi commented 3 years ago

No, free auditable courses should be downloadable too (if you can play the videos in your browser). Most likely edX changed their layout, which broke the downloader.

skiextreme commented 3 years ago

@berezovskyi

Here's the output I got:

edx_dl version 0.1.13
Password:
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Downloading Penetration Testing - Exploitation [course-v1:NYUx+CYB.PEN.2+1T2021/co]
Downloading 0 section(s)
Extracting all units information in parallel.
No downloadable video found.

berezovskyi commented 3 years ago

I think I got the same error. I tried applying a few patches suggested on this thread to my local fork and gave up for the time being. Download from the website with videodownloadhelper generally works fine.

waqaskeen commented 3 years ago

I also got the "No downloadable video found." error. In the meantime videodownloadhelper is working (but I have to download each video individually).

josefco2510 commented 3 years ago

Hi, I changed the line, but now I get another error:

Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Traceback (most recent call last):
  File "C:\edx-dl-master\edx-dl.py", line 8, in <module>
    edx_dl.main()
  File "C:\edx-dl-master\edx_dl\edx_dl.py", line 1014, in main
    resp = edx_login(LOGIN_API, headers, args.username, args.password)
  File "C:\edx-dl-master\edx_dl\edx_dl.py", line 225, in edx_login
    response = urlopen(request)
  File "C:\Users\jgutierrez\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 214, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\jgutierrez\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 517, in open
    response = self._open(req, data)
  File "C:\Users\jgutierrez\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 534, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "C:\Users\jgutierrez\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 494, in _call_chain
    result = func(*args)
  File "C:\Users\jgutierrez\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 1389, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
  File "C:\Users\jgutierrez\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 1346, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "C:\Users\jgutierrez\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 1253, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "C:\Users\jgutierrez\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 1294, in _send_request
    self.putheader(hdr, value)
  File "C:\Users\jgutierrez\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 1218, in putheader
    header = header.encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode character '\xed' in position 115: ordinal not in range(128)

can you help me?
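The traceback above ends in http.client's `putheader`, at `header.encode('ascii')`, so one of the headers being sent contains the non-ASCII character '\xed' ("í"). A plausible but unconfirmed cause is a stray character introduced while editing the headers. The sketch below scans a header dictionary for such characters; `find_non_ascii` is a hypothetical helper and the sample headers are made up for illustration.

```python
def find_non_ascii(headers):
    """List (header_name, part, index, char) for every non-ASCII character.

    `part` is "name" or "value", depending on where the character was found.
    """
    findings = []
    for name, value in headers.items():
        for part, text in (("name", name), ("value", str(value))):
            for index, char in enumerate(text):
                if ord(char) > 127:
                    findings.append((name, part, index, char))
    return findings


# Made-up example: the second header name contains "í" (U+00ED).
sample = {"User-Agent": "Mozilla/5.0", "Refer\xeder": "https://courses.edx.org"}
for finding in find_non_ascii(sample):
    print(finding)
```

Running this on the header dictionary just before the login request is sent should point at the offending entry, which can then be retyped using plain ASCII characters.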