coursera-dl / edx-dl

A simple tool to download video lectures from edx.org (and other openedx sites)
GNU Lesser General Public License v3.0

Can list courses with '--list-courses' but cannot download any course: urllib.error.HTTPError: HTTP Error 403: Forbidden #647

Status: Open. phaneshavr opened this issue 3 years ago

phaneshavr commented 3 years ago

🚨 Please review the Troubleshooting section before reporting any issue. Don't forget to also check the current issues to avoid duplicates.

Subject of the issue

I can list my courses successfully, but I cannot download any course: every attempt fails with urllib.error.HTTPError: HTTP Error 403: Forbidden.

Your environment

Steps to reproduce

Tell us how to reproduce this issue. Please provide us the course URL, and the specific subsection or unit if possible.

One example is https://courses.edx.org/courses/course-v1:edX+edx201+1T2020/course/, but the problem is the same with every other course I am enrolled in.

Expected behaviour

Tell us what should happen.

It should automatically download the course.

Actual behaviour

Tell us what happens instead. If the script fails, please copy the entire output of the command or the stacktrace (don't forget to obfuscate your username and password). If you cannot copy the exception, attach a screenshot.

edx_dl version 0.1.13
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Traceback (most recent call last):
  File "c:\users\user\appdata\local\programs\python\python38-32\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\user\appdata\local\programs\python\python38-32\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\User\AppData\Local\Programs\Python\Python38-32\Scripts\edx-dl.exe\__main__.py", line 7, in <module>
  File "c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\edx_dl\edx_dl.py", line 1020, in main
    all_selections = {selected_course:
  File "c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\edx_dl\edx_dl.py", line 1021, in <dictcomp>
    get_available_sections(selected_course.url.replace('info', 'course'),
  File "c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\edx_dl\edx_dl.py", line 184, in get_available_sections
    page = get_page_contents(url, headers)
  File "c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\edx_dl\utils.py", line 58, in get_page_contents
    result = urlopen(Request(url, None, headers))
  File "c:\users\user\appdata\local\programs\python\python38-32\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "c:\users\user\appdata\local\programs\python\python38-32\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "c:\users\user\appdata\local\programs\python\python38-32\lib\urllib\request.py", line 640, in http_response
    response = self.parent.error(
  File "c:\users\user\appdata\local\programs\python\python38-32\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "c:\users\user\appdata\local\programs\python\python38-32\lib\urllib\request.py", line 502, in _call_chain
    result = func(*args)
  File "c:\users\user\appdata\local\programs\python\python38-32\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
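In every traceback in this thread the failing call is `urlopen(Request(url, None, headers))` in `edx_dl/utils.py`. The sketch below is a hypothetical wrapper (`fetch_page` is not part of edx-dl) that builds the same kind of request but re-raises a 403 together with the User-Agent that was actually sent, which can help confirm whether a header edit really took effect in the copy of edx-dl being run. The `opener` parameter and the UTF-8 decoding are assumptions added for illustration and offline testing.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def fetch_page(url, headers, opener=urlopen):
    """Hypothetical wrapper: fetch url, re-raising a 403 with the User-Agent sent."""
    request = Request(url, None, headers)
    try:
        with opener(request) as response:
            # Assumption: the page is UTF-8 encoded.
            return response.read().decode('utf-8')
    except HTTPError as err:
        if err.code == 403:
            # urllib stores header names via str.capitalize(), hence 'User-agent'.
            sent_agent = request.get_header('User-agent')
            raise RuntimeError(
                f'HTTP 403 Forbidden for {url} (User-Agent sent: {sent_agent!r})'
            ) from err
        raise
```

Any status other than 403 is re-raised unchanged, so the wrapper only adds information for the case discussed here.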

phaneshavr commented 3 years ago

I already tried changing 'User-Agent': 'edX-downloader/0.01' to 'User-Agent': 'Mozilla/5.0', and also to 'User-Agent': 'Chrome/85.0.4183.102', as suggested in #636 and #637, but it did not help and I keep getting the same error. With the '--list-courses' argument I can successfully list my enrolled courses, but I cannot download any course; I get the error above.
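Because people report different results with different User-Agent strings, one way to compare them is to send the same request once per candidate value and record the status code. This is a minimal sketch with a hypothetical `probe_user_agents` helper; it sends no login cookies or CSRF token, and `urlopen` follows redirects, so for pages that require a session the status it reports may differ from what edx-dl sees after logging in.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def probe_user_agents(url, user_agents, opener=urlopen, timeout=30):
    """Hypothetical helper: map each User-Agent string to the HTTP status it gets."""
    results = {}
    for agent in user_agents:
        # Only the User-Agent header varies between requests.
        request = Request(url, None, {'User-Agent': agent})
        try:
            with opener(request, timeout=timeout) as response:
                results[agent] = response.getcode()
        except HTTPError as err:
            # urlopen raises HTTPError for 4xx/5xx responses; keep the status code.
            results[agent] = err.code
    return results
```

For example, `probe_user_agents('https://courses.edx.org/', ['edX-downloader/0.01', 'Mozilla/5.0', 'Chrome/85.0.4183.102'])` would return one status code per string.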

let4be commented 3 years ago

Same story: the default installation of edx-dl no longer works.

JWChengRelax commented 3 years ago

I get the same error.

staticfloat commented 3 years ago

I can confirm I'm having the same error.

liam-maps commented 3 years ago

Same here. Listing works, but downloading does not:

edx_dl version 0.1.13
Password:
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Traceback (most recent call last):
  File "/home/ubuntu/.local/bin/edx-dl", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/.local/lib/python3.8/site-packages/edx_dl/edx_dl.py", line 1020, in main
    all_selections = {selected_course:
  File "/home/ubuntu/.local/lib/python3.8/site-packages/edx_dl/edx_dl.py", line 1021, in <dictcomp>
    get_available_sections(selected_course.url.replace('info', 'course'),
  File "/home/ubuntu/.local/lib/python3.8/site-packages/edx_dl/edx_dl.py", line 184, in get_available_sections
    page = get_page_contents(url, headers)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/edx_dl/utils.py", line 58, in get_page_contents
    result = urlopen(Request(url, None, headers))
  File "/usr/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/usr/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

anantsinha commented 3 years ago

Same here. Course Link: https://courses.edx.org/courses/course-v1:MITx+CTL.SC0x+2T2020/course/

Actual behaviour:

Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Traceback (most recent call last):
  File "/Users/Anant/opt/anaconda3/bin/edx-dl", line 8, in <module>
    sys.exit(main())
  File "/Users/Anant/opt/anaconda3/lib/python3.7/site-packages/edx_dl/edx_dl.py", line 1023, in main
    for selected_course in selected_courses}
  File "/Users/Anant/opt/anaconda3/lib/python3.7/site-packages/edx_dl/edx_dl.py", line 1023, in <dictcomp>
    for selected_course in selected_courses}
  File "/Users/Anant/opt/anaconda3/lib/python3.7/site-packages/edx_dl/edx_dl.py", line 184, in get_available_sections
    page = get_page_contents(url, headers)
  File "/Users/Anant/opt/anaconda3/lib/python3.7/site-packages/edx_dl/utils.py", line 58, in get_page_contents
    result = urlopen(Request(url, None, headers))
  File "/Users/Anant/opt/anaconda3/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/Users/Anant/opt/anaconda3/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/Users/Anant/opt/anaconda3/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Users/Anant/opt/anaconda3/lib/python3.7/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/Users/Anant/opt/anaconda3/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/Users/Anant/opt/anaconda3/lib/python3.7/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

VinuRajaKumar commented 3 years ago

Same error

edx_dl version 0.1.13
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Traceback (most recent call last):
  File "c:\users\vinu raja kumar c\appdata\local\programs\python\python36\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\vinu raja kumar c\appdata\local\programs\python\python36\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\Vinu Raja Kumar C\AppData\Local\Programs\Python\Python36\Scripts\edx-dl.exe\__main__.py", line 7, in <module>
  File "c:\users\vinu raja kumar c\appdata\local\programs\python\python36\lib\site-packages\edx_dl\edx_dl.py", line 1023, in main
    for selected_course in selected_courses}
  File "c:\users\vinu raja kumar c\appdata\local\programs\python\python36\lib\site-packages\edx_dl\edx_dl.py", line 1023, in <dictcomp>
    for selected_course in selected_courses}
  File "c:\users\vinu raja kumar c\appdata\local\programs\python\python36\lib\site-packages\edx_dl\edx_dl.py", line 184, in get_available_sections
    page = get_page_contents(url, headers)
  File "c:\users\vinu raja kumar c\appdata\local\programs\python\python36\lib\site-packages\edx_dl\utils.py", line 58, in get_page_contents
    result = urlopen(Request(url, None, headers))
  File "c:\users\vinu raja kumar c\appdata\local\programs\python\python36\lib\urllib\request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "c:\users\vinu raja kumar c\appdata\local\programs\python\python36\lib\urllib\request.py", line 532, in open
    response = meth(req, response)
  File "c:\users\vinu raja kumar c\appdata\local\programs\python\python36\lib\urllib\request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "c:\users\vinu raja kumar c\appdata\local\programs\python\python36\lib\urllib\request.py", line 570, in error
    return self._call_chain(*args)
  File "c:\users\vinu raja kumar c\appdata\local\programs\python\python36\lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "c:\users\vinu raja kumar c\appdata\local\programs\python\python36\lib\urllib\request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

ChechkovEugene commented 3 years ago

Same for me

edx_dl version 0.1.13
Password:
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Traceback (most recent call last):
  File "/usr/local/bin/edx-dl", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/site-packages/edx_dl/edx_dl.py", line 1020, in main
    all_selections = {selected_course:
  File "/usr/local/lib/python3.8/site-packages/edx_dl/edx_dl.py", line 1021, in <dictcomp>
    get_available_sections(selected_course.url.replace('info', 'course'),
  File "/usr/local/lib/python3.8/site-packages/edx_dl/edx_dl.py", line 184, in get_available_sections
    page = get_page_contents(url, headers)
  File "/usr/local/lib/python3.8/site-packages/edx_dl/utils.py", line 58, in get_page_contents
    result = urlopen(Request(url, None, headers))
  File "/usr/local/Cellar/python@3.8/3.8.6/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/Cellar/python@3.8/3.8.6/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/local/Cellar/python@3.8/3.8.6/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/usr/local/Cellar/python@3.8/3.8.6/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/usr/local/Cellar/python@3.8/3.8.6/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/usr/local/Cellar/python@3.8/3.8.6/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

hasnain3142 commented 3 years ago

Same error

Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Traceback (most recent call last):
  File "/home/beinghasnain16/.local/bin/edx-dl", line 8, in <module>
    sys.exit(main())
  File "/home/beinghasnain16/.local/lib/python3.6/site-packages/edx_dl/edx_dl.py", line 1023, in main
    for selected_course in selected_courses}
  File "/home/beinghasnain16/.local/lib/python3.6/site-packages/edx_dl/edx_dl.py", line 1023, in <dictcomp>
    for selected_course in selected_courses}
  File "/home/beinghasnain16/.local/lib/python3.6/site-packages/edx_dl/edx_dl.py", line 184, in get_available_sections
    page = get_page_contents(url, headers)
  File "/home/beinghasnain16/.local/lib/python3.6/site-packages/edx_dl/utils.py", line 58, in get_page_contents
    result = urlopen(Request(url, None, headers))
  File "/usr/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/usr/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
jmfontana commented 3 years ago

Confirmed on 29-10-2020. I'm having the same problem even after trying the solutions suggested in #636 and #637. Can someone help?

johanneswerner commented 3 years ago

EDIT: I have the same problem, see #652

edx_dl version 0.1.13
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Traceback (most recent call last):
  File "/usr/bin/edx-dl", line 33, in <module>
    sys.exit(load_entry_point('edx-dl==0.1.13', 'console_scripts', 'edx-dl')())
  File "/usr/lib/python3.8/site-packages/edx_dl/edx_dl.py", line 1020, in main
    all_selections = {selected_course:
  File "/usr/lib/python3.8/site-packages/edx_dl/edx_dl.py", line 1021, in <dictcomp>
    get_available_sections(selected_course.url.replace('info', 'course'),
  File "/usr/lib/python3.8/site-packages/edx_dl/edx_dl.py", line 184, in get_available_sections
    page = get_page_contents(url, headers)
  File "/usr/lib/python3.8/site-packages/edx_dl/utils.py", line 58, in get_page_contents
    result = urlopen(Request(url, None, headers))
  File "/usr/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/usr/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
jmfontana commented 3 years ago

see #652

Sorry, Johannes, but I don't see how this helps. I've checked #652 and I can't see any information there that would help us solve this problem.

johanneswerner commented 3 years ago

@jmfontana My apologies. I wanted to report the same issue under a different operating system (Arch Linux, installed with the AUR package v. 0.1.13), but I first posted it in the wrong issue (#652). I edited my previous post to make it clearer.

johanneswerner commented 3 years ago

The solution provided in https://github.com/coursera-dl/edx-dl/issues/631#issuecomment-667852988 worked for me as well:

I'm not an author of the tool, but you can fix it by changing line 425 of edx_dl.py, which specifies the User-Agent attribute of the HTTP request header. Change 'User-Agent': 'edX-downloader/0.01' to 'User-Agent': 'Mozilla/5.0' and it will work.
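The fix quoted above amounts to changing one value in the headers dictionary that edx-dl passes to every `Request`. The sketch below is illustrative only: `build_headers` is a hypothetical stand-in, and the real dictionary around line 425 of edx_dl.py contains other entries that are not reproduced here. It also shows that urllib stores header names with `str.capitalize()`, so the value is read back with `get_header('User-agent')`.

```python
from urllib.request import Request


def build_headers(user_agent='Mozilla/5.0'):
    """Hypothetical stand-in for edx-dl's headers dict (other entries omitted)."""
    return {
        # Previously 'edX-downloader/0.01', the value reported to trigger the 403.
        'User-Agent': user_agent,
    }


request = Request('https://courses.edx.org/dashboard', None, build_headers())
# Request.add_header() stores keys via str.capitalize(): 'User-Agent' -> 'User-agent'.
print(request.get_header('User-agent'))  # Mozilla/5.0
```

Constructing the `Request` does not touch the network, so this only checks what would be sent, not how the server responds.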

ChechkovEugene commented 3 years ago

Yes, this fixed the 403 error. But now I only get empty folders in the downloaded course. Maybe that is a separate issue, unrelated to the 403.

johanneswerner commented 3 years ago

Same situation here: it worked for one course (no idea why), but when I try other courses, I get the same problem.

MagTun commented 3 years ago

@ChechkovEugene and @johanneswerner, did you try this?

gledguri commented 3 years ago

Same problem here, even though I tried #636 and #637. Is there any solution yet?

ChechkovEugene commented 3 years ago

@ChechkovEugene and @johanneswerner, did you try this?

It's working, but at some point the error from https://github.com/l1ving/youtube-dl/issues/20 appears. I'm waiting for all the merges to be finished.

Learnpython-code commented 3 years ago

Hello everyone, I am new to Python. Please help me check my results: I didn't get any videos, only empty folders.

Result

C:\edx-dl-master>python edx-dl.py -u (username) https://courses.edx.org/courses/coursev1:URosarioX+URX01+1T2020/course/
edx_dl version 0.1.13
Password:
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Downloading Diseño de sistemas de información gerencial para intranet con Microsoft Access [course-v1:URosarioX+URX01+1T2020/co]
Downloading 5 section(s)
Section 1: Generalidades
  Acerca del curso
Section 2: Microsoft Access y Bases de Datos Relacionales
  Conceptos básicos
  Planear y crear una BDR
  Evaluación
Section 3: Diseño de la interface - Consultas
  Visualizar información
  Modificar la BDR con consultas de acción
  Interacción con otros programas
  Evaluación
Section 4: Diseño de la interface - Formularios y macros
  Ingresar datos a la BDR
  Panel de control personalizado
  Evaluación
Section 5: Diseño de la interface - Informes
  Informes
  Evaluación
  Cierre
Extracting all units information in parallel.
Processing 'https://courses.edx.org/courses/course-v1:URosarioX+URX01+1T2020/jump_to/block-v1:URosarioX+URX01+1T2020+type@sequential+block@ddbbb4394e4f4eeab571695c19842fc2'
Processing 'https://courses.edx.org/courses/course-v1:URosarioX+URX01+1T2020/jump_to/block-v1:URosarioX+URX01+1T2020+type@sequential+block@edcc3663b92546ee9f37d4868d05ba30'
Processing 'https://courses.edx.org/courses/course-v1:URosarioX+URX01+1T2020/jump_to/block-v1:URosarioX+URX01+1T2020+type@sequential+block@7a917180012346c8b7f1de5837729bbd'
Processing 'https://courses.edx.org/courses/course-v1:URosarioX+URX01+1T2020/jump_to/block-v1:URosarioX+URX01+1T2020+type@sequential+block@fdb672aa18b0485aa695419f493a5fd0'
Processing 'https://courses.edx.org/courses/course-v1:URosarioX+URX01+1T2020/jump_to/block-v1:URosarioX+URX01+1T2020+type@sequential+block@5b34eb36e50a4db6a9c4c53e719546cf'
Processing 'https://courses.edx.org/courses/course-v1:URosarioX+URX01+1T2020/jump_to/block-v1:URosarioX+URX01+1T2020+type@sequential+block@c78e301110b54cff8a8500c784e16d09'
Processing 'https://courses.edx.org/courses/course-v1:URosarioX+URX01+1T2020/jump_to/block-v1:URosarioX+URX01+1T2020+type@sequential+block@fcd257068abb4f588805db3a15e0ba06'
Processing 'https://courses.edx.org/courses/course-v1:URosarioX+URX01+1T2020/jump_to/block-v1:URosarioX+URX01+1T2020+type@sequential+block@9205182f4d2b46ec93fd6ff22d752fa6'
Processing 'https://courses.edx.org/courses/course-v1:URosarioX+URX01+1T2020/jump_to/block-v1:URosarioX+URX01+1T2020+type@sequential+block@f9a2c97a613a40169a01667bb6aca2be'
Processing 'https://courses.edx.org/courses/course-v1:URosarioX+URX01+1T2020/jump_to/block-v1:URosarioX+URX01+1T2020+type@sequential+block@30549607116847379bc57b4419084652'
Processing 'https://courses.edx.org/courses/course-v1:URosarioX+URX01+1T2020/jump_to/block-v1:URosarioX+URX01+1T2020+type@sequential+block@09f8ee9e32954914957494d87da8a4bc'
Processing 'https://courses.edx.org/courses/course-v1:URosarioX+URX01+1T2020/jump_to/block-v1:URosarioX+URX01+1T2020+type@sequential+block@674fda5e810440f190d849740e674cae'
Processing 'https://courses.edx.org/courses/course-v1:URosarioX+URX01+1T2020/jump_to/block-v1:URosarioX+URX01+1T2020+type@sequential+block@fe847e5e361b47a3a3efd82f480b2a4e'
Processing 'https://courses.edx.org/courses/course-v1:URosarioX+URX01+1T2020/jump_to/block-v1:URosarioX+URX01+1T2020+type@sequential+block@29c2dfb8e8294eed941ee3b576db59c8'
Removed 0 duplicated urls from 0 in total
Output directory: Downloaded
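The log above ends with "Removed 0 duplicated urls from 0 in total", which appears to mean edx-dl collected no download URLs at all; that would be consistent with the empty folders described here and earlier in the thread. A quick way to confirm whether anything was actually saved is to walk the output directory ("Downloaded" in this log) and list the regular files. This is a minimal standard-library sketch; `list_downloaded_files` is a hypothetical helper, not part of edx-dl.

```python
import os


def list_downloaded_files(output_dir='Downloaded'):
    """Hypothetical helper: return sorted paths of all regular files under output_dir."""
    found = []
    # os.walk visits every subdirectory; empty section folders contribute no files.
    # A missing output_dir simply yields nothing, so the result is [].
    for root, _dirs, files in os.walk(output_dir):
        for name in files:
            found.append(os.path.join(root, name))
    return sorted(found)
```

If `list_downloaded_files()` returns an empty list, only the folder skeleton was created and no videos or other files were written.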

jialinyi94 commented 3 years ago

Same issue here.

Any progress?

ndcroos commented 3 years ago

I also have the same problem here, using the default install from pip.