coursera-dl / edx-dl

A simple tool to download video lectures from edx.org (and other openedx sites)
GNU Lesser General Public License v3.0

Cannot retrieve course contents #652

Open Sek-Cheung opened 3 years ago

Sek-Cheung commented 3 years ago

Operating System (name/version): Windows 10 Pro 64-bit
Python version: 3.8
youtube-dl version: 2020.09.20
edx-dl version: 0.1.13

When I try to retrieve course videos using:

edx-dl -u name@xxx.xx https://courses.edx.org/courses/course-v1:MITx+7.05x+3T2020/course/

the following error appears:

edx_dl version 0.1.13
Password:
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Traceback (most recent call last):
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.1776.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.1776.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\Sek\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\Scripts\edx-dl.exe\__main__.py", line 7, in <module>
  File "C:\Users\Sek\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\edx_dl\edx_dl.py", line 1020, in main
    all_selections = {selected_course:
  File "C:\Users\Sek\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\edx_dl\edx_dl.py", line 1021, in <dictcomp>
    get_available_sections(selected_course.url.replace('info', 'course'),
  File "C:\Users\Sek\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\edx_dl\edx_dl.py", line 184, in get_available_sections
    page = get_page_contents(url, headers)
  File "C:\Users\Sek\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\edx_dl\utils.py", line 58, in get_page_contents
    result = urlopen(Request(url, None, headers))
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.1776.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.1776.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.1776.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 640, in http_response
    response = self.parent.error(
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.1776.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.1776.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 502, in _call_chain
    result = func(*args)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.1776.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

alebaho commented 3 years ago

I am getting the same error on a different course:

raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
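
For context, the 403 in these reports surfaces as an uncaught urllib.error.HTTPError raised from urlopen. The following is only a minimal sketch, not edx-dl's code, of how such an error could be caught and turned into a short message instead of a full traceback. The helper names describe_http_error and fetch_page are hypothetical, and the hint about request headers is an assumption rather than a confirmed diagnosis.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def describe_http_error(url, err):
    """Return a one-line description of an HTTPError (hypothetical helper)."""
    if err.code == 403:
        # Assumption: the server is refusing the request as sent,
        # for example because of its headers.
        return "HTTP 403 Forbidden for %s: the server refused the request" % url
    return "HTTP %d %s for %s" % (err.code, err.reason, url)


def fetch_page(url, headers):
    """Fetch a page, reporting HTTP errors briefly before re-raising them."""
    try:
        with urlopen(Request(url, None, headers)) as response:
            return response.read()
    except HTTPError as err:
        print(describe_http_error(url, err))
        raise
```

Re-raising after printing keeps the original behaviour (the program still stops) while making the failing URL and status code easy to spot.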

johanneswerner commented 3 years ago

I have the same issue with a different course:

edx_dl version 0.1.13
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Traceback (most recent call last):
  File "/usr/bin/edx-dl", line 33, in <module>
    sys.exit(load_entry_point('edx-dl==0.1.13', 'console_scripts', 'edx-dl')())
  File "/usr/lib/python3.8/site-packages/edx_dl/edx_dl.py", line 1020, in main
    all_selections = {selected_course:
  File "/usr/lib/python3.8/site-packages/edx_dl/edx_dl.py", line 1021, in <dictcomp>
    get_available_sections(selected_course.url.replace('info', 'course'),
  File "/usr/lib/python3.8/site-packages/edx_dl/edx_dl.py", line 184, in get_available_sections
    page = get_page_contents(url, headers)
  File "/usr/lib/python3.8/site-packages/edx_dl/utils.py", line 58, in get_page_contents
    result = urlopen(Request(url, None, headers))
  File "/usr/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/usr/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

johanneswerner commented 3 years ago

The solution provided in https://github.com/coursera-dl/edx-dl/issues/631#issuecomment-667852988 worked for me as well:

I'm not an author of the tool, but you can fix it by changing line 425 of edx_dl.py which specifies the User-Agent attribute of the http request header. Change 'User-Agent': 'edX-downloader/0.01', to 'User-Agent': 'Mozilla/5.0', and it will work.
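
As a rough illustration of the change described above, here is a minimal sketch, not the tool's actual code, of building a request with a browser-like User-Agent using the same urllib Request call that appears in the traceback. The helper name build_request and its parameters are hypothetical, and there is no guarantee that changing this header alone is enough for every course.

```python
from urllib.request import Request

# The value suggested in the quoted comment; the tool's original value
# was reported to be 'edX-downloader/0.01'.
BROWSER_USER_AGENT = 'Mozilla/5.0'


def build_request(url, user_agent=BROWSER_USER_AGENT, extra_headers=None):
    """Build a urllib Request that sends the given User-Agent (hypothetical helper)."""
    headers = {'User-Agent': user_agent}
    if extra_headers:
        # Merge any other headers the caller needs, e.g. a CSRF token.
        headers.update(extra_headers)
    return Request(url, None, headers)


request = build_request('https://courses.edx.org/')
print(request.get_header('User-agent'))
```

Note that urllib stores header names in capitalized form, which is why the lookup uses 'User-agent'.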

anantsinha commented 3 years ago

Did you change something else too? A few other people and I are getting empty folders when we do this.

johanneswerner commented 3 years ago

@anantsinha

My apologies, I thought this was a solution, but it only worked for one course (no idea why); the others just produce empty folders.

anantsinha commented 3 years ago

Ah, okay. This is weird. I think this stopped working after edX changed the UI. Thanks though!