coursera-dl / edx-dl

A simple tool to download video lectures from edx.org (and other Open edX sites)
GNU Lesser General Public License v3.0

HTTP Error 500: Internal Server Error #630

Open: sirdeniel opened this issue 4 years ago

sirdeniel commented 4 years ago

Subject of the issue

Whenever I use --list-courses or try to download a course, I get HTTP Error 500: Internal Server Error.

Your environment

Windows, Python 3.7, edx-dl 0.1.13 (taken from the debug log and the traceback paths below).

Steps to reproduce

  1. Sign up for or log in to an edX account and enroll in the free course Critical Thinking & Problem-Solving.
  2. Install edx-dl as described in this project's README.md.
  3. Run the following command:
    edx-dl -u <email> -p <password> "https://courses.edx.org/courses/course-v1:RITx+LEAD103+2T2020/course/" -s -o <outputpath> -i --debug

Expected behaviour

The course downloads with subtitles, and if an individual video fails to download, that error is ignored and the download continues.

Actual behaviour

root[main] edx_dl version 0.1.13
root[parse_file_formats] file_formats: ['e?ps', 'pdf', 'txt', 'doc', 'xls', 'ppt', 'docx', 'xlsx', 'pptx', 'odt', 'ods', 'odp', 'odg', 'zip', 'rar', 'gz', 'mp3', 'R', 'Rmd', 'ipynb', 'py']
root[edx_get_headers] Building initial headers for future requests.
root[_get_initial_token] Getting initial CSRF token.
root[_get_initial_token] Found CSRF token.
root[edx_get_headers] Headers built: {'User-Agent': 'edX-downloader/0.01', 'Accept': 'application/json, text/javascript, */*; q=0.01', 'Content-Type': 'application/x-www-form-urlencoded;charset=utf-8', 'Referer': 'https://courses.edx.org/user_api/v1/account/login_session', 'X-Requested-With': 'XMLHttpRequest', 'X-CSRFToken': '<some token here>'}
root[edx_login] Logging into Open edX site: https://courses.edx.org/login_ajax
root[get_courses_info] Extracting course information from dashboard.
Traceback (most recent call last):
  File "c:\users\suit\appdata\local\programs\python\python37\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\suit\appdata\local\programs\python\python37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\SUIT\AppData\Local\Programs\Python\Python37\Scripts\edx-dl.exe\__main__.py", line 7, in <module>
  File "c:\users\suit\appdata\local\programs\python\python37\lib\site-packages\edx_dl\edx_dl.py", line 1015, in main
    courses = get_courses_info(DASHBOARD, headers)
  File "c:\users\suit\appdata\local\programs\python\python37\lib\site-packages\edx_dl\edx_dl.py", line 144, in get_courses_info
    page = get_page_contents(url, headers)
  File "c:\users\suit\appdata\local\programs\python\python37\lib\site-packages\edx_dl\utils.py", line 58, in get_page_contents
    result = urlopen(Request(url, None, headers))
  File "c:\users\suit\appdata\local\programs\python\python37\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "c:\users\suit\appdata\local\programs\python\python37\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "c:\users\suit\appdata\local\programs\python\python37\lib\urllib\request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "c:\users\suit\appdata\local\programs\python\python37\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "c:\users\suit\appdata\local\programs\python\python37\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "c:\users\suit\appdata\local\programs\python\python37\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 500: Internal Server Error
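The traceback shows the failure happens inside get_page_contents(), which calls urlopen(Request(url, None, headers)) while get_courses_info() fetches the dashboard. urlopen raises HTTPError, so the body of the server's 500 response is never shown. A minimal sketch for surfacing that body might look like the following. The helper names summarize_http_error and fetch_page are hypothetical and not part of edx-dl. The sketch does not log in, so it only reproduces the issue when it runs with the same authenticated session that edx-dl uses.

```python
import urllib.error
import urllib.request


def summarize_http_error(err, max_body=500):
    # Hypothetical helper (not part of edx-dl): turn an HTTPError into a short
    # report that includes the start of the response body, which often says
    # more than the bare "HTTP Error 500" line.
    try:
        body = err.read()
    except Exception:
        body = b""  # some HTTPError objects carry no readable body
    snippet = body[:max_body].decode("utf-8", errors="replace")
    return "HTTP {} {}\n{}".format(err.code, err.reason, snippet)


def fetch_page(url, headers, timeout=30):
    # Hypothetical helper: same call shape as get_page_contents() in the
    # traceback, urlopen(Request(url, None, headers)), but a 4xx/5xx reply is
    # printed together with its body instead of being raised.
    request = urllib.request.Request(url, None, headers)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        print(summarize_http_error(err))
        return None
```

If the 500 page names a specific problem (for example a rejected header or a missing session), that narrows the search considerably compared with the status line alone.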

Some thoughts

I think it's related to the headers parameter passed to request.py. I found an answer about HTTP 500 errors raised from request.py, and the cause there appears to be whether the request sends correct, minimal headers.
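One way to test this hypothesis is to retry the dashboard request while dropping one header at a time. The sketch below rests on an assumption that the issue does not confirm: the suspects are the AJAX-style headers built for login (Accept: application/json, X-Requested-With: XMLHttpRequest, and the form Content-Type) being reused for the HTML dashboard page. header_variants, find_working_headers, and urllib_fetch are hypothetical helpers, not edx-dl functions. The fetch still needs edx-dl's logged-in session for the results to mean anything.

```python
import urllib.error
import urllib.request

# Headers as printed by edx_get_headers in the --debug log above
# (the CSRF token is the issue's own redaction placeholder).
LOGGED_HEADERS = {
    "User-Agent": "edX-downloader/0.01",
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "Content-Type": "application/x-www-form-urlencoded;charset=utf-8",
    "Referer": "https://courses.edx.org/user_api/v1/account/login_session",
    "X-Requested-With": "XMLHttpRequest",
    "X-CSRFToken": "<some token here>",
}


def header_variants(headers):
    # Hypothetical helper: yield (label, headers) pairs, namely the full set,
    # then the set with each single header removed, then a minimal set that
    # keeps only User-Agent.
    yield "all", dict(headers)
    for name in headers:
        yield "without " + name, {k: v for k, v in headers.items() if k != name}
    yield "minimal", {k: v for k, v in headers.items() if k == "User-Agent"}


def urllib_fetch(url, headers, timeout=30):
    # Hypothetical fetch with the same call shape as get_page_contents().
    request = urllib.request.Request(url, None, headers)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read()


def find_working_headers(url, headers, fetch):
    # Return the label of the first variant for which fetch(url, variant)
    # does not raise HTTPError, or None if every variant fails.
    for label, variant in header_variants(headers):
        try:
            fetch(url, variant)
            return label
        except urllib.error.HTTPError:
            continue
    return None
```

For example, find_working_headers(dashboard_url, LOGGED_HEADERS, urllib_fetch) would report which variant, if any, avoids the 500. Here dashboard_url stands for whatever URL edx-dl's DASHBOARD constant holds, which the traceback does not show. Passing the fetch function in as a parameter keeps the search logic testable without network access.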