coursera-dl / edx-dl

A simple tool to download video lectures from edx.org (and other openedx sites)
GNU Lesser General Public License v3.0

404 although I can open the URL in my browser #585

Open ulfklose opened 4 years ago

ulfklose commented 4 years ago

Subject of the issue

When I use one of the URLs I get back when running with --ignore-errors, edx-dl throws a 404 error.

Your environment

Steps to reproduce

Just pass https://courses.edx.org/courses/course-v1:IsraelX+infosec102+3T2019/course/ as the URL argument.

Expected behaviour

The course should be downloaded.

Actual behaviour

edx-dl -u my-email@gmail.com -p "my-secret-password" https://courses.edx.org/courses/course-v1:IsraelX+infosec102+3T2019/course/
edx_dl version 0.1.11
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Traceback (most recent call last):
  File "/usr/local/bin/edx-dl", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/site-packages/edx_dl/edx_dl.py", line 1018, in main
    for selected_course in selected_courses}
  File "/usr/local/lib/python3.7/site-packages/edx_dl/edx_dl.py", line 1018, in <dictcomp>
    for selected_course in selected_courses}
  File "/usr/local/lib/python3.7/site-packages/edx_dl/edx_dl.py", line 184, in get_available_sections
    page = get_page_contents(url, headers)
  File "/usr/local/lib/python3.7/site-packages/edx_dl/utils.py", line 58, in get_page_contents
    result = urlopen(Request(url, None, headers))
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
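
The traceback shows the 404 surfacing from urlopen inside get_page_contents, without saying which URL was actually requested. As a rough sketch (not edx-dl's own code), the same call could be wrapped so that the failing URL and status are printed. The names fetch_or_report and describe_http_error are hypothetical helpers made up for this illustration; only urlopen, Request and HTTPError come from the standard library.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def describe_http_error(url, err):
    """Format a one-line summary of an HTTPError for the given URL."""
    return f"{url} -> HTTP {err.code} ({err.reason})"


def fetch_or_report(url, headers=None):
    """Hypothetical helper: fetch a page, or print which URL failed and return None.

    Mirrors the urlopen(Request(url, None, headers)) call seen in the traceback.
    """
    try:
        with urlopen(Request(url, None, headers or {})) as response:
            return response.read()
    except HTTPError as err:
        print(describe_http_error(url, err))
        return None
```

Calling fetch_or_report with the same headers edx-dl builds after login would show whether the 404 comes from the URL passed on the command line or from a different URL derived from it.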

RaviV1 commented 4 years ago

With --debug, the output (as displayed in Notepad++) looks like this:

root[get_courses_info] Data extracted: [Unlocking Information Security: Part [][]: https://courses.edx.org/courses/course-v1:IsraelX+infosec102+3T2019/course/, Unlocking Information Security: Part [][]: https://courses.edx.org/courses/course-v1:IsraelX+infosec102+1T2020/course/, Unlocking Information Security: Part []: https://courses.edx.org/courses/course-v1:IsraelX+infosec101+3T2019a/course/ ]

whereas the string should be:

root[get_courses_info] Data extracted: [Unlocking Information Security: Part ⅠⅠ: https://courses.edx.org/courses/course-v1:IsraelX+infosec102+3T2019/course/, Unlocking Information Security: Part ⅠⅠ: https://courses.edx.org/courses/course-v1:IsraelX+infosec102+1T2020/course/, Unlocking Information Security: Part : https://courses.edx.org/courses/course-v1:IsraelX+infosec101+3T2019a/course/ ]

Hope this helps
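
To see exactly which characters an editor is failing to render, the non-ASCII characters in a course title can be listed with their code points. This is only a small sketch: describe_non_ascii is a hypothetical helper, and the sample title assumes the numeral in the course name is U+2160, which I have not verified against the actual dashboard data.

```python
import unicodedata


def describe_non_ascii(text):
    """Return (character, code point, Unicode name) for each non-ASCII character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ord(ch) > 127
    ]


# Assumed sample title; the real characters may differ.
title = "Unlocking Information Security: Part \u2160\u2160"
for ch, code_point, name in describe_non_ascii(title):
    print(code_point, name)
```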

MATRIX30 commented 4 years ago

I'm having the same issue with this course. If someone can help with a fix, I'd be grateful.

floviolleau commented 4 years ago

Hi,

Maybe it's because of a URL redirection?

If you go to the course's table of contents, what is the URL: /course/XXXXXX or /course?

Kind regards
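
One way to check for a redirect outside the browser is to compare the requested URL with the URL that finally served the page. A minimal sketch, assuming the request carries the same session headers/cookies as a logged-in session (without them the site will most likely redirect to a login page instead). final_url and was_redirected are hypothetical names for this example; urlopen follows redirects by default and geturl() reports where it ended up.

```python
from urllib.parse import urlsplit
from urllib.request import Request, urlopen


def was_redirected(requested_url, served_url):
    """True if host or path differ, ignoring a trailing slash on the path."""
    a, b = urlsplit(requested_url), urlsplit(served_url)
    return (a.netloc, a.path.rstrip("/")) != (b.netloc, b.path.rstrip("/"))


def final_url(url, headers=None):
    """Follow redirects and return the URL that actually served the response."""
    with urlopen(Request(url, None, headers or {})) as response:
        return response.geturl()


# Example usage (needs network access and valid session headers):
# served = final_url(course_url, headers)
# print(served, was_redirected(course_url, served))
```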

ulfklose commented 4 years ago

The course I was referring to has been closed in the meantime, so I can't try it again.