Closed MATRIX30 closed 4 years ago
Having the same issue :(
Confirmed with different url: https://courses.edx.org/courses/course-v1:MITx+14.750x+3T2019/course/
Output of --debug:
root[main] edx_dl version 0.1.10
root[parse_file_formats] file_formats: ['e?ps', 'pdf', 'txt', 'doc', 'xls', 'ppt', 'docx', 'xlsx', 'pptx', 'odt', 'ods', 'odp', 'odg', 'zip', 'rar', 'gz', 'mp3', 'R', 'Rmd', 'ipynb', 'py']
root[edx_get_headers] Building initial headers for future requests.
root[_get_initial_token] Getting initial CSRF token.
root[_get_initial_token] Found CSRF token.
root[edx_get_headers] Headers built: {'User-Agent': 'edX-downloader/0.01', 'Accept': 'application/json, text/javascript, */*; q=0.01', 'Content-Type': 'application/x-www-form-urlencoded;charset=utf-8', 'Referer': 'https://courses.edx.org/login_ajax', 'X-Requested-With': 'XMLHttpRequest', 'X-CSRFToken': 'PUsSLjqYvxBtMFO07I7RfYRpxPPZdHE0zWBVoJk4aqqo8AOSciOeEoSTr49FvNeH'}
root[edx_login] Logging into Open edX site: https://courses.edx.org/login_ajax
root[get_courses_info] Extracting course information from dashboard.
root[get_courses_info] Data extracted: ["lotsofcourseswhichidontwanttoshare"]
root[get_available_sections] Extracting sections for :https://courses.edx.org/courses/course-v1:MITx+14.750x+3T2019/course/
root[get_available_sections] Extracted sections: []
root[_display_selections] Downloading Political Economy and Economic Development [course-v1:MITx+14.750x+3T2019/co]
root[_display_sections] Downloading 0 section(s)
root[extract_all_units_in_sequence] Extracting all units information in sequentially.
root[extract_all_units_in_sequence] urls: []
root[parse_units] No downloadable video found.
Same issue with multiple courses.
edx_dl version 0.1.10
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Downloading Data Science: Machine Learning [course-v1:HarvardX+PH125.8x+2T2019/co]
Downloading 0 section(s)
Extracting all units information in parallel.
No downloadable video found.
Same issue. Course: https://courses.edx.org/courses/course-v1:MITx+18.6501x+3T2019/course/
It looks like edx-dl is missing most of the sections of the course. In my example, it sees only 1 section, while the edX site displays more than 5 (at the moment):
> edx-dl.py -u <username> --list-sections https://courses.edx.org/courses/course-v1:MITx+18.6501x+3T2019/course/
edx_dl version 0.1.10
Password:
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Fundamentals of Statistics [course-v1:MITx+18.6501x+3T2019/co] has 1 sections so far
1 - Download Entrance Survey videos
Here's mine...
Building initial headers for future requests. Getting initial CSRF token. Found CSRF token. Logging into Open edX site: https://courses.edx.org/login_ajax Extracting course information from dashboard. Downloading Calculus Applied! [course-v1:HarvardX+CalcAPL1x+2T2019/co] Downloading 3 section(s) Section 1: Optional Sections (CHOOSE 1 of 3) Optional Sections Section 2: Section 12: Course Wrap Up End of Course Survey Course Feedback Forum (Optional) Section 3: Acknowledgements Course Team and Special Thanks Section 1: What Makes a Good Test Question? Mathematical Models to Measure Knowledge and Improve Learning Section 2: Economic Applications of Calculus: Elasticity and A Tale of Two Cities Section 3: From X-rays to CT scans: Mathematics and Medical Imaging Section 4: What is Middle Income? Thinking about Income Distributions with Statistics and Calculus Section 5: Population Dynamics Part I: the Evolution of Population Models and Section 6: Population Dynamics II: A Biological Puzzle OR How Fishing Affects a Predator-Prey System Section 7: Extinction, Chaos and other Bifurcation Behavior, Section 8: Bifurcation Part II: Outbreak! Budworm Populations and Bifurcations, Section 9: Bifurcation Part III: Species in Competition: Coexistence or Exclusion Section 10: E = mc²: Taylor Approximation and the Energy Equation Final Assessments Extracting all units information in parallel. 
Processing 'https://courses.edx.org/courses/course-v1:HarvardX+CalcAPL1x+2T2019/jump_to/block-v1:HarvardX+CalcAPL1x+2T2019+type@vertical+block@944fb6867b354e2cafb41415aae41415'
Processing 'https://courses.edx.org/courses/course-v1:HarvardX+CalcAPL1x+2T2019/jump_to/block-v1:HarvardX+CalcAPL1x+2T2019+type@vertical+block@2101c542ac614691acc54224d3c314a8'
Processing 'https://courses.edx.org/courses/course-v1:HarvardX+CalcAPL1x+2T2019/jump_to/block-v1:HarvardX+CalcAPL1x+2T2019+type@vertical+block@5864500159ef40f9839d66d2492fea58'
Processing 'https://courses.edx.org/courses/course-v1:HarvardX+CalcAPL1x+2T2019/jump_to/block-v1:HarvardX+CalcAPL1x+2T2019+type@vertical+block@13aed97186fd4c7588a5ea1399e096df'
Processing 'https://courses.edx.org/courses/course-v1:HarvardX+CalcAPL1x+2T2019/jump_to/block-v1:HarvardX+CalcAPL1x+2T2019+type@vertical+block@a53371a01e9c4fd28dcb1a1609614da7'
Processing 'https://courses.edx.org/courses/course-v1:HarvardX+CalcAPL1x+2T2019/jump_to/block-v1:HarvardX+CalcAPL1x+2T2019+type@vertical+block@ebf2c858d37e418583f839965631108f'
Processing 'https://courses.edx.org/courses/course-v1:HarvardX+CalcAPL1x+2T2019/jump_to/block-v1:HarvardX+CalcAPL1x+2T2019+type@vertical+block@d4e29c075ff14ad583a3750767faf698'
Processing 'https://courses.edx.org/courses/course-v1:HarvardX+CalcAPL1x+2T2019/jump_to/block-v1:HarvardX+CalcAPL1x+2T2019+type@vertical+block@6cc97f049d444c4f8470b88ad3fdbc52'
Processing 'https://courses.edx.org/courses/course-v1:HarvardX+CalcAPL1x+2T2019/jump_to/block-v1:HarvardX+CalcAPL1x+2T2019+type@vertical+block@0f7edf523c55490e8380b6e9a809df33'
Processing 'https://courses.edx.org/courses/course-v1:HarvardX+CalcAPL1x+2T2019/jump_to/block-v1:HarvardX+CalcAPL1x+2T2019+type@vertical+block@fb7c4d1c1a2649b29e472b2ef86a36ce'
Processing 'https://courses.edx.org/courses/course-v1:HarvardX+CalcAPL1x+2T2019/jump_to/block-v1:HarvardX+CalcAPL1x+2T2019+type@vertical+block@edb436fadf2c4b74b175b9b5b6334b48'
Processing 'https://courses.edx.org/courses/course-v1:HarvardX+CalcAPL1x+2T2019/jump_to/block-v1:HarvardX+CalcAPL1x+2T2019+type@vertical+block@1ccb65aca6b34beda14dedfa6bffafbc'
Removed 0 duplicated urls from 0 in total
Output directory: Downloaded
Same issue with multiple courses.
Same issue here, edx-dl only sees the first section.
Here's the log:
root[main] edx_dl version 0.1.10 root[parse_file_formats] file_formats: ['e?ps', 'pdf', 'txt', 'doc', 'xls', 'ppt', 'docx', 'xlsx', 'pptx', 'odt', 'ods', 'odp', 'odg', 'zip', 'rar', 'gz', 'mp3', 'R', 'Rmd', 'ipynb', 'py'] Password: root[edx_get_headers] Building initial headers for future requests. root[_get_initial_token] Getting initial CSRF token. root[_get_initial_token] Found CSRF token. root[edx_get_headers] Headers built: {'User-Agent': 'edX-downloader/0.01', 'Accept': 'application/json, text/javascript, */*; q=0.01', 'Content-Type': 'application/x-www-form-urlencoded;charset=utf-8', 'Referer': 'https://courses.edx.org/login_ajax', 'X-Requested-With': 'XMLHttpRequest', 'X-CSRFToken': 'wWr0eKCgnA1uusK8rQvzPJHFK8bXmxn4i1pxyGtnuxsy0MRE8LXYh87mk8DN1eST'} root[edx_login] Logging into Open edX site: https://courses.edx.org/login_ajax root[get_courses_info] Extracting course information from dashboard. root[get_courses_info] Data extracted: [Fundamentals of Statistics: https://courses.edx.org/courses/course-v1:MITx+18.6501x+3T2019/course/, TOEFL Test Preparation: The Insider’s Guide: https://courses.edx.org/courses/course-v1:ETSx+TOEFLx+3T2017/course/, Minds and Machines: https://courses.edx.org/courses/course-v1:MITx+24.09x+3T2015/course/, Practical Learning Analytics: https://courses.edx.org/courses/course-v1:MichiganX+PLAx+2T2016/course/, Embedded Systems - Shape the World: https://courses.edx.org/courses/course-v1:UTAustinX+UT.6.03x+1T2016/course/, The Science of Everyday Thinking: https://courses.edx.org/courses/course-v1:UQx+Think101x+2T2015/course/, Electronic Interfaces: https://courses.edx.org/courses/course-v1:BerkeleyX+EE40LX+2T2015/course/, Autonomous Navigation for Flying Robots: https://courses.edx.org/courses/TUMx/AUTONAVx/2T2014/course/, Next Generation Infrastructures - Part 2: https://courses.edx.org/courses/DelftX/NGI102x/3T2014/course/, Solar Energy: https://courses.edx.org/courses/DelftX/ET.3034TU/3T2014/course/, Circuits and Electronics: 
https://courses.edx.org/courses/MITx/6.002_4x/3T2014/course/] root[get_available_sections] Extracting sections for :https://courses.edx.org/courses/course-v1:MITx+18.6501x+3T2019/course/ root[get_available_sections] Extracted sections: [<edx_dl.common.Section object at 0x1042f6110>] root[_display_selections] Downloading Fundamentals of Statistics [course-v1:MITx+18.6501x+3T2019/co] root[_display_sections] Downloading 1 section(s) root[_display_sections] Section 1: Entrance Survey root[_display_sections] 1. Entrance Survey root[extract_all_units_in_parallel] Extracting all units information in parallel. root[extract_all_units_in_parallel] urls: ['https://courses.edx.org/courses/course-v1:MITx+18.6501x+3T2019/jump_to/block-v1:MITx+18.6501x+3T2019+type@vertical+block@entrancesurvey-tab1'] root[extract_units] Processing 'https://courses.edx.org/courses/course-v1:MITx+18.6501x+3T2019/jump_to/block-v1:MITx+18.6501x+3T2019+type@vertical+block@entrancesurvey-tab1' root[main] Removed 0 duplicated urls from 0 in total root[download] Output directory: Downloaded
Looks like edx-dl is missing most of the sections of the course. My case https://courses.edx.org/courses/course-v1:GTx+ISYE6669+2T2018/course/.
Building initial headers for future requests. Getting initial CSRF token. Found CSRF token. Logging into Open edX site: https://courses.edx.org/login_ajax Extracting course information from dashboard. Downloading FA18: Deterministic Optimization [course-v1:GTx+ISYE6669+2T2018/co] Downloading 5 section(s) Section 1: Getting Started Welcome Message Syllabus Getting Help Getting to Know Each Other Section 2: Discussions and Q&A Discussions and Q&A Forums Section 3: Proctoring Information - Verified Learners Section 4: Midterm Exam - Verified Learners Section 5: Final Exam - Verified Learners Extracting all units information in parallel. Processing 'https://courses.edx.org/courses/course-v1:GTx+ISYE6669+2T2018/jump_to/block-v1:GTx+ISYE6669+2T2018+type@vertical+block@b4e0e428596e4a438b61d9c44a66ff45' Processing 'https://courses.edx.org/courses/course-v1:GTx+ISYE6669+2T2018/jump_to/block-v1:GTx+ISYE6669+2T2018+type@vertical+block@6e0eef9f7a9b4eed99ea9c1ad8e37b16' Processing 'https://courses.edx.org/courses/course-v1:GTx+ISYE6669+2T2018/jump_to/block-v1:GTx+ISYE6669+2T2018+type@vertical+block@d827bed0374e46b5a0abe62978b7cca8' Processing 'https://courses.edx.org/courses/course-v1:GTx+ISYE6669+2T2018/jump_to/block-v1:GTx+ISYE6669+2T2018+type@vertical+block@3247cb48d14b4f1e97bb9dd74d1ec8a2' Processing 'https://courses.edx.org/courses/course-v1:GTx+ISYE6669+2T2018/jump_to/block-v1:GTx+ISYE6669+2T2018+type@vertical+block@c49832c367cc47be96ba15a3ce5e9d8c' Removed 0 duplicated urls from 0 in total Output directory: Downloaded
I have the same issue:
edx_dl version 0.1.10
Password:
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Downloading Introduction to Discrete Choice Models [course-v1:EPFLx+DiscreteChoiceX+3T2017/co]
Downloading 0 section(s)
Extracting all units information in parallel.
No downloadable video found.
So, I've dug into the code a bit and I think I found the issue: for some courses, edX has again updated the structure of their website. The issue is with line 397 in /edx_dl/parsing.py:
sections_soup = soup.find_all('li', class_='outline-item section')
In the new format, the sections have a different class, namely "outline-item section scored".
Should be easily fixed. I will try to hack something together, but this had better be checked by someone experienced.
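To illustrate the diagnosis above, here is a minimal stdlib-only sketch. It does not use BeautifulSoup, and `exact_match` and `token_match` are hypothetical helper names. It assumes, as the BeautifulSoup documentation describes for a multi-word `class_` string, that the filter compares against the exact attribute value, so an extra `scored` token makes the comparison fail.

```python
def exact_match(class_attr: str, wanted: str) -> bool:
    """Compare the whole class attribute as a single string."""
    return class_attr == wanted


def token_match(class_attr: str, wanted: str) -> bool:
    """Check that every wanted class token is present, ignoring extras and order."""
    return set(wanted.split()) <= set(class_attr.split())


old_markup = "outline-item section"          # class string the parser expects
new_markup = "outline-item section scored"   # class string reported in this thread
wanted = "outline-item section"

# The exact comparison only accepts the old markup.
print(exact_match(old_markup, wanted), exact_match(new_markup, wanted))
# The token comparison accepts both.
print(token_match(old_markup, wanted), token_match(new_markup, wanted))
```

Under this assumption, sections that carry the extra token would be skipped silently, which would be consistent with the "Downloading 0 section(s)" logs posted earlier.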
Alright, quick fix:
Replace as follows in /edx_dl/parsing.py:
Line 385:
subsections_soup = section_soup.find_all('li', class_='vertical outline-item focusable')
with:
subsections_soup = section_soup.find_all('li', class_=['vertical outline-item focusable', 'vertical outline-item focusable scored'])
And line 397:
sections_soup = soup.find_all('li', class_='outline-item section')
with:
sections_soup = soup.find_all('li', class_=['outline-item section', 'outline-item section scored'])
This should work for both the 'old' and new format. Will try to run some tests and create a merge request sometime this week.
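The quick fix above enumerates the known class strings. As a rough alternative sketch (not the patch from this thread, and not edx-dl code), one could select `li` elements by required class tokens so that additional suffixes are tolerated too. The example uses only the standard library `html.parser`; `SectionCollector`, `find_sections`, and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class SectionCollector(HTMLParser):
    """Collect the class attribute of every <li> that carries all required tokens."""

    def __init__(self, required):
        super().__init__()
        self.required = set(required.split())
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != "li":
            return
        # A valueless attribute is reported as None, so fall back to "".
        class_attr = dict(attrs).get("class") or ""
        if self.required <= set(class_attr.split()):
            self.matches.append(class_attr)


def find_sections(html, required="outline-item section"):
    """Return the class strings of all matching <li> elements, in document order."""
    parser = SectionCollector(required)
    parser.feed(html)
    return parser.matches


# Hypothetical markup mixing the old and new class strings mentioned above.
SAMPLE = """
<ol>
  <li class="outline-item section">Week 1</li>
  <li class="outline-item section scored">Week 2</li>
  <li class="vertical outline-item focusable">Unit</li>
</ol>
"""

print(find_sections(SAMPLE))
```

Whether token-based matching would be too permissive for the real course outline pages is an open question; it would need to be checked against actual edX markup.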
Thanks a lot. It's working now.
thank you it works now
This partially works, but it still misses some weeks and modules. I tried it on this course:
https://courses.edx.org/courses/course-v1:CurtinX+MKT1x+1T2019/course/
and the entire module 3 didn't download.
@malawadd can you please share error messages/debug info? Do the sections just not download or does it exit with a message?
@mor3dr3ad
It downloads an empty folder but skips all the content, then proceeds to download the following module and all its content. There are no error messages or anything.
Just ran the course you mentioned and it seems to be working for me. Will do some more testing this week. In the meanwhile maybe download missing vids manually
@mor3dr3ad do you mind telling me more about the testing you plan to run? I would like to try and fix this but am not sure where to start or what exactly I should look for.
@malawadd well for starters you could help by providing some more debugging info by using the --debug flag when running edx with the course you mentioned and providing information.
For me, my fix is working, even with your course. So without being able to reproduce your error I can only assume there is a different issue (maybe using a different version of edx-dl?)
If something fixes a program, why don't you submit your changes as a pull request to fix things (or get things slightly improved) for other users?
Planning on doing exactly that sometime this week. Just a bit busy right now
Thanks, please do and I can do a round of code review and merge everything. That will be awesome!
Hello,
This solution works for many courses, but now old courses are not supported: https://courses.edx.org/courses/course-v1:KTHx+DTS02.1x+1T2018/course/
For class https://courses.edx.org/courses/course-v1:MITx+2.830.2x+3T2019/course/ it worked partially. Not all videos and attachments were downloaded.
By the way, thank you to everyone who is working on this. This tool is so helpful as a time saver to allow working on classes offline.
This should be integrated into a new release. Edx has changed their website structure and this new change breaks all download operations with edx-dl.
Thanks everyone! I'm facing the same issue and unfortunately the solution provided does not work with this course: https://courses.edx.org/courses/course-v1:EdinburghX+CCSx+3T2019/course/ any hint?
Hi.
I've put together a few pull requests that fix various issues with the currently released (0.1.10) edx-dl.
@rbrito could you or other core contributors please review these PRs and release a new version with these fixes included? The currently released version has been unusable for some time now, and it would be great to release fixes for these breaking issues as soon as possible.
Meanwhile, if someone needs a working version or is willing to test these fixes, you can access a cumulative fix with all of the above included here: https://github.com/EugeneLoy/edx-dl/tree/cummulative
@malawadd I've checked the course you are having problem with and it looks like some of the videos are no longer available:
[download] https://www.youtube.com/watch?v=N9SFeRNAfEA => Downloaded\Digital_Branding_and_Engagement\02-Module_1-_The_Digital_Consumer\02-%(title)s-%(id)s.%(ext)s
Downloading video with URL https://www.youtube.com/watch?v=N9SFeRNAfEA from YouTube.
[youtube] N9SFeRNAfEA: Downloading webpage
[youtube] N9SFeRNAfEA: Downloading video info webpage
WARNING: Unable to extract video title
WARNING: unable to extract description; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
ERROR: This video is no longer available because the YouTube account associated with this video has been terminated.
Sorry about that.
It is likely that your specific problem was caused by the deletion of the video from YouTube itself, not by a bug in edx-dl.
Hi @EugeneLoy , thank you for your help! May I ask if you were able to download this course? https://courses.edx.org/courses/course-v1:EdinburghX+CCSx+3T2019/course/ I'm having trouble with it but not with others
@antoniosereno yes, I've been able to download that course.
Ok I've downloaded the edx-dl-cummulative, made everything you suggested and now it gives me an HTTP Error 400: Bad Request
Yesterday I was able to access the courses list, now I'm not able anymore..
Is there anything I'm missing?
@antoniosereno are you sure you are running code from the cummulative branch of the repo and not the one installed globally in your system?
The error you are getting looks like the one that should be fixed by #569.
One way to run code from the repo is to cd into the repo root and point python to the .py file directly, like this:
python edx-dl.py -u <user> <course_url>
If this doesn't help, please post the full debug output so I can figure out what went wrong.
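One way to check which copy Python would pick up is to ask the import system where it resolves the package. This is only a sketch: `module_origin` is a hypothetical helper name, and it assumes the package is importable as `edx_dl`, as the file paths in this thread suggest.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level import of `name` would load, or None if it is not importable."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Run this from the repo root: a path under site-packages suggests the
    # globally installed copy would be used rather than the checkout.
    print(module_origin("edx_dl"))
```

The result depends on the current directory and on `sys.path`, so it should be run the same way the downloader itself is started.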
Hi @EugeneLoy,
Doesn't work on my end as well.
From your fork root dir:
In:
python edx-dl.py -u <name>@gmail.com https://courses.edx.org/courses/course-v1:DavidsonX+D001x+3T2018/course/
Out:
rses.edx.org/courses/course-v1:DavidsonX+D001x+3T2018/course/ --debug
root[main] edx_dl version 0.1.10
root[parse_file_formats] file_formats: ['e?ps', 'pdf', 'txt', 'doc', 'xls', 'ppt', 'docx', 'xlsx', 'pptx', 'odt', 'ods', 'odp', 'odg', 'zip', 'rar', 'gz', 'mp3', 'R', 'Rmd', 'ipynb', 'py']
Password:
root[edx_get_headers] Building initial headers for future requests.
root[_get_initial_token] Getting initial CSRF token.
Traceback (most recent call last):
  File "edx-dl.py", line 6, in <module>
    edx_dl.main()
  File "/root/workspace/edx-dl/edx_dl/edx_dl.py", line 1000, in main
    headers = edx_get_headers()
  File "/root/workspace/edx-dl/edx_dl/edx_dl.py", line 425, in edx_get_headers
    'X-CSRFToken': _get_initial_token(EDX_HOMEPAGE),
  File "/root/workspace/edx-dl/edx_dl/edx_dl.py", line 167, in _get_initial_token
    opener.open(url)
  File "/opt/conda/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/opt/conda/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/opt/conda/lib/python3.7/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/opt/conda/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/opt/conda/lib/python3.7/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request
@naefl @antoniosereno I think I know what the problem is. However, I'll need a bit more cooperation from you to make sure, since I cannot reproduce this in my environment.
I've added a commit with a test fix and some debug output to the cummulative branch. Grab it and please let me know if this works for you now.
If this doesn't fix the issue, please post the full debug output as before, as well as the output of the following:
curl -v https://courses.edx.org/user_api/v1/account/login_session/
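For anyone without curl, a similar check can be sketched in Python with `urllib`, the same module that raises the `HTTPError` in the traceback above. `probe` and `describe_http_error` are hypothetical helpers, not part of edx-dl, and the sketch only reports the status line rather than the full verbose exchange that `curl -v` prints.

```python
import urllib.error
import urllib.request


def describe_http_error(url, err):
    """Summarise an HTTPError as one line suitable for pasting into a bug report."""
    return f"{url} -> HTTP {err.code} {err.reason}"


def probe(url, timeout=10):
    """Open `url` and return a one-line summary instead of raising on HTTP errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return f"{url} -> HTTP {response.status}"
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses arrive here, e.g. the 400 seen in this thread.
        return describe_http_error(url, err)
```

Calling `probe("https://courses.edx.org/user_api/v1/account/login_session/")` would then report the status for the endpoint in the curl command. Network failures other than HTTP errors are deliberately left uncaught so that they remain visible.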
Thank you Eugene.. This is my output when I try to list courses:
(base) C:\edx-dl-cummulative\edx-dl-cummulative>edx-dl -u antoniosereno29@gmail.com --list-courses
edx_dl version 0.1.10
Password:
Building initial headers for future requests.
Getting initial CSRF token.
Traceback (most recent call last):
  File "c:\users\anton\anaconda3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\anton\anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\anton\Anaconda3\Scripts\edx-dl.exe\__main__.py", line 9, in <module>
  File "c:\users\anton\anaconda3\lib\site-packages\edx_dl\edx_dl.py", line 1000, in main
    headers = edx_get_headers()
  File "c:\users\anton\anaconda3\lib\site-packages\edx_dl\edx_dl.py", line 425, in edx_get_headers
    'X-CSRFToken': _get_initial_token(EDX_HOMEPAGE),
  File "c:\users\anton\anaconda3\lib\site-packages\edx_dl\edx_dl.py", line 167, in _get_initial_token
    opener.open(url)
  File "c:\users\anton\anaconda3\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "c:\users\anton\anaconda3\lib\urllib\request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "c:\users\anton\anaconda3\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "c:\users\anton\anaconda3\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "c:\users\anton\anaconda3\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request
And this is the output of the command you asked us to run:
(base) C:\edx-dl-cummulative\edx-dl-cummulative>curl -v https://courses.edx.org/user_api/v1/account/login_session/
GET /user_api/v1/account/login_session/ HTTP/1.1
Host: courses.edx.org
User-Agent: curl/7.65.3
Accept: */*
@antoniosereno Thanks, but from your debug output I can say for sure that the edx-dl installed in your environment is being used, as indicated by this part of the stack trace:
File "c:\users\anton\anaconda3\lib\site-packages\edx_dl\edx_dl.py", line 167, in _get_initial_token opener.open(url)
Please point your python directly to the edx-dl.py from the repo to avoid using the version that is installed in your system.
Looking at your post, the command should look something like this:
C:\edx-dl-cummulative\edx-dl-cummulative>python edx-dl.py -u antoniosereno29@gmail.com --list-courses
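The same check can be done mechanically on a pasted traceback. The sketch below is a hypothetical helper, not part of edx-dl: it assumes the standard CPython frame format (`File "...", line N`) and treats any `edx_dl` frame under `site-packages` as the installed copy. The two sample lines are frames taken from the tracebacks posted in this thread.

```python
import re

# Matches the file path in a standard CPython traceback frame line.
FRAME = re.compile(r'File "([^"]+)", line \d+')


def uses_installed_copy(traceback_text, package="edx_dl"):
    """Return True if any frame for `package` lives under site-packages."""
    for path in FRAME.findall(traceback_text):
        # Normalise Windows separators and case so one check covers both platforms.
        normalized = path.replace("\\", "/").lower()
        if "/site-packages/" in normalized and f"/{package}/" in normalized:
            return True
    return False


installed = r'File "c:\users\anton\anaconda3\lib\site-packages\edx_dl\edx_dl.py", line 167, in _get_initial_token'
checkout = r'File "/root/workspace/edx-dl/edx_dl/edx_dl.py", line 167, in _get_initial_token'

print(uses_installed_copy(installed))
print(uses_installed_copy(checkout))
```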
@EugeneLoy , works great with https://courses.edx.org/courses/course-v1:MITx+2.830.2x+3T2019/course/ , thank you so much for the time and effort! I hope it gets integrated into the master build soon.
It worked! I was able to download all the videos in the course! Thank you! May I ask if there's a command to download not only media (video and pdf) but also the written content?
As far as I know, if a file is "attached" to a course page, it will be treated as a resource by edx-dl and will be downloaded. At least this has been my experience so far.
Sometimes, however, you have extra content that is present on the page inline (like errata, tables, extra recitations and text explanations, etc). As far as I understand, this is what you are interested in.
Now, it just so happens that lately I've been working on a tool that saves this kind of content :)
It is also helpful if you want to save exercises and homework (with explanations), or any other type of content that is displayed on the course pages.
This tool is meant to complement edx-dl. It is called edx-archive and can be found here: https://github.com/EugeneLoy/edx-archive
I only released it recently, so if you guys check it out that would be great!
wow, I'll take a look at it! I was initially thinking of doing it manually, but it would be a long work! Thank you Eugene!
@EugeneLoy that worked, thanks for troubleshooting!
@EugeneLoy from your tool's page
-c, --concurrency
number of pages to save in parallel (default: 4)
I don't know what's the current state of their implementation on the backend now, but my impression was that hammering edx servers is generally not a good idea. FWIW, couple of years ago they blocked me by IP for several months after me flooding their servers with requests (debugging this edx-dl, by the way). It's not that the ban could not be surmounted, but the message was clear. So if you ask me, it's more of a courtesy to not put extra pressure on them by default. If you're still not convinced, please take your time to read this thread: https://github.com/coursera-dl/edx-dl/issues/377
@balta2ar Thanks, I will take my time to read through #377. However, the motivation behind adding concurrency to the tool is not to speed things up at the expense of edX servers but to shave off some of the time wasted on page rendering.
The tool takes a snapshot of the page once it has fully rendered (including math processing), and since edX pages can be pretty bloated (I saw pages taking more than a minute to render), this leads to a lot of time being wasted waiting for the render (with no network activity).
The actual workload in terms of average request rate is not high and should not cause any issues with default settings. In fact, I have used a much higher concurrency factor, and I can say that memory is much more likely to be the bottleneck than request rate.
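The idea of a fixed concurrency cap can be sketched in Python as follows. This is an illustration of the general technique only, not how edx-archive (a Node tool) is implemented; `save_pages`, `PeakCounter`, and `fake_render` are made-up names, and the sleep stands in for a slow page render with no network activity.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def save_pages(pages, render, concurrency=4):
    """Run `render(page)` for each page with at most `concurrency` calls in flight."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map yields results in the same order as the input pages.
        return list(pool.map(render, pages))


class PeakCounter:
    """Context manager that records the highest number of simultaneous entries."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        return self

    def __exit__(self, *exc):
        with self._lock:
            self.active -= 1
        return False


counter = PeakCounter()


def fake_render(page):
    # Stand-in for a page render that mostly waits.
    with counter:
        time.sleep(0.01)
    return f"saved:{page}"


results = save_pages(range(10), fake_render, concurrency=4)
print(len(results), counter.peak)
```

With a cap like this, the number of simultaneous page loads never exceeds the configured value, so the request rate stays bounded even when individual renders are slow.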
Sorry for the late answer. Can you please describe the entire procedure to run the edx-archive-master? I'm not able to install it; the Anaconda prompt says that npm is not recognised as an internal or external command.
@antoniosereno Hi.
npm is the "node package manager". It is distributed along with node.
If I am not mistaken, you can get node through conda by installing the nodejs package. Otherwise, you can get it from here.
Once you have npm on your system, install edx-archive:
npm install edx-archive -g
I'll update the readme to clarify this npm part shortly.
it works perfectly @EugeneLoy ! Thanks a lot, you saved me a big amount of time!
Still getting empty folders; it is not working with https://courses.edx.org/courses/course-v1:UCSanDiegoX+DSE230x+3T2019a/course/
I have empty folders. I tried the code above but it doesn't work. https://courses.edx.org/courses/course-v1:CurtinX+IOT4x+3T2019/course/
Is there a way to download a particular video and not the whole course?
🚨Please review the Troubleshooting section before reporting any issue. Don't forget also to check the current issues to avoid duplicates.
Subject of the issue
edx-dl fails to extract and download videos for "https://courses.edx.org/courses/course-v1:EdinburghX+PA1.1x+3T2019/course/" on www.edx.org. It seems the videos for this course are sourced from "https://media.ed.ac.uk/" and not YouTube. I need help resolving this issue.
Your environment
Steps to reproduce
--- create an account on edX
--- enroll for the course "https://courses.edx.org/courses/course-v1:EdinburghX+PA1.1x+3T2019/course/"
--- type the following into CMD
edx-dl -u username -p password -o path --ignore-errors --cache https://courses.edx.org/courses/course-v1:EdinburghX+PA1.1x+3T2019/course/
Expected behaviour
download to start normally
Actual behaviour
edx_dl version 0.1.10
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Downloading Introduction to Predictive Analytics [course-v1:EdinburghX+PA1.1x+3T2019/co]
Downloading 0 section(s)
loading 2329 urls from cache [edx-dl.cache]
Extracting all units information in parallel.
No downloadable video found.
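When triaging a report like the one above, it may help to see which hosts a course's video links point to. The following is a small sketch using only `urllib.parse`; `count_hosts` is a hypothetical helper, and the `media.ed.ac.uk` paths in the sample list are invented placeholders (only the host name comes from the report).

```python
from collections import Counter
from urllib.parse import urlparse


def count_hosts(urls):
    """Count how many of the given URLs point at each hostname."""
    return Counter(urlparse(url).hostname for url in urls)


# The YouTube URL appears earlier in this thread; the other two paths are placeholders.
sample_urls = [
    "https://www.youtube.com/watch?v=N9SFeRNAfEA",
    "https://media.ed.ac.uk/media/placeholder-1",
    "https://media.ed.ac.uk/media/placeholder-2",
]

for host, count in count_hosts(sample_urls).most_common():
    print(host, count)
```

A course whose links are dominated by a non-YouTube host would point to a missing extractor rather than to the section-parsing problem discussed earlier.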