coursera-dl / edx-dl

A simple tool to download video lectures from edx.org (and other openedx sites)
GNU Lesser General Public License v3.0
1.93k stars 639 forks

again folders are empty and nothing is downloaded #655

Closed Ergasta99 closed 3 years ago

Ergasta99 commented 3 years ago

Subject of the issue

Folders are empty and nothing is downloaded.

Your environment

Requirement already satisfied: beautifulsoup4>=4.6.0 in /usr/lib/python3.6/site-packages (from -r requirements.txt (line 1)) (4.9.1)
Requirement already satisfied: html5lib>=1.0.1 in /usr/lib/python3.6/site-packages (from -r requirements.txt (line 2)) (1.0.1)
Requirement already satisfied: six>=1.11.0 in /usr/lib/python3.6/site-packages (from -r requirements.txt (line 3)) (1.11.0)
Requirement already satisfied: youtube_dl>=2018.06.18 in /usr/lib/python3.6/site-packages (from -r requirements.txt (line 4)) (2020.11.26)
Requirement already satisfied: requests>=2.18.4 in /usr/lib/python3.6/site-packages (from -r requirements.txt (line 5)) (2.23.0)
Requirement already satisfied: soupsieve>1.2 in /usr/lib/python3.6/site-packages (from beautifulsoup4>=4.6.0->-r requirements.txt (line 1)) (2.0.1)
Requirement already satisfied: webencodings in /usr/lib/python3.6/site-packages (from html5lib>=1.0.1->-r requirements.txt (line 2)) (0.5.1)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/lib/python3.6/site-packages (from requests>=2.18.4->-r requirements.txt (line 5)) (1.25.9)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/lib/python3.6/site-packages (from requests>=2.18.4->-r requirements.txt (line 5)) (3.0.4)
Requirement already satisfied: idna<3,>=2.5 in /usr/lib/python3.6/site-packages (from requests>=2.18.4->-r requirements.txt (line 5)) (2.9)
Requirement already satisfied: certifi>=2017.4.17 in /usr/lib/python3.6/site-packages (from requests>=2.18.4->-r requirements.txt (line 5)) (2020.4.5.1)

Steps to reproduce

edx-dl --ignore-errors -u user@domain.com -p password -s --with-subtitles --cache --debug https://courses.edx.org/courses/course-v1:RollsRoyce+Rolls-Royce4X+1T2020/course/

[get_available_sections] Extracting sections for :https://courses.edx.org/courses/course-v1:RollsRoyce+Rolls-Royce4X+1T2020/course/
root[get_available_sections] Extracted sections: [<edx_dl.common.Section object at 0x7f278dc06128>, <edx_dl.common.Section object at 0x7f278dc1c550>, <edx_dl.common.Section object at 0x7f278dc1c668>, <edx_dl.common.Section object at 0x7f278dc1c6d8>, <edx_dl.common.Section object at 0x7f278dc1c748>, <edx_dl.common.Section object at 0x7f278dc1c7b8>]
root[_display_selections] Downloading Data Privacy Awareness [course-v1:RollsRoyce+Rolls-Royce4X+1T2020/co]
root[_display_sections] Downloading 6 section(s)
root[_display_sections] Section 1: Our approach to data
root[_display_sections] Our approach to data
root[_display_sections] Section 2: GDPR and you
root[_display_sections] GDPR and you
root[_display_sections] Section 3: The value of data
root[_display_sections] The value of data
root[_display_sections] Section 4: Getting it right
root[_display_sections] Getting it right
root[_display_sections] Section 5: Data hygiene
root[_display_sections] Data hygiene
root[_display_sections] Section 6: Keeping data safe
root[_display_sections] Keeping data safe
root[_display_sections] Acknowledgement
root[extract_all_units_with_cache] loading 7 urls from cache [edx-dl.cache]
root[extract_all_units_in_parallel] Extracting all units information in parallel.
root[extract_all_units_in_parallel] urls: []
root[write_units_to_cache] writing 7 urls to cache [edx-dl.cache]
root[main] Removed 0 duplicated urls from 0 in total
root[download] Output directory: Downloaded

Expected behaviour

The download folder is created with the expected structure, but all the folders are empty and nothing is downloaded. The changes to the parsing.py and edx_dl.py files had already been made and had worked, but now nothing new is downloaded. I noticed that the videos now have the extension .m3u8 and are served from the URL https://edx-video.net/.....

Actual behaviour

Removed 0 duplicated urls from 0 in total
Output directory: Downloaded

Ergasta99 commented 3 years ago

edx-dl-master.zip

These are my edx-dl.py and parsing.py.

wisecrick commented 3 years ago

It doesn't work on https://courses.edx.org/courses/course-v1:HarvardX+CS50+X/course/

bilel commented 3 years ago

This happened after a recent edX update. This issue has been reported for a while now! The quickest workaround I found is to use @RJFeddeler's fork: https://github.com/RJFeddeler/edx-dl

Example steps to follow:

And let it do its job :) Don't forget to star that forked repo... He deserves some gifts 👍

Ergasta99 commented 3 years ago

@bilel thanks! it works!

wisecrick commented 3 years ago

@bilel thanks for your information!

MATRIX30 commented 3 years ago

The fork didn't work for me. I keep getting this error message:

Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Downloading Applied Scrum for Agile Project Management [course-v1:USMx+ENCE607.1x+3T2019/co]
Section 1: Welcome! Welcome to Applied Scrum Getting Started with Goals!
Section 2: Week 1: Why Agile? 1.0 Introduction to Week 1 1.1 Agile Basics 1.2 Proof Agile Works 1.3 Evolution of Agile 1.4 Netflix Case Study 1.5 18F Case Study 1.6 Week 1 Quiz 1.7 Week 1 Takeaways & Feedback Verify Your Knowledge and Skills!
Section 3: Week 2: Who Uses Agile? 2.0 Introduction to Week 2 2.1 Simple PM Methods 2.2 Approaching the Triple Cost Constraint 2.3 Comparing Methods Across Industries 2.4 Comparing Methods of Customer Management 2.5 Comparing Methods of Engineering Management 2.6 Week 2 Quiz 2.7 Week 2 Takeaways & Feedback Verify Your Knowledge and Skills!
Section 4: Week 3: How to Scrum And Be Agile? 3.0 Introduction to How to Scrum and Be Agile? 3.1 Scrum Team Formation 3.2 Three-Part User Story 3.3 Sprint Planning 3.4 Sprint Development 3.5 Sprint Retro & Review 3.6 Week 3 Quiz 3.7 Week 3 Takeaways & Feedback Verify Your Knowledge and Skills!
Section 5: Week 4: What Scrum Framework Fits Best? 4.0 Introduction to What Scrum Framework Fits Best? 4.1 Scrum in the World of Agile 4.2 Exploring the Scaled Agile Framework (SAFe) 4.3 Exploring Disciplined Agile Delivery (DAD) 4.4 Exploring Large Scale Scrum (LeSS) 4.5 Pitfalls and Benefits of Agile at Scale 4.6 Week 4 Quiz 4.7 Week 4 Takeaways & Feedback Verify Your Knowledge and Skills!
Section 6: Course Final for Verified Students Course Final for Verified Students
Section 7: Congratulations! Now Keep Going! Thank You! Now Will You Continue? Feedback Quiz
Processing units...

Removed 0 duplicated urls from 76 in total

edx_dl version 0.1.13
loading 3212 urls from cache [edx-dl.cache]
Traceback (most recent call last):
  File "c:\users\cyanide systems\appdata\local\programs\python\python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\cyanide systems\appdata\local\programs\python\python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\Cyanide Systems\AppData\Local\Programs\Python\Python39\Scripts\edx-dl.exe\__main__.py", line 7, in <module>
  File "c:\users\cyanide systems\appdata\local\programs\python\python39\lib\site-packages\edx_dl\edx_dl.py", line 1233, in main
    download(args, selections, filtered_units, headers)
  File "c:\users\cyanide systems\appdata\local\programs\python\python39\lib\site-packages\edx_dl\edx_dl.py", line 989, in download
    coursename = directory_name(selected_course.name)
  File "c:\users\cyanide systems\appdata\local\programs\python\python39\lib\site-packages\edx_dl\utils.py", line 49, in directory_name
    result = clean_filename(initial_name)
  File "c:\users\cyanide systems\appdata\local\programs\python\python39\lib\site-packages\edx_dl\utils.py", line 123, in clean_filename
    s = h.unescape(s)
AttributeError: 'HTMLParser' object has no attribute 'unescape'

bilel commented 3 years ago

@MATRIX30 that's another Python compatibility issue :) It seems you are running the latest Python version...

Source : https://github.com/coursera-dl/coursera-dl/issues/778
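For context, `HTMLParser.unescape` was removed in Python 3.9, which is why the traceback above ends in an `AttributeError`; the standard-library replacement is `html.unescape`. A minimal sketch of the idea follows. The helper name `unescape_compat` is hypothetical and not part of edx-dl, and the actual patch in the fork or in the linked coursera-dl issue may look different.

```python
import html


def unescape_compat(s):
    """Decode HTML entities without relying on HTMLParser().unescape."""
    # html.unescape exists since Python 3.4 and still works on 3.9+,
    # where the old HTMLParser.unescape method is gone.
    return html.unescape(s)


print(unescape_compat("Takeaways &amp; Feedback"))
```

In edx-dl, the call to replace would presumably be the `h.unescape(s)` line in `clean_filename` shown in the traceback, but that is an assumption based on the traceback alone.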

MATRIX30 commented 3 years ago

Thanks @bilel It worked like magic

jackforfaltu commented 3 years ago

I'm getting an ImportError when trying to run edx-dl through the python interpreter.

$ python edx_dl.py -u user@user.com https://courses.edx.org/courses/course-v1:LinuxFoundationX+LFS101x+1T2020/course/

Traceback (most recent call last):
  File "edx_dl.py", line 41, in <module>
    from ._version import __version__
ImportError: attempted relative import with no known parent package

I'm no expert in python but I don't think I can use relative imports if they're not in the edx-dl package (I could be wrong though).

Is there any workaround to fix this? I'm using Linux btw, not windows.
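The error can be reproduced outside edx-dl: a module that uses a relative import fails when run directly as a script, but works when run as part of its package with `python -m`. The sketch below builds a throwaway package and runs it both ways. The names `demo_pkg`, `main.py`, and the version string are made up for illustration. By analogy, running `python -m edx_dl.edx_dl` from the repository root might work for a cloned copy, but that is an assumption not confirmed in this thread.

```python
import os
import subprocess
import sys
import tempfile


def run_both_ways():
    """Run one module as a plain script and as a package module."""
    with tempfile.TemporaryDirectory() as root:
        pkg = os.path.join(root, "demo_pkg")  # hypothetical package name
        os.mkdir(pkg)
        with open(os.path.join(pkg, "__init__.py"), "w") as f:
            f.write("")
        with open(os.path.join(pkg, "_version.py"), "w") as f:
            f.write("__version__ = '0.0.1'\n")
        with open(os.path.join(pkg, "main.py"), "w") as f:
            f.write("from ._version import __version__\nprint(__version__)\n")

        # Direct execution: the file has no parent package, so the
        # relative import raises ImportError.
        as_script = subprocess.run(
            [sys.executable, os.path.join(pkg, "main.py")],
            cwd=root, capture_output=True, text=True)
        # Execution with -m from the parent directory: the package is
        # known, so the relative import resolves.
        as_module = subprocess.run(
            [sys.executable, "-m", "demo_pkg.main"],
            cwd=root, capture_output=True, text=True)
    return as_script, as_module


as_script, as_module = run_both_ways()
print("script exit code:", as_script.returncode)
print("module output:", as_module.stdout.strip())
```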

bilel commented 3 years ago

@jackforfaltu Did you download the updated fork I suggested above, and then cd into that directory, next to edx_dl.py?

Maybe Python is confused between the edx-dl binary installed using pip and this one? Not sure yet! :)

jackforfaltu commented 3 years ago

Yeah I'm using the fork from RJFeddeler, but I didn't use pip to download, I just git cloned it.

bilel commented 3 years ago

Can you try this?

Later you can just use the original command line, as follows:

That should work :)

bilel commented 3 years ago

For those using Windows, it's the same thing. It's better to report this to the project's contributors, because if you expect automatic updates (using pip), following these guidelines would result in a package name conflict!

Otherwise you can follow these steps:

Learnpython-code commented 3 years ago

Hello everyone, I am not an expert in Python. I downloaded the fork suggested above, https://github.com/RJFeddeler/edx-dl, and copied it into AppData\Local\Programs\Python\Python38-32\Lib\site-packages\edx_dl\edx-dl-master\edx-dl. When I used the Python terminal, I got this result:

File "", line 1 SyntaxError: invalid syntax

Thank you for your help.

bilel commented 3 years ago

@Learnpython-code You'd better undo what you did first. If you install it properly, the files end up in the right path (not nested under edx_dl\edx-dl-master), so the tool can be used globally from the command line.

That should work

dev-davexoyinbo commented 3 years ago

Works for me, but the video and the audio tracks are separated

zdanek commented 3 years ago

I can confirm that the fork of @RJFeddeler works great. Thank you. Can you guys merge back what he did?

If you end up with separate audio and video files, install ffmpeg or avconv, as stated in https://github.com/ytdl-org/youtube-dl/blob/master/README.md#format-selection, so that the underlying call to youtube-dl will merge those files.
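A quick way to check whether one of those tools is visible on `PATH` before starting a long download is sketched below. The function name `find_merger` is hypothetical; this only checks that an executable can be found, not that youtube-dl will actually use it.

```python
import shutil


def find_merger(candidates=("ffmpeg", "avconv")):
    """Return the path of the first merging tool found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


merger = find_merger()
if merger:
    print("Found:", merger)
else:
    print("Neither ffmpeg nor avconv is on PATH; audio and video may stay separate.")
```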

Joxephizer commented 3 years ago

@bilel Your workaround worked like a charm. Thanks man!

jonnajp commented 3 years ago

Followed steps as above but getting

raise HTTPError(req.full_url, code, msg, hdrs, fp)

urllib.error.HTTPError: HTTP Error 403: Forbidden

Any ideas on how to fix this? Thanks much.

zdanek commented 3 years ago

@jonnajp Most likely you didn't change the HTTP User-Agent header to something like Mozilla, etc.
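As a rough illustration of what setting a browser-like User-Agent means with `urllib`, here is a minimal sketch. The helper `build_request` and the example User-Agent string are assumptions for illustration; where edx-dl or the fork actually defines its default headers is not shown in this thread.

```python
from urllib.request import Request


def build_request(url, extra_headers=None):
    """Build a urllib Request that carries a browser-like User-Agent."""
    # Example value only; any realistic browser string could be used.
    headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}
    headers.update(extra_headers or {})
    return Request(url, None, headers)


req = build_request("https://courses.edx.org/")
# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

The request object would then be passed to `urlopen`. No network call is made in the sketch itself.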

rameshvlsi85 commented 3 years ago

@bilel

I tried both: https://github.com/RJFeddeler/edx-dl and https://github.com/coursera-dl/edx-dl/commit/5490a99a98b56f544661c131229ef640ace2b064

I still get the error below:

Can you please help?

Processing units...

Traceback (most recent call last):
  File "edx-dl.py", line 8, in <module>
    edx_dl.main()
  File "D:\2021\edx-dl-master\edx-dl-master\edx_dl\edx_dl.py", line 1213, in main
    all_units = extractor(all_urls, headers, file_formats)
  File "D:\2021\edx-dl-master\edx-dl-master\edx_dl\edx_dl.py", line 590, in extract_all_units_in_parallel
    units = pool.map(mapfunc, urls)
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 657, in get
    raise self._value
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 44, in mapstar
    return list(map(*args))
  File "D:\2021\edx-dl-master\edx-dl-master\edx_dl\edx_dl.py", line 559, in extract_units
    unit_page = get_page_contents(unit_url, headers)
  File "D:\2021\edx-dl-master\edx-dl-master\edx_dl\utils.py", line 58, in get_page_contents
    result = urlopen(Request(url, None, headers))
  File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 500: Internal Server Error