coursera-dl / edx-dl

A simple tool to download video lectures from edx.org (and other openedx sites)
GNU Lesser General Public License v3.0

Error after processing - no directory created or files downloaded #653

Open Eyespiral opened 4 years ago

Eyespiral commented 4 years ago


Subject of the issue

I made a number of changes to files based on the issues logged here and got the tool to run as far as processing the units, but it then ends with an AttributeError. Nothing is downloaded, and the output directory is not even created.

Your environment

Steps to reproduce

Download and install

  1. Download edx-dl-master.zip and extract to C:\
  2. Follow comment by Zibetti on https://github.com/coursera-dl/edx-dl/issues/631 to update line 425 of edx_dl.py to 'User-Agent': 'Chrome/86.0.4240.111' (the version of Chrome I run on my machine)
  3. Follow https://github.com/coursera-dl/edx-dl/issues/595 to update line 372 of parsing.py
  4. Run using python edx-dl.py -u jXXXXXXXXXXXX+edx@gmail.com https://courses.edx.org/courses/course-v1:NewcastleX+NHI101x+3T2020/course/ -o C:\NHI101x

Expected behaviour

Should download the course

Actual behaviour

It fails and nothing is downloaded; the output directory isn't even created.

C:\edx-dl-master\edx-dl-master>python edx-dl.py -u jXXXXXXXXXXXX+edx@gmail.com https://courses.edx.org/courses/course-v1:NewcastleX+NHI101x+3T2020/course/ -o C:\NHI101x
edx_dl version 0.1.13
Password:
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Downloading Drawing Nature, Science and Culture: Natural History Illustration 101 [course-v1:NewcastleX+NHI101x+3T2020/co]
Downloading 6 section(s)
Section  1: Pre-Course Survey
  Pre-Course Survey
Section  2: Week 1: About the Course
  1.1 Introduction
  1.2 What is Natural History Illustration?
  1.3 How to Engage & Participate
  1.4 Homework
  1.5 Assessment Submission: Participation
Section  3: Week 2:  Observational Drawing
  2.1 Introduction
  2.2 Observational Drawing
  2.3 Spatial Depth
  2.4 Homework
  2.5 Assessment
  2.6 Prerequisite: Self and Peer Assessment Training
Section  4: Week 3: Field Work
  3.1 Introduction
  3.2 Keeping a Field Journal
  3.3  Fieldwork Capture
  3.4 Homework
  3.5 Assessment Submission: Mid-course Exam
  3.6 Assessment Submission: Participation
Section  5: Week 4:  Understanding Structure – Botanical
  4.1 Introduction
  4.2 Structure of Flowers
  4.3 Structure of Leaves
  4.4 Homework
  4.5 Assessment
  4.6 Assessment Submission: Flower Drawing
  4.7 Assessment Submission: Participation
Section  6: Week 5: Understanding Structure – Animals
  5.1 Introduction
  5.2 Structure of Mammals
  5.3 Structure of Birds
  5.4 Homework
  5.5 Assessment Submission: Participation
Extracting all units information in parallel.
Processing 'https://courses.edx.org/courses/course-v1:NewcastleX+NHI101x+3T2020/jump_to/block-v1:NewcastleX+NHI101x+3T2020+type@sequential+block@213fc4b7fdcb47bf977a33cfee770dc9'
Processing 'https://courses.edx.org/courses/course-v1:NewcastleX+NHI101x+3T2020/jump_to/block-v1:NewcastleX+NHI101x+3T2020+type@sequential+block@7c64d33d8fab4c999a6bb72f024dc881'
Processing 'https://courses.edx.org/courses/course-v1:NewcastleX+NHI101x+3T2020/jump_to/block-v1:NewcastleX+NHI101x+3T2020+type@sequential+block@fcbfb5174fdc4758966fe6fa6f12aa71'
Processing 'https://courses.edx.org/courses/course-v1:NewcastleX+NHI101x+3T2020/jump_to/block-v1:NewcastleX+NHI101x+3T2020+type@sequential+block@9c27871ad0b049ac8657dbea10e0c3db'
Processing 'https://courses.edx.org/courses/course-v1:NewcastleX+NHI101x+3T2020/jump_to/block-v1:NewcastleX+NHI101x+3T2020+type@sequential+block@caa54c0d152f44418592ee9cecd5185c'
Processing 'https://courses.edx.org/courses/course-v1:NewcastleX+NHI101x+3T2020/jump_to/block-v1:NewcastleX+NHI101x+3T2020+type@sequential+block@b9f2d77198a345468ad77459ce310c20'
Processing 'https://courses.edx.org/courses/course-v1:NewcastleX+NHI101x+3T2020/jump_to/block-v1:NewcastleX+NHI101x+3T2020+type@sequential+block@2f120752391e481b864ce4d92c284f91'
Processing 'https://courses.edx.org/courses/course-v1:NewcastleX+NHI101x+3T2020/jump_to/block-v1:NewcastleX+NHI101x+3T2020+type@sequential+block@d7752d490435464ab51a530344df5681'
Processing 'https://courses.edx.org/courses/course-v1:NewcastleX+NHI101x+3T2020/jump_to/block-v1:NewcastleX+NHI101x+3T2020+type@sequential+block@8433a0111ca946a49912b2dc277286be'
Processing 'https://courses.edx.org/courses/course-v1:NewcastleX+NHI101x+3T2020/jump_to/block-v1:NewcastleX+NHI101x+3T2020+type@sequential+block@6a09b14d809749d388f8778d9c8cbdeb'
Processing 'https://courses.edx.org/courses/course-v1:NewcastleX+NHI101x+3T2020/jump_to/block-v1:NewcastleX+NHI101x+3T2020+type@sequential+block@0d3579b21ae54b2696f4c27e58fc42d6'
Processing 'https://courses.edx.org/courses/course-v1:NewcastleX+NHI101x+3T2020/jump_to/block-v1:NewcastleX+NHI101x+3T2020+type@sequential+block@4b1b2fbc646f463cb0bfba41d6275546'
Processing 'https://courses.edx.org/courses/course-v1:NewcastleX+NHI101x+3T2020/jump_to/block-v1:NewcastleX+NHI101x+3T2020+type@sequential+block@06d69c65bdb54fc0b9042dcd133da4af'
Processing 'https://courses.edx.org/courses/course-v1:NewcastleX+NHI101x+3T2020/jump_to/block-v1:NewcastleX+NHI101x+3T2020+type@sequential+block@aa89f365f72a4adf9e3a4c17b6bf3d3f'
Processing 'https://courses.edx.org/courses/course-v1:NewcastleX+NHI101x+3T2020/jump_to/block-v1:NewcastleX+NHI101x+3T2020+type@sequential+block@ddecba2b54fa4414a67d8798f72f37fd'
Processing 'https://courses.edx.org/courses/course-v1:NewcastleX+NHI101x+3T2020/jump_to/block-v1:NewcastleX+NHI101x+3T2020+type@sequential+block@f3a3c83cef51446d9a956243a5435ccc'
Processing 'https://courses.edx.org/courses/course-v1:NewcastleX+NHI101x+3T2020/jump_to/block-v1:NewcastleX+NHI101x+3T2020+type@sequential+block@9f2b7eb16cf242bb811c6e47c3c515f6'
Processing 'https://courses.edx.org/courses/course-v1:NewcastleX+NHI101x+3T2020/jump_to/block-v1:NewcastleX+NHI101x+3T2020+type@sequential+block@144a52fe2bdc41318a0959175d260570'
Processing 'https://courses.edx.org/courses/course-v1:NewcastleX+NHI101x+3T2020/jump_to/block-v1:NewcastleX+NHI101x+3T2020+type@sequential+block@6c687dddfd2b49aeb949ddcccb092694'
Processing 'https://courses.edx.org/courses/course-v1:NewcastleX+NHI101x+3T2020/jump_to/block-v1:NewcastleX+NHI101x+3T2020+type@sequential+block@46db83d99b5449cab32c032788ac4705'
Processing 'https://courses.edx.org/courses/course-v1:NewcastleX+NHI101x+3T2020/jump_to/block-v1:NewcastleX+NHI101x+3T2020+type@sequential+block@1ff3bd6b45b14c4290e5472f8118fdc9'
Processing 'https://courses.edx.org/courses/course-v1:NewcastleX+NHI101x+3T2020/jump_to/block-v1:NewcastleX+NHI101x+3T2020+type@sequential+block@e1fd0587183d4ab096f70141db789e2f'
Processing 'https://courses.edx.org/courses/course-v1:NewcastleX+NHI101x+3T2020/jump_to/block-v1:NewcastleX+NHI101x+3T2020+type@sequential+block@e815b8c1586d434c8426b3a5c217a175'
Processing 'https://courses.edx.org/courses/course-v1:NewcastleX+NHI101x+3T2020/jump_to/block-v1:NewcastleX+NHI101x+3T2020+type@sequential+block@a176f5914a3e4afbad56f2269f22105f'
Processing 'https://courses.edx.org/courses/course-v1:NewcastleX+NHI101x+3T2020/jump_to/block-v1:NewcastleX+NHI101x+3T2020+type@sequential+block@10e2149ef37e4eec9d55e314ead75479'
Processing 'https://courses.edx.org/courses/course-v1:NewcastleX+NHI101x+3T2020/jump_to/block-v1:NewcastleX+NHI101x+3T2020+type@sequential+block@4a71cf5241004739b3a3c7be3844001a'
Processing 'https://courses.edx.org/courses/course-v1:NewcastleX+NHI101x+3T2020/jump_to/block-v1:NewcastleX+NHI101x+3T2020+type@sequential+block@2ac5e37228b04731b77e3396e2e3b8b0'
Processing 'https://courses.edx.org/courses/course-v1:NewcastleX+NHI101x+3T2020/jump_to/block-v1:NewcastleX+NHI101x+3T2020+type@sequential+block@f4ebb9276961408b87d061b4eecfcf47'
Processing 'https://courses.edx.org/courses/course-v1:NewcastleX+NHI101x+3T2020/jump_to/block-v1:NewcastleX+NHI101x+3T2020+type@sequential+block@a7090d4ed287450cac6107d87565d482'
Processing 'https://courses.edx.org/courses/course-v1:NewcastleX+NHI101x+3T2020/jump_to/block-v1:NewcastleX+NHI101x+3T2020+type@sequential+block@20c4d07be23f4e9ea674142f9dc09125'
Removed 0 duplicated urls from 0 in total
Output directory: C:\NHI101x
Traceback (most recent call last):
  File "C:\edx-dl-master\edx-dl-master\edx-dl.py", line 8, in <module>
    edx_dl.main()
  File "C:\edx-dl-master\edx-dl-master\edx_dl\edx_dl.py", line 1073, in main
    download(args, selections, filtered_units, headers)
  File "C:\edx-dl-master\edx-dl-master\edx_dl\edx_dl.py", line 836, in download
    coursename = directory_name(selected_course.name)
  File "C:\edx-dl-master\edx-dl-master\edx_dl\utils.py", line 49, in directory_name
    result = clean_filename(initial_name)
  File "C:\edx-dl-master\edx-dl-master\edx_dl\utils.py", line 123, in clean_filename
    s = h.unescape(s)
AttributeError: 'HTMLParser' object has no attribute 'unescape'
bradleygrant commented 3 years ago

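This seems like a viable fix for compatibility with Python 3.9, where the long-deprecated HTMLParser.unescape method was removed (hence the AttributeError above). The commit below swaps six.moves.HTMLParser for html, which is part of the Python standard library and provides html.unescape.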

I tried it and the error went away:

https://github.com/coursera-dl/edx-dl/commit/5490a99a98b56f544661c131229ef640ace2b064
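
For anyone patching by hand, the idea can be sketched roughly as follows. This is a minimal illustration and not the exact code from the commit: `clean_entities` is a hypothetical helper name, and the real `clean_filename` in `utils.py` does more than decode entities.

```python
import html


def clean_entities(s):
    """Decode HTML entities (e.g. '&amp;') in a course or file name.

    Hypothetical stand-in for the failing line `s = h.unescape(s)`:
    HTMLParser.unescape no longer exists on Python 3.9+, while
    html.unescape has been in the standard library since Python 3.4.
    """
    return html.unescape(s)


if __name__ == "__main__":
    # Example input chosen for illustration only.
    print(clean_entities("Drawing Nature, Science &amp; Culture"))
    # prints: Drawing Nature, Science & Culture
```

Since `html.unescape` is a plain function, no parser object needs to be created, so the `h = HTMLParser()` style setup becomes unnecessary wherever only entity decoding is required.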