coursera-dl / edx-dl

A simple tool to download video lectures from edx.org (and other openedx sites)
GNU Lesser General Public License v3.0

Cannot download schoolyourself.org courses #342

Open deepakjois opened 8 years ago

deepakjois commented 8 years ago

I am having problems downloading this course: https://www.edx.org/course/introduction-algebra-schoolyourself-algebrax

Here is the output of --list-courses:

$ ~/edx/edx-dl/edx-dl.py -u mooc.fetcher@gmail.com  --list-courses
Password:
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
You can access 10 courses
 1 - Introduction to Algebra [SchoolYourself/AlgebraX/1T2015]
     https://courses.edx.org/courses/SchoolYourself/AlgebraX/1T2015/info
…<snipped>

And here is the output of the actual invocation to download the course:

$ ~/edx/edx-dl/edx-dl.py -u mooc.fetcher@gmail.com  "https://courses.edx.org/course/SchoolYourself/AlgebraX/1T2015/info"
Password:<hidden>
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
You have not passed a valid course url, check the correct url with --list-courses

iemejia commented 8 years ago

Have you tried the --list-courses option? Does it show the course there? Can you please try without the quotes?

deepakjois commented 8 years ago

I included the (truncated) --list-courses output above (the first code block). Yes, it shows the course there.

I will try again without the quotes, but if I remember right I did that already, and it made no difference.

deepakjois commented 8 years ago

Confirmed. It does not work without the quotes either.
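One thing worth ruling out: the URL in the failing invocation above contains /course/, while the URL printed by --list-courses contains /courses/. The sketch below is not edx-dl's code; it is a hypothetical helper (the names normalize and find_listed_course are made up here) that assumes the tool only accepts URLs matching a dashboard entry, and shows how the two URLs compare.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Return (host, path without trailing slash) for a loose comparison."""
    parts = urlsplit(url.strip())
    return parts.netloc.lower(), parts.path.rstrip("/")


def find_listed_course(requested, listed_urls):
    """Return the listed URL that matches `requested`, or None if none does."""
    target = normalize(requested)
    for url in listed_urls:
        if normalize(url) == target:
            return url
    return None


# URL copied from the --list-courses output in this report.
listed = ["https://courses.edx.org/courses/SchoolYourself/AlgebraX/1T2015/info"]
# URL copied from the failing invocation in this report.
passed = "https://courses.edx.org/course/SchoolYourself/AlgebraX/1T2015/info"

print(find_listed_course(passed, listed))     # no match: /course/ vs /courses/
print(find_listed_course(listed[0], listed))  # matches the dashboard entry
```

If the tool really does compare against the dashboard list, copying the URL verbatim from --list-courses avoids this kind of mismatch.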

iemejia commented 8 years ago

I just checked, and I can download only one video from the series. It seems this course uses a new layout, so the script can't download the videos for the moment. (Just to confirm, can you at least download the first video?)

deepakjois commented 8 years ago

I have been trying this since yesterday, and the result was exactly what I included in my initial report.

But I tried again after your comment, and now I am getting something different. The full output is a bit too long but I am including the relevant portion below. It now goes through the course information page, but it is not able to extract any URLs and stops. I can only see a directory structure in the output folder. I cannot see any videos. So to answer your question, I cannot download any videos.

Another thing that may be relevant is that this is the only course I am currently enrolled in that does not have the course-v1: prefix in the Course ID. Maybe that indicates something.
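To make the difference concrete, here is a minimal sketch that tells the two ID styles apart. It assumes the newer style is written course-v1:Org+Course+Run and the older one Org/Course/Run; classify_course_id is a hypothetical helper, not part of edx-dl, and the course-v1 ID in the example is invented for illustration.

```python
import re

# Newer style, assumed to look like "course-v1:Org+Course+Run".
NEW_STYLE = re.compile(
    r"^course-v1:(?P<org>[^+]+)\+(?P<course>[^+]+)\+(?P<run>[^+]+)$"
)
# Older style, as printed for this course: "Org/Course/Run".
OLD_STYLE = re.compile(r"^(?P<org>[^/]+)/(?P<course>[^/]+)/(?P<run>[^/]+)$")


def classify_course_id(course_id):
    """Return (style, org, course, run), or None if the ID fits neither style."""
    for style, pattern in (("course-v1", NEW_STYLE), ("slash-separated", OLD_STYLE)):
        match = pattern.match(course_id)
        if match:
            return (style, match["org"], match["course"], match["run"])
    return None


# ID taken from the --list-courses output in this report.
print(classify_course_id("SchoolYourself/AlgebraX/1T2015"))
# Invented ID, only to show the newer style.
print(classify_course_id("course-v1:ExampleOrg+Example101+2016"))
```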

~/edx/edx-dl/edx-dl.py -u mooc.fetcher@gmail.com   https://courses.edx.org/courses/SchoolYourself/AlgebraX/1T2015/info
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Downloading Introduction to Algebra [SchoolYourself/AlgebraX/1T2015]
Downloading 15 section(s)
Section  1: Getting started
  Welcome!
Section  2: Addition and subtraction
  The number line
  Addition
  Subtraction
  Adding negatives
  Subtracting negatives
  Absolute value
  Distance on the number line
Section  3: Multiplication and division
  Multiplication
  Multiplying by 1 and 0
…
…
…<snipped>…
…
…
Processing 'https://courses.edx.org/courses/SchoolYourself/AlgebraX/1T2015/courseware/0ad285d46be545b8b9dad33151ba5772/cf7e69807694431e95930de3c68cf3dc/'
Processing 'https://courses.edx.org/courses/SchoolYourself/AlgebraX/1T2015/courseware/0ad285d46be545b8b9dad33151ba5772/c01361afa1fe499c802d296f28267359/'
Processing 'https://courses.edx.org/courses/SchoolYourself/AlgebraX/1T2015/courseware/0ad285d46be545b8b9dad33151ba5772/d44d7dd3e36d436d9ef2ec85bbbab830/'
Processing 'https://courses.edx.org/courses/SchoolYourself/AlgebraX/1T2015/courseware/2355f8c228f641eb81566fc967a63696/f5871378a06f4eaaa0fde3f8bbeb9053/'
Removed 0 duplicated urls from 0 in total
Output directory: Downloaded

deepakjois commented 8 years ago

Same problem with this course as well: https://www.edx.org/course/introduction-geometry-schoolyourself-geometryx

iemejia commented 8 years ago

I was just checking more details, and it is going to be hard to download these courses since they are highly interactive. In general the videos are short and expect user interaction to trigger the next videos, so the structure is really different. Anyway, I will leave the issue open in case someone wants to implement it.