dgorissen / coursera-dl

A script for downloading course material (videos, PDFs, quizzes, etc.) from coursera.org
http://dirkgorissen.com/2012/09/07/coursera-dl-a-coursera-download-script/
GNU General Public License v3.0
1.73k stars, 299 forks

Added the ability to specify which weeks to download #91

Closed zanderle closed 10 years ago

zanderle commented 10 years ago

Added a weeks (-w) parameter: you specify which weeks you want to download, so you don't have to download the whole course. Example input:

coursera-dl -u username -p password -w "1,4,5" course

which would download the first, fourth, and fifth weeks. If "-w" is omitted, all available weeks are downloaded.

It assumes that the first number found in the week header is the week number, and it ignores any other numbers (e.g. "Week 1: TCP/IP, 10 rules of something", "Module 10: Example", "10th week: ARC").
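
As an illustration of the "first number in the header" rule described above, here is a minimal sketch. It is not the code from the pull request; `week_number` and `select_weeks` are hypothetical names, and the "Introduction" header is an invented example of a header without a number.

```python
import re


def week_number(header):
    """Return the first integer found in a week header, or None if there is none."""
    match = re.search(r"\d+", header)
    return int(match.group()) if match else None


def select_weeks(headers, wanted):
    """Keep only the headers whose first number is in the wanted set."""
    return [h for h in headers if week_number(h) in wanted]


# Parse the value given to -w, e.g. "1,4,5"
wanted = {int(n) for n in "1,4,5".split(",")}

headers = [
    "Week 1: TCP/IP, 10 rules of something",
    "Module 10: Example",
    "10th week: ARC",
    "Introduction",  # no number at all, so it is never selected
]
print(select_weeks(headers, wanted))
```

Headers without any number are silently skipped in this sketch, so a real implementation would need an explicit rule for them.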

dgorissen commented 10 years ago

Thanks for that. Why not instead allow the user to specify a regex pattern? Only those classes that match the pattern would be included. That avoids your assumption (e.g., it breaks with the algorithms classes, which use Roman numbering) and gives people more flexibility.
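
A rough sketch of what pattern-based selection could look like, assuming section titles are available as plain strings. `filter_sections` is a hypothetical helper and the titles are made up for illustration; neither comes from coursera-dl.

```python
import re


def filter_sections(titles, pattern):
    """Return only the titles that match the user-supplied regex."""
    regex = re.compile(pattern)
    return [t for t in titles if regex.search(t)]


# Invented titles that mix Roman and Arabic numbering
titles = [
    "I. Union-Find",
    "II. Analysis of Algorithms",
    "III. Stacks and Queues",
    "Week 4: Sorting",
]

# Select sections I and III without parsing any numbers
print(filter_sections(titles, r"^(I|III)\."))
```

The pattern matches on the title text itself, so it works regardless of how a course numbers its sections, at the cost of the user having to know the titles in advance.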

danmbox commented 10 years ago

However you implement it, make it possible to download FROM week N. I don't know if these proposals cover that. This is the most frequent use case -- I want to catch up with the course, and my last downloaded week is N. Also, for the "reverse sections" option, please let the numbers correspond to the original order, so that when you re-download you don't end up with changed names. I.e. "15 - week 15", then "14 - week 14" etc., instead of "1 - week 15", then "2 - week 14" (which in my opinion is useless, at least as far as re-downloading goes).
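
The numbering request for reversed sections can be sketched as follows: number the sections first, then reverse. `numbered_sections` is a hypothetical helper, and the zero-padded "NN - title" format is an assumption for the example.

```python
def numbered_sections(titles, reverse=False):
    """Prefix each title with its index in the original order.

    The index is assigned before reversing, so a section keeps the same
    name on disk whether or not the order is reversed.
    """
    numbered = list(enumerate(titles, start=1))
    if reverse:
        numbered.reverse()
    return ["%02d - %s" % (index, title) for index, title in numbered]


titles = ["week 1", "week 2", "week 3"]
print(numbered_sections(titles))
print(numbered_sections(titles, reverse=True))
```

With this ordering of operations, re-downloading with the reverse option produces the same directory names as a normal run, only in a different order.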

zanderle commented 10 years ago

Ok! I'll look into that (both comments) and I'll get back to you.

dgorissen commented 10 years ago

Cf. the "continue where you left off" functionality: it is an oft-requested feature, so I will add it separately as a built-in feature that does not require a manual flag.

danmbox commented 10 years ago

Sometimes they change materials from the previous week (so if the last week downloaded is week 9, sometimes week 8 is still worth checking out). How will you implement continuing where you left off? A separate file to record state? I don't see any other reliable way. Regarding roman numerals, I thought all weeks are prepended (on disk) with the actual index in weeklyTopics (so it might be 12 - XII blah), right?

zanderle commented 10 years ago

Yeah, that's something I'm curious about too. There are weeks that have no numbers (like 'introduction') and that's where my algorithm breaks, which I have to fix, but other than that - don't they always have numbers next to them?

danmbox commented 10 years ago

I was talking about the numbers that coursera-dl prepends to all weeks, e.g. "aiplan-001/01 - Week 1 - Introduction and Planning in Context". That's the index in weeklyTopics. That's the only number that's worth using. I think being able to specify "--weeks 1,3,8-" (i.e. week 8 and everything after it) would take care of everything. Regexes don't usually work well with numbers, and I personally don't see the point in using regexes to select weekly titles.
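
One possible reading of the "1,3,8-" notation, as a sketch only. `parse_weeks` is a hypothetical function, and treating "8-" as "week 8 through the last week" is an assumption about the intended meaning.

```python
def parse_weeks(spec, total):
    """Expand a spec such as "1,3,8-" into a sorted list of 1-based week indices.

    total is the number of weeks in the course (the length of weeklyTopics).
    """
    selected = set()
    for part in spec.split(","):
        part = part.strip()
        if part.endswith("-"):
            # "8-" means week 8 and every week after it
            selected.update(range(int(part[:-1]), total + 1))
        else:
            selected.add(int(part))
    # Drop anything outside the range of weeks that actually exist
    return sorted(w for w in selected if 1 <= w <= total)


print(parse_weeks("1,3,8-", total=10))
```

Because the result is a list of positions in weeklyTopics, no week title ever has to be parsed.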

zanderle commented 10 years ago

Two changes:

I will add the ability to specify weeks in Roman numerals (can someone confirm that Coursera actually uses them in some cases?) and the ability to specify weeks like "1:4", "3:6" or "6:" (similar to Python syntax).
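
A sketch of how specs like "3:6" or "6:" might map onto Python slices. `parse_range` is a hypothetical helper. It assumes week numbers are 1-based and that the end week is included, which differs from strict Python slicing, where the end is excluded; the thread does not settle which is intended.

```python
def parse_range(spec):
    """Turn "3:6", "6:", ":2" or "4" into a slice over a 0-based list of weeks."""
    if ":" not in spec:
        week = int(spec)
        return slice(week - 1, week)
    start, end = spec.split(":")
    # Week 3 sits at list index 2; an empty side means "open-ended"
    first = int(start) - 1 if start else None
    # The end week is inclusive, so week 6 maps to the exclusive stop index 6
    last = int(end) if end else None
    return slice(first, last)


weeks = ["week %d" % n for n in range(1, 9)]
print(weeks[parse_range("3:6")])
print(weeks[parse_range("6:")])
```

Returning a `slice` object lets the caller apply the selection directly to the list of weeks.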

danmbox commented 10 years ago

Python range syntax is perfect.

But why are Roman numerals needed? The only thing that's needed is to specify indexes in weeklyTopics. It took me three lines to fix this in my copy (plus changes to the main argparser):

--- a/courseradownloader/courseradownloader.py
+++ b/courseradownloader/courseradownloader.py
@@ -309,7 +309,7 @@ class CourseraDownloader(object):
         except Exception as e:
             print "Failed to download url %s to %s: %s" % (url,filepath,e)

-    def download_course(self,cname,dest_dir=".",reverse_sections=False):
+    def download_course(self,cname,dest_dir=".",reverse_sections=False,start_week=1):
         """
         Download all the contents (quizzes, videos, lecture notes, ...) of the course to the given destination directory (defau
         """
@@ -331,6 +331,8 @@ class CourseraDownloader(object):
             weeklyTopics.reverse()
             print "* Sections reversed"

+        weeklyTopics = weeklyTopics [start_week - 1:]
+
         course_dir = path.abspath(path.join(dest_dir,cname))

         # ensure the target dir exists
@@ -349,7 +351,7 @@ class CourseraDownloader(object):
         self.download(course_url,target_dir=course_dir,target_fname="lectures.html")

         # now download the actual content (video's, lecture notes, ...)
-        for j,weeklyTopic in enumerate(weeklyTopics,start=1):
+        for j,weeklyTopic in enumerate(weeklyTopics,start=start_week):
             if weeklyTopic not in allClasses:
                 #TODO: refactor
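
The idea behind the patch above, reduced to a standalone sketch: slice off the weeks before `start_week`, then start the enumeration at `start_week` so the remaining weeks keep their original index. `sections_to_download` is a hypothetical function written for Python 3, whereas the patched code is Python 2.

```python
def sections_to_download(weekly_topics, start_week=1):
    """Return (index, topic) pairs from start_week onwards.

    The index is the 1-based position in the full list, so the prefix
    written to disk is the same as in a complete download.
    """
    remaining = weekly_topics[start_week - 1:]
    return list(enumerate(remaining, start=start_week))


topics = ["Intro", "Search", "Planning", "Scheduling"]
for index, topic in sections_to_download(topics, start_week=3):
    print("%02d - %s" % (index, topic))
```

Starting the enumeration at `start_week` is what keeps the directory names stable across partial downloads.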

dgorissen commented 10 years ago

I initially committed this feature using a regex, but @danmbox's suggestion to use the week indices is more user-friendly, so that's what I have changed it to. It is simple and straightforward to implement, and more robust than having to parse the week titles.

danmbox commented 10 years ago

Ugh. I have 1.5.1, there is a -w argument, but I still can't specify -w 5: (to get everything from week 5 to the end)

zanderle commented 10 years ago

@danmbox I'm not sure who your message was intended for, but I've stopped with this feature since Dirk closed the pull request...

dgorissen commented 10 years ago

Triggered by the work of @zanderle but adapted to the comments of @danmbox, I added this feature and considered all bases covered and the issue closed. It turns out I did not consider the range syntax, but please don't let that stop you (@zanderle or @danmbox) from reopening the issue and updating the patch. It should only require minor changes to the code.

zanderle commented 10 years ago

Thank you for elaborating. I'll take it on (unless @danmbox is in a hurry and will want to take it on himself :))

zanderle commented 10 years ago

Um, I haven't committed anything yet...

dgorissen commented 10 years ago

Strange, I did not mean to close it. Not sure what happened. GitHub says the status is merged, though I did not consciously merge anything. Also, what does "0 commits merged" mean? Can't reopen it either. sigh.

zanderle commented 10 years ago

Hmm. Should I just do a new pull request, when I'm ready?

dgorissen commented 10 years ago

Sure.