dgorissen / coursera-dl

A script for downloading course material (videos, PDFs, quizzes, etc.) from coursera.org
http://dirkgorissen.com/2012/09/07/coursera-dl-a-coursera-download-script/
GNU General Public License v3.0
1.73k stars, 299 forks

Week filter #105

Open joedicastro opened 10 years ago

joedicastro commented 10 years ago

The week filter should be renamed. In some courses there is no direct correlation between lectures and weeks, and the current behavior filters by lecture number. For example, Machine Learning uses two or more lectures per week.

So it would be more appropriate and straightforward to use lectures (or sections) as the name of the filter, IMHO.

dgorissen commented 10 years ago

I agree, this should be fixed as part of the discussion at #91.

bravmi commented 10 years ago

yep, the "week number" option is just stupid and incorrect. it misleads, as if some AI logic could automatically break sections up by week.

"section number" is better, but not by much, since the site has no numbering (sometimes it does, but you can't rely on its presence). imo it could only really work with a new script option that lists all the sections of a course, numbered.

and I don't get why good old regex logic isn't "friendly". just specify a filter, it gives you a numbered list of matches (the full list in the worst case), and you choose by number. easy.

of course the script is still quite usable if you forget entirely about the week/section option and let it download each new week's material. most people probably need exactly that, so it's kinda okay. but having to keep the entire course so far just to be able to conveniently download the next week is ofc weird.
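A minimal sketch of the filter idea described above, assuming the section names are already available as a list of strings. The function name `match_sections` and the sample section names are hypothetical and are not part of coursera-dl.

```python
import re


def match_sections(section_names, pattern):
    """Return (number, name) pairs for sections whose name matches pattern.

    Hypothetical helper: numbers are 1-based positions in the full
    section list, so they stay stable whatever the filter is.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    return [
        (number, name)
        for number, name in enumerate(section_names, start=1)
        if regex.search(name)
    ]


if __name__ == "__main__":
    # Made-up section names, for illustration only.
    sections = [
        "Week 1 Introduction",
        "Week 1 Linear Regression",
        "Week 2 Logistic Regression",
    ]
    for number, name in match_sections(sections, "week 1"):
        print(f"{number}. {name}")
```

A plain word such as `week 1` is already a valid pattern, so nobody has to write regex syntax to use it. An empty pattern matches every section, which gives the "full list in the worst case" behavior.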

oh and thank you very much for this piece of software, appreciated :)

zanderle commented 10 years ago

regex isn't friendly because, for someone who doesn't know regex, it's a lot of trouble for a simple "give me weeks (sections/modules/whatever) 2 through 5". It's much more user-friendly to simply specify the sections you want in a list: "2,3,4,5".

It's a bit frustrating that every course divides its material differently. So far I've seen the use of "weeks", "sections", "modules" and I'm pretty sure there are many others. I guess something more general than "week number" would, like @bravmech suggested, be "section/module number". When I initially proposed the idea for the week filter, every course I had done on Coursera used weeks to divide material, so naturally I assumed this was a general rule.
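A minimal sketch of how a list such as "2,3,4,5" could be parsed into section numbers. The function name `parse_section_spec` is hypothetical, and the `2-5` range shorthand is an assumption added for illustration; the thread only mentions the comma-separated form.

```python
def parse_section_spec(spec):
    """Turn a string like "2,3,4,5" into a sorted list of section numbers.

    Hypothetical helper. Range support ("2-5") is an assumption,
    not something the existing filter is known to accept.
    """
    numbers = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "2,,3"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            numbers.update(range(start, end + 1))
        else:
            numbers.add(int(part))
    return sorted(numbers)


if __name__ == "__main__":
    print(parse_section_spec("2,3,4,5"))
    print(parse_section_spec("1, 3-4"))
```

Returning a sorted, de-duplicated list means "5,2,2" and "2,5" behave the same way, which keeps the download order predictable.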

I agree that some functionality for downloading only the latest material would be nice. @dgorissen would it be possible to download only the items that haven't been watched? I know on coursera it shows which ones you have opened and which ones you haven't. Or you could maybe locally save a list of everything that has been downloaded for a particular course? Just some ideas :)
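A minimal sketch of the "locally save a list of everything that has been downloaded" idea, assuming each item can be identified by a string such as its file name. The JSON manifest format and the function names are hypothetical and do not describe how coursera-dl works.

```python
import json
from pathlib import Path


def load_downloaded(manifest_path):
    """Read the set of already-downloaded item names, or an empty set."""
    path = Path(manifest_path)
    if not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8")))


def save_downloaded(manifest_path, downloaded):
    """Write the set of downloaded item names as a sorted JSON list."""
    text = json.dumps(sorted(downloaded), indent=2)
    Path(manifest_path).write_text(text, encoding="utf-8")


def pending_items(all_items, downloaded):
    """Return the items not yet recorded as downloaded, in course order."""
    return [item for item in all_items if item not in downloaded]
```

With a record like this, the script could fetch only `pending_items(...)` on each run, so the user would not need to keep the whole course on disk just to pick up the next week.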

bravmi commented 10 years ago

@zanderle people don't need to know regexes in order to type a word in quotes! looks like if you call it a filter (and actually implement it as a regex) it would be friendly enough )

as for section numbering - once again, imagine you're in the middle of a course and the course itself does not provide any numbering at all (like Analytic Combinatorics, just checked) or provides just week numbering with no numbers for sections (like algs4partII, where the first section is called "week 1 smth" and the second "week 1 smth else"). sadly, it happens. so basically we (you) have to number the sections, maybe via a future coursera-dl <course name> --list option. otherwise people have to, what, manually count these sections or something? "okay where am I now?"

dgorissen commented 10 years ago

> @dgorissen would it be possible to download only the items that haven't been watched? I know on coursera it shows which ones you have opened and which ones you haven't. Or you could maybe locally save a list of everything that has been downloaded for a particular course?

Anything is possible :) It's not at the top of my TODO list at the moment, but I'm happy to take pull requests.

zanderle commented 10 years ago

@bravmech Those are great points! Thank you for being a part of this discussion. So here is what I now think would be a nice solution to all this (not all my ideas, obviously :)): a coursera-dl --list option that lists all available sections/modules/weeks and their corresponding indices. This would be useful when the sections are defined in an implicit or ambiguous way, because the user would then know which indices to use. After that, wk_filter (its current name, which will be changed to something more apt) can do its job. I personally prefer this to implementing regex in wk_filter, since it's a lot easier to just type in the list of required sections, and we often know what that list will be without looking at the --list output. Any comments before I start implementing this?
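A minimal sketch of what the proposed `--list` option could look like, assuming the section names have already been fetched into a list. The parser below is a hypothetical stand-in (program name `coursera-dl-sketch`), not the actual coursera-dl command line.

```python
import argparse


def format_section_list(section_names):
    """Return one line per section, prefixed with its 1-based index."""
    return [
        f"{index:>3}  {name}"
        for index, name in enumerate(section_names, start=1)
    ]


def build_parser():
    """Build a hypothetical parser with a course name and a --list flag."""
    parser = argparse.ArgumentParser(prog="coursera-dl-sketch")
    parser.add_argument("course_name", help="name of the course")
    parser.add_argument(
        "--list",
        action="store_true",
        dest="list_sections",
        help="print the numbered sections and exit",
    )
    return parser


def main(argv, section_names):
    """Print the numbered section list when --list is given."""
    args = build_parser().parse_args(argv)
    if args.list_sections:
        for line in format_section_list(section_names):
            print(line)


if __name__ == "__main__":
    # Made-up section names, for illustration only.
    main(["some-course", "--list"], ["week 1 smth", "week 1 smth else"])
```

The indices printed here are the numbers a user would then pass to the section filter, so both features need to number the sections in the same order.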

joedicastro commented 10 years ago

@zanderle I agree that the list (as @bravmech said) is the best solution for now (I use a similar idea to edit published articles for Pelican locally, choosing them by number in a list of names), and your idea to download only the unwatched ones is very good too. I think the two are complementary, not mutually exclusive. :+1: for both