dgorissen / coursera-dl

A script for downloading course material (videos, PDFs, quizzes, etc.) from coursera.org
http://dirkgorissen.com/2012/09/07/coursera-dl-a-coursera-download-script/
GNU General Public License v3.0
1.74k stars, 300 forks

Detecting renamed files #62

Closed: ilfats closed this 11 years ago

ilfats commented 11 years ago

File rename detection

Coursera professors often reorganize course content while a course is running. This changes topic and unit names, so coursera-dl downloads the same content again under different names.

This change aims to detect renames and reuse already-downloaded files. Directory renames are not detected yet.

A separate deduplication project

I've also created a separate project for removing duplicates in already downloaded content. This one is capable of detecting directory renames. Feel free to include this in your project or just put a link in your project description. https://github.com/ilfats/dedup.git

ilfats commented 11 years ago

This code uses the normalized file name and the file size to detect renamed files. The Coursera website does not seem to report a content length for small files (e.g. subtitles), so renames are not detected for those.
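
To illustrate the idea, here is a minimal sketch of matching on normalized name plus size. It is not the code from this pull request: the function names (`normalize_name`, `index_existing`, `find_renamed`) are hypothetical, and the normalization rule (strip a leading numeric index, lowercase, drop non-alphanumeric characters) is only an assumption about what "normalized" might mean here.

```python
import os
import re


def normalize_name(filename):
    """Assumed normalization: drop a leading numeric index such as "01_",
    lowercase the name, and remove every non-alphanumeric character."""
    base = os.path.basename(filename).lower()
    base = re.sub(r"^\d+[\s_\-.]*", "", base)
    return re.sub(r"[^a-z0-9]+", "", base)


def index_existing(root):
    """Map (normalized name, size in bytes) to the path of each file under root."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            key = (normalize_name(name), os.path.getsize(path))
            # Keep the first path seen for a given key.
            index.setdefault(key, path)
    return index


def find_renamed(index, target_name, content_length):
    """Return the path of an already-downloaded file that matches, or None.

    content_length is the size reported by the server. When it is missing
    (None), no match is attempted, which mirrors the limitation described
    above for small files such as subtitles.
    """
    if content_length is None:
        return None
    return index.get((normalize_name(target_name), content_length))
```

A caller could build the index once per course directory, call `find_renamed` before each download, and move or copy the returned file to the new name instead of fetching it again.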

dgorissen commented 11 years ago

Thanks!