coursera-dl / edx-dl

A simple tool to download video lectures from edx.org (and other openedx sites)
GNU Lesser General Public License v3.0

skip already downloaded videos #432

Open sina-mansour opened 7 years ago

sina-mansour commented 7 years ago

Subject of the issue

When I download a very large course and the download fails because of a network error, rerunning the command repeats the download of some large videos even though they have already been downloaded. I suppose the files are not tracked or checked before downloading again.

Your environment

Steps to reproduce

Try downloading this course: https://courses.edx.org/courses/course-v1:Microsoft+DAT210x+5T2016/info
You can see that about 15 GB is downloaded for chapter 2. I use this command to download it:

edx-dl -u mymail@gmail.com https://courses.edx.org/courses/course-v1:Microsoft+DAT210x+5T2016/info --ignore-errors

Expected behaviour

I expect it to skip the already downloaded files.

Actual behaviour

Every time the downloads fail and I rerun the command, it starts downloading some of the existing files again.
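The expected behaviour can be sketched as a pre-download check. This is only an illustration, not edx-dl's actual code: the function name `should_skip` and the optional `expected_size` parameter are hypothetical. The sketch treats an empty file or a file of the wrong size as incomplete, since a download interrupted by a network failure may leave a partial file behind.

```python
import os


def should_skip(path, expected_size=None):
    """Return True if `path` looks like a complete, already downloaded file.

    Hypothetical helper for illustration only.
    `expected_size` is an assumed byte count, e.g. taken from a
    Content-Length header; pass None when the size is unknown.
    """
    if not os.path.isfile(path):
        return False  # nothing on disk yet
    size = os.path.getsize(path)
    if size == 0:
        return False  # empty file, most likely an aborted download
    if expected_size is not None and size != expected_size:
        return False  # partial or stale file
    return True
```

A downloader would call such a check for each target path before fetching. Without a known size, an existence check alone cannot tell a complete file from a truncated one.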

jcline-ieee commented 6 years ago

It seems that a command-line flag already exists for that. I remember downloading that particular course and had no problems with my setup and command-line options. However, this was one of those courses that included playlist links and downloaded a lot of videos.

The course also includes this "dangerous" link https://www.youtube.com/results?search_query=spyder+tutorial+python with the result:

[download] https://www.youtube.com/results?search_query=spyder+tutorial+python => edx/Programming_with_Python_for_Data_Science/01-Start_Here/03-%(title)s-%(id)s.%(ext)s
Downloading video with URL https://www.youtube.com/results?search_query=spyder+tutorial+python from YouTube.
[youtube:search_url] spyder tutorial python: Downloading webpage
[download] Downloading playlist: spyder tutorial python
[youtube:search_url] playlist spyder tutorial python: Downloading 0 videos
[download] Finished downloading playlist: spyder tutorial python

See the log below, which shows [skipping] for files that have already been downloaded:

[skipping] https://d2f1egay8yehza.cloudfront.net/MSXPPDSX/MSXPPDSX2016-V000800_DTH.mp4 => edx/Programming_with_Python_for_Data_Science/01-Start_Here/01-MSXPPDSX2016-V000800_DTH.mp4
[download] https://courses.edx.org/courses/course-v1:Microsoft+DAT210x+1T2017/xblock/block-v1:Microsoft+DAT210x+1T2017+type@video+block@3bb84380c8054e8b810319e4332a16b9/handler/transcript/translation/en => edx/Programming_with_Python_for_Data_Science/01-Start_Here/01-Michael_Phelps.en.srt
[skipping] https://en.wikipedia.org/wiki/Michael_Phelps => edx/Programming_with_Python_for_Data_Science/01-Start_Here/01-Michael_Phelps
[skipping] https://d2f1egay8yehza.cloudfront.net/MSXPPDSX/MSXPPDSX2016-V000300_DTH.mp4 => edx/Programming_with_Python_for_Data_Science/01-Start_Here/02-MSXPPDSX2016-V000300_DTH.mp4
[skipping] https://courses.edx.org/courses/course-v1:Microsoft+DAT210x+1T2017/xblock/block-v1:Microsoft+DAT210x+1T2017+type@video+block@0b85c95efbb444d6b4470cfb9cff6eab/handler/transcript/translation/en => edx/Programming_with_Python_for_Data_Science/01-Start_Here/02-MSXPPDSX2016-V000300_DTH.en.srt
[download] https://www.youtube.com/results?search_query=virtualenv => edx/Programming_with_Python_for_Data_Science/01-Start_Here/03-%(title)s-%(id)s.%(ext)s
Downloading video with URL https://www.youtube.com/results?search_query=virtualenv from YouTube.
[youtube:search_url] virtualenv: Downloading webpage
[download] Downloading playlist: virtualenv
[youtube:search_url] playlist virtualenv: Downloading 20 videos
[download] Downloading video 1 of 20
[youtube] N5vscPTWKOk: Downloading webpage
[youtube] N5vscPTWKOk: Downloading video info webpage
[youtube] N5vscPTWKOk: Extracting video information
WARNING: video doesn't have subtitles
[youtube] N5vscPTWKOk: Downloading MPD manifest
[download] edx/Programming_with_Python_for_Data_Science/01-Start_Here/03-Python Tutorial - virtualenv and why you should use virtual environments-N5vscPTWKOk.mp4 has already been downloaded
[download] 100% of 18.29MiB
[download] Downloading video 2 of 20
[youtube] IX-v6yvGYFg: Downloading webpage
[youtube] IX-v6yvGYFg: Downloading video info webpage
[youtube] IX-v6yvGYFg: Extracting video information
WARNING: video doesn't have subtitles
[youtube] IX-v6yvGYFg: Downloading MPD manifest
[download] edx/Programming_with_Python_for_Data_Science/01-Start_Here/03-Python Power Tools - virtualenv-IX-v6yvGYFg.mp4 has already been downloaded
[download] 100% of 104.55MiB
[download] Downloading video 3 of 20
[youtube] GBAJ7VKyEpI: Downloading webpage
[youtube] GBAJ7VKyEpI: Downloading video info webpage
[youtube] GBAJ7VKyEpI: Extracting video information
WARNING: video doesn't have subtitles
[youtube] GBAJ7VKyEpI: Downloading MPD manifest
[download] edx/Programming_with_Python_for_Data_Science/01-Start_Here/03-Installing and Using virtualenv-GBAJ7VKyEpI.mp4 has already been downloaded
[download] 100% of 14.37MiB
[download] Downloading video 4 of 20
[youtube] ETL-_W1W8gY: Downloading webpage
[youtube] ETL-_W1W8gY: Downloading video info webpage
[youtube] ETL-_W1W8gY: Extracting video information
WARNING: video doesn't have subtitles
[youtube] ETL-_W1W8gY: Downloading MPD manifest
[download] edx/Programming_with_Python_for_Data_Science/01-Start_Here/03-Docker as a replacement for virtualenv-ETL-_W1W8gY.mp4 has already been downloaded

Here is another dangerous YouTube "search" link, which ends up with a LOT of downloads (19 videos):

[download] https://www.youtube.com/results?search_query=Audacity+Tutorial => edx/Programming_with_Python_for_Data_Science/09-Course_Wrap-up/01-%(title)s-%(id)s.%(ext)s
Downloading video with URL https://www.youtube.com/results?search_query=Audacity+Tutorial from YouTube.
[youtube:search_url] Audacity Tutorial: Downloading webpage
[download] Downloading playlist: Audacity Tutorial
[youtube:search_url] playlist Audacity Tutorial: Downloading 19 videos
[download] Downloading video 1 of 19
[youtube] 3uqCNjbQn54: Downloading webpage
[youtube] 3uqCNjbQn54: Downloading video info webpage
[youtube] 3uqCNjbQn54: Extracting video information
WARNING: video doesn't have subtitles
[youtube] 3uqCNjbQn54: Downloading MPD manifest
[download] edx/Programming_with_Python_for_Data_Science/09-Course_Wrap-up/01-Audacity Beginner Tutorial-3uqCNjbQn54.mp4 has already been downloaded
....
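One possible way to guard against such search links is to recognise them before handing them to the downloader. The sketch below is an assumption for illustration, not something edx-dl is known to do: the helper name `is_youtube_search_url` is hypothetical, and it only matches the `/results?search_query=...` URL shape seen in the logs above.

```python
from urllib.parse import parse_qs, urlparse


def is_youtube_search_url(url):
    """Return True for YouTube search-result URLs such as
    https://www.youtube.com/results?search_query=virtualenv

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    on_youtube = host == "youtube.com" or host.endswith(".youtube.com")
    is_results_page = parsed.path.rstrip("/") == "/results"
    has_query = "search_query" in parse_qs(parsed.query)
    return on_youtube and is_results_page and has_query
```

Links for which this returns True could be skipped or logged instead of downloaded, so that a single search link does not expand into a playlist of unrelated videos.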

pablodz commented 6 years ago

Check youtube-dl in GitHub properties.