coursera-dl / edx-dl

A simple tool to download video lectures from edx.org (and other openedx sites)
GNU Lesser General Public License v3.0

Many courses don't download #374

Open illuzioner opened 8 years ago

illuzioner commented 8 years ago

Subject of the issue

When trying to download different courses, none have downloaded. The program crashes with the final line:

subprocess.CalledProcessError: Command '['youtube-dl', '--ignore-config', '-o', u'Downloaded\The_Analytics_Edge\01-Week_1-_An_Introduction_to_Analytics\03-%(title)s-%(id)s.%(ext)s', '-f', 'mp4', u'http://www.youtube.com/watch?v=YLR1byL0U8M']' returned non-zero exit status 1

So far, none of the courses I've tried to download have downloaded. Often it is an apparent timeout on the EdX server, but sometimes it's a crashed instance.

Your environment

E:\Anaconda2\Scripts\edx-dl.exe -u myusername@outlook.com https://courses.edx.org/courses/MITx/15.071x/1T2014/info 2>>outerrors2.txt

Expected behaviour

It should download the course videos.

Actual behaviour

It seems to process, showing many lines of output, and then crashes. The last line looks like it contains template parameters that the code failed to fill in.

outerrors2.txt

balta2ar commented 8 years ago

I'm not sure whether it's possible to pass custom arguments to the downloader as you can in coursera-dl, so you may need to hack a little. In this line: https://github.com/coursera-dl/edx-dl/blob/master/edx_dl/edx_dl.py#L733, try adding one more item, '-i', to the list (at the beginning or at the end). This will tell youtube-dl to ignore errors.
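A minimal sketch of that change, assuming the command list looks like the one in the traceback above. The helper name `build_youtube_dl_cmd` and its parameters are hypothetical and not taken from edx_dl.py, and the video ID is a placeholder. Whether '-i' alone prevents the CalledProcessError depends on the exit status youtube-dl returns, which this sketch does not check.

```python
def build_youtube_dl_cmd(url, output_template, ignore_errors=True):
    """Build a youtube-dl argument list shaped like the one in the traceback.

    Hypothetical helper for illustration; not the actual edx-dl code.
    """
    cmd = ['youtube-dl', '--ignore-config']
    if ignore_errors:
        # '-i' is youtube-dl's short form of '--ignore-errors'.
        cmd.append('-i')
    cmd += ['-o', output_template, '-f', 'mp4', url]
    return cmd


if __name__ == '__main__':
    # VIDEO_ID is a placeholder, not a real video.
    print(build_youtube_dl_cmd(
        'http://www.youtube.com/watch?v=VIDEO_ID',
        '03-%(title)s-%(id)s.%(ext)s',
    ))
```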

iemejia commented 8 years ago

We have had this issue reported before; we should probably add it to the troubleshooting section of the docs.

iemejia commented 8 years ago

If you check the logs, they say that the YouTube video is not available, and the script breaks. The script breaks by default on any error (it works this way so that we can catch and fix those errors). However, we cannot fix the case where videos have been removed from YouTube. So you can use the --ignore-errors flag to ignore download errors and get as many resources as you can. You can additionally try to download the videos from the direct sources, if they are available, using the --prefer-cdn-videos flag. Please try both and see what fits your case best.
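For anyone scripting this, here is a rough sketch of assembling the command line with those two flags. The flag names and `-u` come from this thread. The helper `build_edx_dl_args` is hypothetical, and the sketch only builds the argument list without running anything.

```python
def build_edx_dl_args(course_url, username,
                      ignore_errors=False, prefer_cdn=False):
    """Assemble an edx-dl argument list (hypothetical helper)."""
    args = ['edx-dl', '-u', username]
    if ignore_errors:
        # Keep going when a single video fails to download.
        args.append('--ignore-errors')
    if prefer_cdn:
        # Try the direct (non-YouTube) sources when the course offers them.
        args.append('--prefer-cdn-videos')
    args.append(course_url)
    return args


if __name__ == '__main__':
    print(' '.join(build_edx_dl_args(
        'https://courses.edx.org/courses/MITx/15.071x/1T2014/info',
        'myusername@outlook.com',
        ignore_errors=True,
        prefer_cdn=True,
    )))
```

The resulting list can be passed to `subprocess.call` if you want to run it from Python.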

iemejia commented 8 years ago

Note that I don't consider this a bug in edx-dl, so I just noted that we should document this case so that people better understand what is going on.

balta2ar commented 8 years ago

I totally forgot about the --ignore-errors flag. That's way better than hacking the script.

rbrito commented 8 years ago

If the code in question is what I think it is, I was the one who wrote it and gave it the behavior of failing on the first problem.

Like @iemejia, I don't quite consider it a bug, but I have changed my mind (given the trouble that we give our users), and I think that it would be better to take the action that @balta2ar took in coursera-dl: make --ignore-errors the default, collect the errors at runtime and, at the end of the execution, report them to the user.
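To make the idea concrete, here is a minimal sketch of the collect-and-report pattern. It is not the coursera-dl or edx-dl implementation; `run_all`, `report`, and the `runner` parameter are hypothetical names chosen for illustration.

```python
import subprocess


def run_all(commands, runner=subprocess.check_call):
    """Run every command, collecting failures instead of stopping at the first."""
    failures = []
    for cmd in commands:
        try:
            runner(cmd)
        except (subprocess.CalledProcessError, OSError) as exc:
            # Remember what failed and why, then move on to the next download.
            failures.append((cmd, exc))
    return failures


def report(failures):
    """Format a summary of the collected failures for the end of the run."""
    if not failures:
        return 'All downloads succeeded.'
    lines = ['%d download(s) failed:' % len(failures)]
    for cmd, exc in failures:
        lines.append('  %s -> %s' % (' '.join(cmd), exc))
    return '\n'.join(lines)
```

Because the errors are still reported at the end, users who hit a removed video can paste the summary into an issue, so the diagnostic value of failing loudly is not entirely lost.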

illuzioner commented 8 years ago

I guess I'm somewhat confused. So all videos are downloaded from YouTube? Given the name edx-dl, I thought videos were downloaded directly from edX, where they are still available. I didn't know that you depended on videos being on YouTube. Are ALL edX courses supposed to be on YouTube before edx-dl will download them? I guess that leaves a lot of courses not downloadable, correct?

iemejia commented 8 years ago

@illuzioner Courses can publish videos on YouTube or on whatever server they prefer. Some of them use both, but most of them are YouTube only, so we have to cover all possible cases. It is up to you, the user, to check whether the videos are available and download them in the best format. This is not 100% trivial to solve, because the people who create the courses have a lot of flexibility in how they publish the videos.

iemejia commented 8 years ago

@rbrito If you think this is the best behavior, maybe you (or @balta2ar) can provide a patch. I am a bit hesitant about this. I understand the user-first reasoning, but errors are errors, and having them reported has helped tons in tuning the script. I just wouldn't like to lose that, but it is OK with me if --ignore-errors becomes the default behavior.