dgorissen / coursera-dl

A script for downloading course material (videos, PDFs, quizzes, etc.) from coursera.org
http://dirkgorissen.com/2012/09/07/coursera-dl-a-coursera-download-script/
GNU General Public License v3.0

HTML pages redirect to a 404 error on the Coursera website rather than opening locally #121

Open McAldo opened 10 years ago

McAldo commented 10 years ago

I am on Ubuntu Linux, using the latest version of the script installed from pip (I ran a pip update just in case). Python version: Python 2.7.5+ (default, Sep 19 2013, 13:49:51)

I tried to download a course that is just about to close. The videos were downloaded correctly. The two HTML files and the .json file appear to have been parsed correctly: when I open them in an editor, the text of the additional materials is there. However, when I open them in Firefox, they redirect to a 404 page on the Coursera site instead of opening locally. Also, one PDF file was not downloaded; the script gave an error message. I didn't specify a parser when running the script, if that matters. The course is classicalcomp-001.

Sorry, there is a good chance this is user error; I am still getting to grips with Linux.

dgorissen commented 10 years ago

Please use the latest version from GitHub; that should solve the PDF issue. As for the HTML files, given the dynamic nature of the site, you can't always guarantee that every page will download exactly as it appears in a browser.

McAldo commented 10 years ago

Thanks, I'll try the latest version. As for the HTML pages, if I understand you correctly, the problem isn't formatting errors: the browser seems unable to display them at all. I have tried other software and ran into exactly the same problem, so it must be something about that specific course or the settings on my local machine.

shannelle commented 10 years ago

I can confirm the 404 issue. I am seeing it with two Coursera courses (one ended a few weeks ago and the other is still active). When I open the HTML files, the browser starts loading and then displays a 404 error. I am new to this project: I downloaded the script (using git clone) a few hours ago and ran it on OS X 10.9.1.

McAldo commented 10 years ago

I am glad I am not the only one running into this. The thing is, when I open the files in an editor, all the HTML code seems to be there. But when I open them in a browser (the latest Linux versions of Firefox and Chromium, in my case), they are not displayed; instead the page is apparently redirected to the Coursera servers, which then show a 404 error. Could this be a parsing problem? Would specifying a parser other than the default one solve it?

(Replying to shannelle's comment of Sat, Mar 8, 2014 at 6:02 PM.)
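One way to test the redirect idea is to scan a saved page for the usual things that make a browser navigate away from a local file. The sketch below (written for Python 3) is only an assumption about where to look: it searches for a `<base>` tag, a `<meta http-equiv="refresh">`, script-driven `location` changes, and absolute coursera.org URLs. Nothing in this thread confirms which of these, if any, the saved pages actually contain, and `find_suspects` is a hypothetical helper, not part of coursera-dl.

```python
import re
import sys

# Hypothetical list of things that *might* send the browser back to
# coursera.org; the thread does not confirm which one is responsible.
SUSPECTS = {
    "base tag": re.compile(r"<base\b[^>]*>", re.IGNORECASE),
    "meta refresh": re.compile(
        r"<meta\b[^>]*http-equiv=[\"']?refresh[^>]*>", re.IGNORECASE
    ),
    # location = ..., location.href = ..., location.replace(...), location.assign(...)
    "script redirect": re.compile(
        r"\blocation(?:\.href)?\s*=(?!=)|\blocation\.(?:replace|assign)\s*\(",
        re.IGNORECASE,
    ),
    "coursera url": re.compile(
        r"https?://[\w.-]*coursera\.org[^\s\"'<>]*", re.IGNORECASE
    ),
}


def find_suspects(html):
    """Return (label, line_number, matched_text) for every suspicious match."""
    hits = []
    for lineno, line in enumerate(html.splitlines(), start=1):
        for label, pattern in SUSPECTS.items():
            for match in pattern.finditer(line):
                hits.append((label, lineno, match.group(0)))
    return hits


if __name__ == "__main__":
    # Usage: python find_suspects.py page1.html page2.html ...
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            for label, lineno, text in find_suspects(handle.read()):
                print(f"{path}:{lineno}: {label}: {text}")
```

Each hit is printed as `path:line: label: text`, so it is easy to jump to the matching line in an editor and see what the page is doing.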

dgorissen commented 10 years ago

This is not related to the parser; it has more to do with the content and the way the pages are built. Editing the files in a text editor and removing references to Coursera URLs may help. Unfortunately, I don't have time to look into this in detail. I have just released a new version, but it will probably not solve your issue (you can try it, though).
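The manual clean-up suggested above can also be scripted. This is a minimal, regex-based sketch for Python 3, assuming the redirect comes from a `<base>` tag, a meta refresh, or scripts loaded from coursera.org; a real HTML parser would be more robust, and removing those scripts may also break parts of the page that depend on them. `strip_coursera_refs` is a hypothetical helper, not something coursera-dl provides.

```python
import re
import sys

# Hypothetical clean-up pass: remove tags that may pull the browser back to
# coursera.org. Regex-based, so treat it as a rough sketch, not a parser.
BASE_TAG = re.compile(r"<base\b[^>]*>", re.IGNORECASE)
META_REFRESH = re.compile(
    r"<meta\b[^>]*http-equiv=[\"']?refresh[^>]*>", re.IGNORECASE
)
# <script> elements whose src points at coursera.org, up to their closing tag.
REMOTE_SCRIPT = re.compile(
    r"<script\b[^>]*src=[\"']?https?://[\w.-]*coursera\.org[^>]*>.*?</script\s*>",
    re.IGNORECASE | re.DOTALL,
)


def strip_coursera_refs(html):
    """Return the page with base tags, meta refreshes and remote scripts removed."""
    for pattern in (BASE_TAG, META_REFRESH, REMOTE_SCRIPT):
        html = pattern.sub("", html)
    return html


if __name__ == "__main__":
    # Usage: python strip_coursera_refs.py page1.html page2.html ...
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            original = handle.read()
        # Keep an untouched copy next to the cleaned page.
        with open(path + ".orig", "w", encoding="utf-8") as backup:
            backup.write(original)
        with open(path, "w", encoding="utf-8") as out:
            out.write(strip_coursera_refs(original))
```

Each file named on the command line is rewritten in place, and the original is saved as `<file>.orig` first, so the change is easy to undo if the page ends up worse than before.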