dgorissen / coursera-dl

A script for downloading course material (videos, PDFs, quizzes, etc.) from coursera.org
http://dirkgorissen.com/2012/09/07/coursera-dl-a-coursera-download-script/
GNU General Public License v3.0

Binary gibberish followed by weird HTML headers in index.html, lectures.html #128

Open danmbox opened 10 years ago

danmbox commented 10 years ago
less ./randomness-001/lectures.html

<binary gibberish>
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Content-Type: text/html
Date: Thu, 20 Mar 2014 15:06:59 GMT
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Location: https://class.coursera.org/randomness-001/lecture
Pragma: no-cache
Server: nginx
X-Frame-Options: SAMEORIGIN
X-Powered-By: PHP/5.3.10-1ubuntu3.10
Content-Length: 130
Connection: keep-alive

Redirecting to <a href="https://class.coursera.org/randomness-001/lecture">https://class.coursera.org/randomness-001/l
danmbox commented 10 years ago

Additional notes: This bug has persisted, so it's probably not due to the server. I have no problem saving a .../lecture/ page from the browser to an HTML file, so I don't understand why the script downloads gibberish. index.html gets corrupted in the same way, and about as often.
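
To narrow down what the corrupted files actually contain, a small diagnostic along the following lines might help. It is only a sketch and is not part of coursera-dl: the function name `inspect_saved_page` and the set of header names it looks for are assumptions made for illustration. It checks whether a saved page starts like HTML, whether it begins with the gzip magic bytes (one possible source of binary content, not a confirmed cause), and whether HTTP header lines such as `Location:` appear inside the file body, as in the output above.

```python
import re

# First two bytes of a gzip stream; checking for them is a guess, not a diagnosis.
GZIP_MAGIC = b"\x1f\x8b"

# Header names taken from the output quoted in this issue; the list is illustrative.
HEADER_LINE = re.compile(
    rb"^(Content-Type|Location|Cache-Control|Content-Length):",
    re.MULTILINE | re.IGNORECASE,
)


def inspect_saved_page(data: bytes) -> dict:
    """Report simple symptoms of a corrupted saved HTML page (hypothetical helper)."""
    head = data.lstrip()[:512].lower()
    return {
        # True when the file begins with a doctype or <html> tag.
        "starts_like_html": head.startswith((b"<!doctype", b"<html")),
        # True when the file begins with the gzip magic number.
        "gzip_magic_at_start": data.startswith(GZIP_MAGIC),
        # True when an HTTP header line starts a line somewhere in the file.
        "embedded_http_headers": bool(HEADER_LINE.search(data)),
        # Count of ASCII control bytes other than tab, LF, VT, FF and CR.
        "non_text_bytes": sum(1 for b in data if b < 9 or 13 < b < 32),
    }
```

Reading ./randomness-001/lectures.html in binary mode and passing the bytes to `inspect_saved_page` would show which of these symptoms are present. A result with `embedded_http_headers` set would suggest that raw response data, not just the page body, ended up in the file.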

dgorissen commented 10 years ago

Thanks Dan. Unfortunately, I do not have the time to really follow up on this; any help is welcome. The way a browser interacts with a page is quite different from the way requests does. coursera-dl used to be based on the mechanize library, which seemed more stable in this respect. You can try the mechanize branch on GitHub (it should still work, I believe) and see if that helps.
