dgorissen / coursera-dl

A script for downloading course material (videos, PDFs, quizzes, etc.) from coursera.org
http://dirkgorissen.com/2012/09/07/coursera-dl-a-coursera-download-script/
GNU General Public License v3.0
1.73k stars 299 forks

Python 3 support, Progress & Speed Bar and replace `mechanize` with `requests` #100

Closed xu-cheng closed 10 years ago

xu-cheng commented 10 years ago

I found that when using mechanize to connect, the program regularly freezes for no apparent reason. mechanize also doesn't currently support Python 3. Then I noticed pull #66, which suggests using requests, so I basically reproduced @luk51000's work. Here is the list of my enhancements:

BTW, I'm sorry that I used PEP Auto-format Plugin to format the code on post-save.

dgorissen commented 10 years ago

Thanks, a quick test with organalysis-002 shows that it seems to work, but I am getting a lot of warnings and failures which I do not get if I switch to my current master. This is with Python 2 on OS X.

Any thoughts?

Course 1 of 2
* Collecting downloadable content from https://class.coursera.org/organalysis-002/lecture/index
* Got all downloadable content for organalysis-002
* organalysis-002 will be downloaded to /Users/dgorissen/git/coursera-dl/organalysis-002
 - Downloading lecture/syllabus pages
    - Downloading index.html
 Warning: Retrying to connect url:https://class.coursera.org/organalysis-002/class/index
    - Downloading lectures.html
Failed to download url https://class.coursera.org/organalysis-002/lecture/index to /Users/dgorissen/git/coursera-dl/organalysis-002/lectures.html: The read operation timed out
 - Week 1 - Introduction
  - Downloading resources for Welcome and Logistics
    - Downloading 1 - 1 - Welcome and Logistics.txt
 Warning: Retrying to connect url:https://class.coursera.org/organalysis-002/lecture/subtitles?q=427_en&format=txt
    - Downloading 1 - 1 - Welcome and Logistics.srt
 Warning: Retrying to connect url:https://class.coursera.org/organalysis-002/lecture/subtitles?q=427_en&format=srt
    - Downloading 1 - 1 - Welcome and Logistics.mp4
  - Downloading resources for Lecture 1 - Introduction to Organizations - Part 1 [With Face - 1020]
    - Downloading Week 1 Lectures_FINAL.ppt
    - Downloading 1 - 2 - Lecture 1 - Introduction to Organizations - Part 1 [With Face - 1020].txt
Failed to download url https://class.coursera.org/organalysis-002/lecture/subtitles?q=27_en&format=txt to /Users/dgorissen/git/coursera-dl/organalysis-002/01 - Week 1 - Introduction/02 - Lecture 1 - Introduction to Organizations - Part 1 [With Face - 1020]/1 - 2 - Lecture 1 - Introduction to Organizations - Part 1 [With Face - 1020].txt: The read operation timed out
    - Downloading 1 - 2 - Lecture 1 - Introduction to Organizations - Part 1 [With Face - 1020].srt
    - Downloading 1 - 2 - Lecture 1 - Introduction to Organizations - Part 1 [With Face - 1020].mp4
  - Downloading resources for Lecture 1 - Introduction to Organizations - Part 2 [Without Face - 617]
    - Downloading 1 - 3 - Lecture 1 - Introduction to Organizations - Part 2 [Without Face - 617].txt
 Warning: Retrying to connect url:https://class.coursera.org/organalysis-002/lecture/subtitles?q=28_en&format=txt
    - Downloading 1 - 3 - Lecture 1 - Introduction to Organizations - Part 2 [Without Face - 617].srt
 Warning: Retrying to connect url:https://class.coursera.org/organalysis-002/lecture/subtitles?q=28_en&format=srt
    - Downloading 1 - 3 - Lecture 1 - Introduction to Organizations - Part 2 [Without Face - 617].mp4
  - Downloading resources for Lecture 2 - Analytic Features of Organizations - Part 1 [With Face - 1606]
    - Downloading 1 - 4 - Lecture 2 - Analytic Features of Organizations - Part 1 [With Face - 1606].txt
 Warning: Retrying to connect url:https://class.coursera.org/organalysis-002/lecture/subtitles?q=39_en&format=txt
Failed to download url https://class.coursera.org/organalysis-002/lecture/subtitles?q=39_en&format=txt to /Users/dgorissen/git/coursera-dl/organalysis-002/01 - Week 1 - Introduction/04 - Lecture 2 - Analytic Features of Organizations - Part 1 [With Face - 1606]/1 - 4 - Lecture 2 - Analytic Features of Organizations - Part 1 [With Face - 1606].txt: The read operation timed out
    - Downloading 1 - 4 - Lecture 2 - Analytic Features of Organizations - Part 1 [With Face - 1606].srt
    - Downloading download.mp4
xu-cheng commented 10 years ago

About the warnings: I found that the connection to Coursera is sometimes very fragile. On your current master, this problem shows up as the program freezing. So, as was done in pull #66, I retry every connection up to three times if it fails, and each retry prints a warning message. This message can be removed if needed.
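A minimal sketch of the retry behaviour described above, not the actual code from this pull request. The helper name `fetch_with_retries` and its parameters are hypothetical; `fetch` stands for whatever callable performs the real request.

```python
import time


def fetch_with_retries(fetch, url, retries=3, delay=0.0, warn=print):
    """Call fetch(url); on failure retry up to `retries` times, warning each time.

    `fetch`, `retries`, `delay` and `warn` are illustrative names, not part of
    coursera-dl. The last failure is re-raised so the caller can report it.
    """
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except Exception as exc:  # a real implementation should catch narrower errors
            if attempt == retries:
                raise
            warn(" Warning: Retrying to connect url:%s (%s)" % (url, exc))
            time.sleep(delay)
```

With this shape, a download that ultimately fails is preceded by three warnings, which matches the expectation stated in the next paragraph.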

As for the failures, I don't know where they come from. If they are the result of a failed connection, there should be three warning messages ahead of each one. I made a small change in commit 644bffddc47a85cd18e556be8549e0abb4cd62b8, which may improve things. If the problem is still there, could you apply the patch below so that it prints the full traceback for debugging?

diff --git a/courseradownloader/courseradownloader.py b/courseradownloader/courseradownloader.py
index 20c4891..40fb918 100644
--- a/courseradownloader/courseradownloader.py
+++ b/courseradownloader/courseradownloader.py
@@ -367,6 +367,8 @@ class CourseraDownloader(object):
                 sys.stdout.flush()
         except Exception as e:
             print_("Failed to download url %s to %s: %s" % (url, filepath, e))
+            import traceback
+            traceback.print_exc()

     def download_about(self, cname, course_dir):
         """
dgorissen commented 10 years ago

I would like to merge this, but more testing shows that, while the retries help, I quite commonly get files of 0 bytes (particularly subtitle files). Occasionally video files will also time out and fail. Interestingly, if you just run it again it usually updates the broken files correctly. I quickly tried adding some sleeps between retries, but it made no difference. With mechanize it works flawlessly (and faster as well, it seems), but unfortunately I don't have the time to really dig into why.

xu-cheng commented 10 years ago

OK, I will do more testing and try to improve the robustness.

xu-cheng commented 10 years ago

In addition, strangely, mechanize doesn't work flawlessly in my case. The program freezes repeatedly. That's the main reason I'm trying to replace it with requests.

dgorissen commented 10 years ago

mm very odd..


Web / Blog : http://dirkgorissen.com Twitter : https://twitter.com/elazungu

lsoliveira459 commented 10 years ago

This is the same problem I had, and apparently the same solution. I only stopped because my classes didn't leave me the time. But tell me if you need help, Xu Cheng.

Lucas Oliveira

xu-cheng commented 10 years ago

OK, I have a problem: my version of the code works so well on my PC that I cannot reproduce the error to debug it.

dgorissen commented 10 years ago

That's annoying. Could you give some details on your environment? Operating system, Python version, network setup, ... What are your results on a Linux machine with Python 2.7?

xu-cheng commented 10 years ago

My OS is Windows 7 x64, and the Python versions are 2.7.5 and 3.3.2. Normal network settings, nothing extraordinary.

xu-cheng commented 10 years ago

This commit (9d957a387120aa3c776e9e43623a3c31407d1801) may fix the 0-byte issue when downloading srt files.

dgorissen commented 10 years ago

Will have another try.

To debug further, could you set the following on the mechanize browser object and paste the relevant output where you get problems (taken from http://wwwsearch.sourceforge.net/mechanize/):

import logging
import sys

# br is the mechanize browser object used by the downloader.
# Log information about HTTP redirects and Refreshes.
br.set_debug_redirects(True)
# Log HTTP response bodies (ie. the HTML, most of the time).
br.set_debug_responses(True)
# Print HTTP headers.
br.set_debug_http(True)

# To make sure you're seeing all debug output:
logger = logging.getLogger("mechanize")
logger.addHandler(logging.StreamHandler(sys.stdout))
logger.setLevel(logging.INFO)
xu-cheng commented 10 years ago

Here's a log from a run using mechanize; I see nothing wrong in it. But the program froze for almost 10 minutes, with no network or other I/O activity, before I killed it. Should there be some timeout?

Course 1 of 1
send: 'GET /interorg-001/auth/auth_redirector?type=login&subtype=normal HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: class.coursera.org\r\nCookie: maestro_login_flag=1; CAUTH=gp4ohiuBKMUTEK963Z0_eDFNN6Hv2xKbDRefk4OtFWL00Q7lMVU-mMXLnwevHITvrwrV7kEHar3v1RUyoo21hw.0BnSXJTJeQKdZelN3W-X6g.7LYMbBHiiYGYDDiF4LWDcetoFbG_oeRvwOppH2l1Y-q4B-pYTe6_I7GcG68augUpNFSJ_1dfq-iMRGMZyU_iTT31q2xEReuipRc-YhUdcU_pRMM6vbzeM1FCoxtgIlJjPNHt3njK1oFKDV8LPn-AJUobB8x91x0TfNs35YKVQFLHwRxGRqU_Z1fzNsQt17as\r\nConnection: close\r\nUser-Agent: Python-urllib/2.7\r\n\r\n'
reply: 'HTTP/1.1 302 Moved Temporarily\r\n'
header: Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
header: Content-Type: text/html
header: Date: Wed, 09 Oct 2013 08:35:25 GMT
header: Expires: Thu, 19 Nov 1981 08:52:00 GMT
header: Location: https://class.coursera.org/interorg-001/class
header: Pragma: no-cache
header: Server: nginx/1.2.8
header: Set-Cookie: csrf_token=TpEI9sYBEsRjpcLmx5IN; expires=Fri, 08-Nov-2013 08:35:24 GMT; path=/interorg-001
header: X-Powered-By: PHP/5.3.10-1ubuntu3.6
header: Content-Length: 0
header: Connection: Close
send: 'GET /interorg-001/class HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: class.coursera.org\r\nCookie: csrf_token=TpEI9sYBEsRjpcLmx5IN; maestro_login_flag=1; CAUTH=gp4ohiuBKMUTEK963Z0_eDFNN6Hv2xKbDRefk4OtFWL00Q7lMVU-mMXLnwevHITvrwrV7kEHar3v1RUyoo21hw.0BnSXJTJeQKdZelN3W-X6g.7LYMbBHiiYGYDDiF4LWDcetoFbG_oeRvwOppH2l1Y-q4B-pYTe6_I7GcG68augUpNFSJ_1dfq-iMRGMZyU_iTT31q2xEReuipRc-YhUdcU_pRMM6vbzeM1FCoxtgIlJjPNHt3njK1oFKDV8LPn-AJUobB8x91x0TfNs35YKVQFLHwRxGRqU_Z1fzNsQt17as\r\nConnection: close\r\nUser-Agent: Python-urllib/2.7\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
header: Content-Type: text/html; charset=utf-8
header: Date: Wed, 09 Oct 2013 08:35:26 GMT
header: Expires: Thu, 19 Nov 1981 08:52:00 GMT
header: Pragma: no-cache
header: Server: nginx/1.2.8
header: Set-Cookie: csrf_token=TpEI9sYBEsRjpcLmx5IN; expires=Fri, 08-Nov-2013 08:35:26 GMT; path=/interorg-001
header: Vary: Accept-Encoding
header: Vary: Accept-Encoding
header: X-Powered-By: PHP/5.3.10-1ubuntu3.6
header: X-UA-Compatible: IE=Edge
header: transfer-encoding: chunked
header: Connection: Close
* Collecting downloadable content from https://class.coursera.org/interorg-001/lecture/index
send: 'GET /interorg-001/lecture/index HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: class.coursera.org\r\nCookie: csrf_token=TpEI9sYBEsRjpcLmx5IN; maestro_login_flag=1; CAUTH=gp4ohiuBKMUTEK963Z0_eDFNN6Hv2xKbDRefk4OtFWL00Q7lMVU-mMXLnwevHITvrwrV7kEHar3v1RUyoo21hw.0BnSXJTJeQKdZelN3W-X6g.7LYMbBHiiYGYDDiF4LWDcetoFbG_oeRvwOppH2l1Y-q4B-pYTe6_I7GcG68augUpNFSJ_1dfq-iMRGMZyU_iTT31q2xEReuipRc-YhUdcU_pRMM6vbzeM1FCoxtgIlJjPNHt3njK1oFKDV8LPn-AJUobB8x91x0TfNs35YKVQFLHwRxGRqU_Z1fzNsQt17as\r\nConnection: close\r\nUser-Agent: Python-urllib/2.7\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
header: Content-Type: text/html; charset=utf-8
header: Date: Wed, 09 Oct 2013 08:35:27 GMT
header: Expires: Thu, 19 Nov 1981 08:52:00 GMT
header: Pragma: no-cache
header: Server: nginx/1.2.8
header: Set-Cookie: csrf_token=TpEI9sYBEsRjpcLmx5IN; expires=Fri, 08-Nov-2013 08:35:27 GMT; path=/interorg-001
header: Set-Cookie: serve_netease_970567=1; expires=Thu, 10-Oct-2013 08:35:27 GMT
header: Vary: Accept-Encoding
header: Vary: Accept-Encoding
header: X-Powered-By: PHP/5.3.10-1ubuntu3.6
header: X-UA-Compatible: IE=Edge
header: transfer-encoding: chunked
header: Connection: Close
* Got all downloadable content for interorg-001
* interorg-001 will be downloaded to C:\Coding\Python\workspace\coursera-test\interorg-001
 - Downloading lecture/syllabus pages
 send: 'GET /interorg-001/class/index HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: class.coursera.org\r\nCookie: csrf_token=TpEI9sYBEsRjpcLmx5IN; maestro_login_flag=1; CAUTH=gp4ohiuBKMUTEK963Z0_eDFNN6Hv2xKbDRefk4OtFWL00Q7lMVU-mMXLnwevHITvrwrV7kEHar3v1RUyoo21hw.0BnSXJTJeQKdZelN3W-X6g.7LYMbBHiiYGYDDiF4LWDcetoFbG_oeRvwOppH2l1Y-q4B-pYTe6_I7GcG68augUpNFSJ_1dfq-iMRGMZyU_iTT31q2xEReuipRc-YhUdcU_pRMM6vbzeM1FCoxtgIlJjPNHt3njK1oFKDV8LPn-AJUobB8x91x0TfNs35YKVQFLHwRxGRqU_Z1fzNsQt17as\r\nConnection: close\r\nUser-Agent: Python-urllib/2.7\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
header: Content-Type: text/html; charset=utf-8
header: Date: Wed, 09 Oct 2013 08:35:30 GMT
header: Expires: Thu, 19 Nov 1981 08:52:00 GMT
header: Pragma: no-cache
header: Server: nginx/1.2.8
header: Set-Cookie: csrf_token=TpEI9sYBEsRjpcLmx5IN; expires=Fri, 08-Nov-2013 08:35:29 GMT; path=/interorg-001
header: Vary: Accept-Encoding
header: Vary: Accept-Encoding
header: X-Powered-By: PHP/5.3.10-1ubuntu3.6
header: X-UA-Compatible: IE=Edge
header: transfer-encoding: chunked
header: Connection: Close
send: 'GET /interorg-001/class/index HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: class.coursera.org\r\nCookie: csrf_token=TpEI9sYBEsRjpcLmx5IN; maestro_login_flag=1; CAUTH=gp4ohiuBKMUTEK963Z0_eDFNN6Hv2xKbDRefk4OtFWL00Q7lMVU-mMXLnwevHITvrwrV7kEHar3v1RUyoo21hw.0BnSXJTJeQKdZelN3W-X6g.7LYMbBHiiYGYDDiF4LWDcetoFbG_oeRvwOppH2l1Y-q4B-pYTe6_I7GcG68augUpNFSJ_1dfq-iMRGMZyU_iTT31q2xEReuipRc-YhUdcU_pRMM6vbzeM1FCoxtgIlJjPNHt3njK1oFKDV8LPn-AJUobB8x91x0TfNs35YKVQFLHwRxGRqU_Z1fzNsQt17as\r\nConnection: close\r\nUser-Agent: Python-urllib/2.7\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
header: Content-Type: text/html; charset=utf-8
header: Date: Wed, 09 Oct 2013 08:35:31 GMT
header: Expires: Thu, 19 Nov 1981 08:52:00 GMT
header: Pragma: no-cache
header: Server: nginx/1.2.8
header: Set-Cookie: csrf_token=TpEI9sYBEsRjpcLmx5IN; expires=Fri, 08-Nov-2013 08:35:31 GMT; path=/interorg-001
header: Vary: Accept-Encoding
header: Vary: Accept-Encoding
header: X-Powered-By: PHP/5.3.10-1ubuntu3.6
header: X-UA-Compatible: IE=Edge
header: transfer-encoding: chunked
header: Connection: Close
send: 'GET /interorg-001/lecture/index HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: class.coursera.org\r\nCookie: serve_netease_970567=1; csrf_token=TpEI9sYBEsRjpcLmx5IN; maestro_login_flag=1; CAUTH=gp4ohiuBKMUTEK963Z0_eDFNN6Hv2xKbDRefk4OtFWL00Q7lMVU-mMXLnwevHITvrwrV7kEHar3v1RUyoo21hw.0BnSXJTJeQKdZelN3W-X6g.7LYMbBHiiYGYDDiF4LWDcetoFbG_oeRvwOppH2l1Y-q4B-pYTe6_I7GcG68augUpNFSJ_1dfq-iMRGMZyU_iTT31q2xEReuipRc-YhUdcU_pRMM6vbzeM1FCoxtgIlJjPNHt3njK1oFKDV8LPn-AJUobB8x91x0TfNs35YKVQFLHwRxGRqU_Z1fzNsQt17as\r\nConnection: close\r\nUser-Agent: Python-urllib/2.7\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
header: Content-Type: text/html; charset=utf-8
header: Date: Wed, 09 Oct 2013 08:35:33 GMT
header: Expires: Thu, 19 Nov 1981 08:52:00 GMT
header: Pragma: no-cache
header: Server: nginx/1.2.8
header: Set-Cookie: csrf_token=TpEI9sYBEsRjpcLmx5IN; expires=Fri, 08-Nov-2013 08:35:32 GMT; path=/interorg-001
header: Vary: Accept-Encoding
header: Vary: Accept-Encoding
header: X-Powered-By: PHP/5.3.10-1ubuntu3.6
header: X-UA-Compatible: IE=Edge
header: transfer-encoding: chunked
header: Connection: Close
send: 'GET /interorg-001/lecture/index HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: class.coursera.org\r\nCookie: serve_netease_970567=1; csrf_token=TpEI9sYBEsRjpcLmx5IN; maestro_login_flag=1; CAUTH=gp4ohiuBKMUTEK963Z0_eDFNN6Hv2xKbDRefk4OtFWL00Q7lMVU-mMXLnwevHITvrwrV7kEHar3v1RUyoo21hw.0BnSXJTJeQKdZelN3W-X6g.7LYMbBHiiYGYDDiF4LWDcetoFbG_oeRvwOppH2l1Y-q4B-pYTe6_I7GcG68augUpNFSJ_1dfq-iMRGMZyU_iTT31q2xEReuipRc-YhUdcU_pRMM6vbzeM1FCoxtgIlJjPNHt3njK1oFKDV8LPn-AJUobB8x91x0TfNs35YKVQFLHwRxGRqU_Z1fzNsQt17as\r\nConnection: close\r\nUser-Agent: Python-urllib/2.7\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
header: Content-Type: text/html; charset=utf-8
header: Date: Wed, 09 Oct 2013 08:35:34 GMT
header: Expires: Thu, 19 Nov 1981 08:52:00 GMT
header: Pragma: no-cache
header: Server: nginx/1.2.8
header: Set-Cookie: csrf_token=TpEI9sYBEsRjpcLmx5IN; expires=Fri, 08-Nov-2013 08:35:34 GMT; path=/interorg-001
header: Vary: Accept-Encoding
header: Vary: Accept-Encoding
header: X-Powered-By: PHP/5.3.10-1ubuntu3.6
header: X-UA-Compatible: IE=Edge
header: transfer-encoding: chunked
header: Connection: Close
send: 'GET /maestro/api/topic/information?topic-id=interorg HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: www.coursera.org\r\nCookie: maestro_login_flag=1; CAUTH=gp4ohiuBKMUTEK963Z0_eDFNN6Hv2xKbDRefk4OtFWL00Q7lMVU-mMXLnwevHITvrwrV7kEHar3v1RUyoo21hw.0BnSXJTJeQKdZelN3W-X6g.7LYMbBHiiYGYDDiF4LWDcetoFbG_oeRvwOppH2l1Y-q4B-pYTe6_I7GcG68augUpNFSJ_1dfq-iMRGMZyU_iTT31q2xEReuipRc-YhUdcU_pRMM6vbzeM1FCoxtgIlJjPNHt3njK1oFKDV8LPn-AJUobB8x91x0TfNs35YKVQFLHwRxGRqU_Z1fzNsQt17as\r\nConnection: close\r\nUser-Agent: Python-urllib/2.7\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Cache-Control: no-cache, no-store, must-revalidate
header: Content-Type: application/json
header: Date: Wed, 09 Oct 2013 08:35:36 GMT
header: Server: nginx/1.2.8
header: Vary: Accept-Encoding
header: transfer-encoding: chunked
header: Connection: Close
 - Introduction
  - Downloading resources for General introduction by Gilbert Probst (5-53)
  send: u'GET /interorg-001/lecture/subtitles?q=23_en&format=txt HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: class.coursera.org\r\nCookie: serve_netease_970567=1; csrf_token=TpEI9sYBEsRjpcLmx5IN; maestro_login_flag=1; CAUTH=gp4ohiuBKMUTEK963Z0_eDFNN6Hv2xKbDRefk4OtFWL00Q7lMVU-mMXLnwevHITvrwrV7kEHar3v1RUyoo21hw.0BnSXJTJeQKdZelN3W-X6g.7LYMbBHiiYGYDDiF4LWDcetoFbG_oeRvwOppH2l1Y-q4B-pYTe6_I7GcG68augUpNFSJ_1dfq-iMRGMZyU_iTT31q2xEReuipRc-YhUdcU_pRMM6vbzeM1FCoxtgIlJjPNHt3njK1oFKDV8LPn-AJUobB8x91x0TfNs35YKVQFLHwRxGRqU_Z1fzNsQt17as\r\nConnection: close\r\nUser-Agent: Python-urllib/2.7\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
header: Content-Disposition: attachment; filename="6%20-%201%20-%20General%20introduction%20by%20Gilbert%20Probst%20%285%3A53%29.txt"; filename*=UTF-8''6%20-%201%20-%20General%20introduction%20by%20Gilbert%20Probst%20%285%3A53%29.txt
header: Content-Type: application/force-download
header: Date: Wed, 09 Oct 2013 08:35:38 GMT
header: Expires: Thu, 19 Nov 1981 08:52:00 GMT
header: Pragma: no-cache
header: Server: nginx/1.2.8
header: Set-Cookie: csrf_token=TpEI9sYBEsRjpcLmx5IN; expires=Fri, 08-Nov-2013 08:35:38 GMT; path=/interorg-001
header: X-Powered-By: PHP/5.3.10-1ubuntu3.6
header: transfer-encoding: chunked
header: Connection: Close

^C
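On the timeout question above: one option, sketched here as an assumption rather than as what the project does, is a process-wide default socket timeout from the standard library. It only affects sockets created after it is set, and only code that does not pass its own timeout; whether it cures this particular hang is untested. The helper name `default_socket_timeout` is hypothetical.

```python
import socket
from contextlib import contextmanager


@contextmanager
def default_socket_timeout(seconds):
    """Temporarily set the default timeout for newly created sockets.

    A blocked read should then raise a timeout error instead of hanging
    indefinitely. The previous default is restored on exit.
    """
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)
    try:
        yield
    finally:
        socket.setdefaulttimeout(previous)
```

Wrapping the download loop in `with default_socket_timeout(30):` would turn a silent freeze into an error that can be caught and retried.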
dgorissen commented 10 years ago

After upgrading my OS X machine I had another try with this. It seems less of a problem, though I still get quite a few retries and a couple of timeouts. Interestingly, however, it works smoothly and quickly on an Ubuntu VM guest on the OS X host. Same Python versions.

I would really like to merge this and will probably do so, ignoring the OS X issues for now. Could you update the patch with the latest changes (should be small)?

lsoliveira459 commented 10 years ago

Have you considered a platform-dependent solution? Using your current code for OS X, and possibly all *nix, and requests for Windows?
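A rough sketch of what such a platform switch could look like. The function name `pick_http_backend` and the returned labels are made up for illustration; nothing like this exists in the project.

```python
import platform


def pick_http_backend(system=None):
    """Return the name of the HTTP library to use on this platform.

    `system` defaults to platform.system(), which reports "Windows",
    "Darwin" (OS X) or "Linux" on the common platforms.
    """
    system = system or platform.system()
    return "requests" if system == "Windows" else "mechanize"
```

Note that, as mentioned at the top of this thread, mechanize does not support Python 3, so a switch like this would only be usable on Python 2.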

dgorissen commented 10 years ago

Yeah, I had a quick look but it seems ugly and hacky. Better would be to get to the bottom of the osx problem but I currently have no time to go down that rabbit hole. I would like to get other people to test this to see how widespread the problem is. Merging with master would help with that and we can see from there. Any suggestions welcome.

xu-cheng commented 10 years ago

I have updated the patch with the latest change.

xu-cheng commented 10 years ago

In case the retry warning scares other people, I commented it out.

dgorissen commented 10 years ago

Merged. Let's see who else complains :) The mechanize-based version can still be found under the mechanize branch.

dgorissen commented 10 years ago

@xu-cheng I notice that from time to time I get corrupt files when downloading. I have committed a small patch to at least try to detect this a bit better when re-downloading a course. However, there are still cases where files don't download correctly during the first pass and require a second pass.
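To illustrate the kind of check meant here (a sketch only, not the committed patch): treat a file as broken if it is missing, empty, or not the size the server announced. The name `needs_redownload` and the `expected_size` parameter are hypothetical. The expected size is optional because chunked responses, like the subtitle responses logged earlier in this thread, carry no Content-Length.

```python
import os


def needs_redownload(path, expected_size=None):
    """Return True if the local file is missing, empty, or the wrong size.

    `expected_size` is the byte count announced by the server, if known.
    When it is None only the missing/empty checks apply.
    """
    if not os.path.isfile(path):
        return True
    actual = os.path.getsize(path)
    if actual == 0:
        return True
    if expected_size is not None and actual != expected_size:
        return True
    return False
```

A size check cannot detect a file that has the right length but corrupt contents, so it only narrows down the cases that need a second pass.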

I also sometimes get output like this:

cd

Have you had similar issues? I've been testing on OS X Mavericks with Python 2.7 and stats1-002.

xu-cheng commented 10 years ago

Yeah, I sometimes have similar issues. In my experience, the connection to coursera.org can be very fragile. Connection failures mostly happen in two situations:

dgorissen commented 10 years ago

Mmm. Never had login problems or ssl errors. Is a workaround to do two requests then? One for the header then one for the full file?

xu-cheng commented 10 years ago

There are already two requests in the current code. The problem is that the request that fetches the headers doesn't read all of the content, so its connection is never released, and somehow that interferes with the whole connection pool.

Excellent news — thanks to urllib3, keep-alive is 100% automatic within a session! Any requests that you make within a session will automatically reuse the appropriate connection!

Note that connections are only released back to the pool for reuse once all body data has been read; be sure to either set stream to False or read the content property of the Response object.

http://docs.python-requests.org/en/latest/user/advanced/#keep-alive
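A minimal sketch of a header-only request that gives its connection back, assuming `session` is a `requests.Session` (any object with a compatible `get` works). The function name `fetch_headers` is hypothetical and this is not the code in the pull request.

```python
def fetch_headers(session, url, timeout=30):
    """Fetch only the response headers, then release the connection.

    With stream=True the body is not downloaded up front, so the response
    must be closed explicitly; otherwise the unread body keeps the
    connection checked out of the pool.
    """
    response = session.get(url, stream=True, timeout=timeout)
    try:
        return dict(response.headers)
    finally:
        response.close()
```

The full download can then go through the same session in a second request without competing with a connection that was never released.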

xu-cheng commented 10 years ago

Maybe this will help.

dgorissen commented 10 years ago

OK, I don't have much time to look at this, but I will try to later this week. Feel free to have a poke yourself and submit a patch.