datashaman closed this pull request 8 years ago
Hmm, don't merge this in yet, I've been running this on my machine over the past few nights and there are a few bugs which must be fixed first.
I'll leave this pull request up for a bit so you can review the comments made.
Please let me know once you're done looking at them, and I'll close this pending some more testing.
I'm seeing pretty horrible performance in my overnight download sessions, so the chunking/ranging thing needs work.
Also, that off by one error has me worried that my recent downloads are subtly corrupted (although I've already got a solution).
I noticed the CRC32 checksum (ignore my earlier request for md5sum) included in the file endpoint's response; I'll integrate that into the mix as well.
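A minimal sketch of that check, computing the file's CRC32 incrementally with the stdlib `zlib` module; comparing it as a lowercase hex string is an assumption about how the endpoint reports it:

```python
import zlib

def file_crc32(path, chunk_size=8192):
    """Compute the CRC32 of a file incrementally, without loading it all into memory."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    # Mask to an unsigned 32-bit value and format as 8 hex digits,
    # which is how checksums are usually compared as strings.
    return format(crc & 0xFFFFFFFF, "08x")
```

The result can then be compared against whatever checksum the endpoint returns before deleting the remote file.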
Have a look again: the downloads are working perfectly (mine all passed the CRC32 check). The speed seems acceptable to me, but I'm in Africa, where our lines are slow. I'll let it loose on my files tonight and see what happens. Every other night so far it has failed with an IO error near the end (no doubt caused by the off-by-one error). Thankfully, that stopped it from continuing down the list. :)
Hello @datashaman. Thanks for implementing this. I will review it and use it myself, but I am quite busy right now, so it could take a while to merge this into master.
Hi @datashaman. If you are interested, feel free to use the code I wrote in putio-sync as a reference. Similarly, I download in chunks (in parallel) using the Range header: https://github.com/posborne/putio-sync/blob/master/putiosync/multipart_downloader.py.
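For anyone reading along, the heart of a ranged download like that is just splitting the file length into byte ranges for the Range header. A rough sketch (not the putio-sync code itself):

```python
def byte_ranges(total_size, range_size):
    """Yield (start, end) pairs covering [0, total_size), inclusive on both
    ends, as expected by an HTTP header like 'Range: bytes=start-end'."""
    for start in range(0, total_size, range_size):
        # The end index is inclusive in HTTP range syntax, hence the -1;
        # min() keeps the last range from running past the end of the file.
        end = min(start + range_size, total_size) - 1
        yield (start, end)
```

Each range can then be fetched independently (e.g. with a `Range: bytes=start-end` request header) and written at its own offset. Getting that inclusive `end` wrong is exactly the kind of off-by-one mentioned earlier.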
I should add that all of that code is MIT licensed.
Thanks @posborne, I appreciate the assistance! Have a look at my most recent changes introducing the Retry class. I notice your code doesn't use it; it's very useful in situations where the network is unstable.
http://urllib3.readthedocs.org/en/latest/helpers.html#urllib3.util.retry.Retry
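For reference, wiring Retry into urllib3 looks roughly like this; the parameter values below are illustrative, not necessarily what this pull request uses:

```python
import urllib3
from urllib3.util.retry import Retry

# Retry up to 5 times on connection/read errors and on common transient
# HTTP status codes, with exponential backoff between attempts.
retries = Retry(
    total=5,
    backoff_factor=0.5,  # sleeps roughly 0.5s, 1s, 2s, ... between retries
    status_forcelist=[500, 502, 503, 504],
)
http = urllib3.PoolManager(retries=retries)
# response = http.request("GET", url)  # failed attempts are retried automatically
```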
And that reminds me I must put up a licence for putio-automator :)
Last night the downloader ploughed through everything in my queue, handling errors gracefully and running quite fast as well. I think the pull request is ready for re-evaluation. I added urllib3 to the dependencies, so you'll need to re-run the install.
Not entirely sure whether we should use the urllib3 embedded in requests, or depend on it directly in this case. Your call.
Actually, considering that requests is very heavily based on urllib3, I'm going to change the code to use the embedded one instead. No need to add a direct dependency.
I think I'm going to split this work up into separate functional branches, too much going on in one pull request.
Thanks, that would be better.
File downloads now use ranged, chunked transfers. If a download fails partway through, the next time the download method is called it will resume from where it left off. The remote file is only deleted if the local file size matches the remote file size.
Downloads are flushed to disk in ranges of 10 MB by default, and each range is written in chunks of 8 KB. Tune these depending on your network speed and reliability.
Both sizes are added as parameters to the download method.
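To make the resume behaviour concrete, here's a rough sketch of the logic described above. `fetch_range` is a hypothetical stand-in for the HTTP call; none of these names are the actual putio-automator API:

```python
import os

def download(path, fetch_range, remote_size,
             range_size=10 * 1024 * 1024, chunk_size=8 * 1024):
    """Resume-capable ranged download.

    fetch_range(start, end, chunk_size) stands in for an HTTP request with
    'Range: bytes=start-end' that yields the response body in chunks.
    """
    # Resume from wherever the previous attempt stopped.
    start = os.path.getsize(path) if os.path.exists(path) else 0
    with open(path, "ab") as f:
        while start < remote_size:
            end = min(start + range_size, remote_size) - 1  # inclusive end
            for chunk in fetch_range(start, end, chunk_size):
                f.write(chunk)
            f.flush()  # flush once per range, not per 8 KB chunk
            start = end + 1
    # Only report success (allowing remote deletion) if the sizes match.
    return os.path.getsize(path) == remote_size
```

On a retry after a partial failure, `os.path.getsize` picks up the bytes already on disk, so only the remainder is fetched.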
I also added the transfers cancel API call; it was missing.