datashaman closed this pull request 8 years ago
Hmm, don't merge this in yet, I've been running this on my machine over the past few nights and there are a few bugs which must be fixed first.
I'll leave this pull request up for a bit so you can review the comments made.
Please let me know once you're done looking at them, and I'll close this pending some more testing.
I'm seeing pretty horrible performance in my overnight download sessions, so the chunking/ranging thing needs work.
Also, that off by one error has me worried that my recent downloads are subtly corrupted (although I've already got a solution).
I noticed the CRC32 checksum (ignore my earlier request for md5sum) included in the file endpoint's response; I'll integrate that into the mix as well.
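A minimal sketch of that check, computing the file's CRC32 incrementally with the stdlib `zlib` module; comparing it as a lowercase hex string is an assumption about how the endpoint reports it:

```python
import zlib

def file_crc32(path, chunk_size=8192):
    """Compute the CRC32 of a file incrementally, without loading it all into memory."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    # Mask to an unsigned 32-bit value and format as 8 hex digits,
    # which is how checksums are usually compared as strings.
    return format(crc & 0xFFFFFFFF, "08x")
```

The result can then be compared against whatever checksum the endpoint returns before deleting the remote file.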
Have a look again: the downloads are working perfectly (mine all passed the CRC32 check). The speed seems acceptable to me, but I'm in Africa, where our lines are slow. I'll let it loose on my files tonight and see what happens. Every other night so far it has failed with an IO error near the end (no doubt caused by the off-by-one error). Thankfully, that stopped it from continuing down the list. :)
Hello @datashaman. Thanks for implementing this. I will review it and use it myself, but I am quite busy right now, so it could take a while to merge this into master.
Hi @datashaman. If you are interested, feel free to use the code I wrote in putio-sync as a reference. Similarly, I download in chunks (in parallel) using the Range header: https://github.com/posborne/putio-sync/blob/master/putiosync/multipart_downloader.py.
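For anyone reading along, the heart of a ranged download like that is just splitting the file length into byte ranges for the Range header. A rough sketch (not the putio-sync code itself):

```python
def byte_ranges(total_size, range_size):
    """Yield (start, end) pairs covering [0, total_size), inclusive on both
    ends, as expected by an HTTP header like 'Range: bytes=start-end'."""
    for start in range(0, total_size, range_size):
        # The end index is inclusive in HTTP range syntax, hence the -1;
        # min() keeps the last range from running past the end of the file.
        end = min(start + range_size, total_size) - 1
        yield (start, end)
```

Each range can then be fetched independently (e.g. with a `Range: bytes=start-end` request header) and written at its own offset. Getting that inclusive `end` wrong is exactly the kind of off-by-one mentioned earlier.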
I should add that all of that code is MIT licensed.
Thanks @posborne, I appreciate the assistance! Have a look at my most recent changes introducing the Retry class. I notice your code doesn't use it; it's very useful in situations where the network is unstable.
http://urllib3.readthedocs.org/en/latest/helpers.html#urllib3.util.retry.Retry
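For reference, wiring Retry into urllib3 looks roughly like this; the parameter values below are illustrative, not necessarily what this pull request uses:

```python
import urllib3
from urllib3.util.retry import Retry

# Retry up to 5 times on connection/read errors and on common transient
# HTTP status codes, with exponential backoff between attempts.
retries = Retry(
    total=5,
    backoff_factor=0.5,  # sleeps roughly 0.5s, 1s, 2s, ... between retries
    status_forcelist=[500, 502, 503, 504],
)
http = urllib3.PoolManager(retries=retries)
# response = http.request("GET", url)  # failed attempts are retried automatically
```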
And that reminds me I must put up a licence for putio-automator :)
Last night the downloader ploughed through everything in my queue, handling errors gracefully and running quite fast as well. I think the pull request is ready for re-evaluation. I added urllib3 to the dependencies, so you'll need to re-run the install.
Not entirely sure whether we should use the urllib3 embedded in requests, or depend on it directly in this case. Your call.
Actually, considering that requests is very heavily based on urllib3, I'm going to change the code to use the embedded one instead. No need to add a direct dependency.
I think I'm going to split this work up into separate functional branches, too much going on in one pull request.
Thanks, that would be better.
File downloads now use ranged, chunked transfers. If a download fails partway through, the next time the download method is called it will resume from where it left off. The remote file is only deleted if the local file size matches the remote file size.
Downloads are flushed to disk in ranges of 10 MB by default, and each range is written in chunks of 8 KB. Tune these depending on your network speed and reliability.
Both sizes are added as parameters to the download method.
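To make the resume behaviour concrete, here's a rough sketch of the logic described above. `fetch_range` is a hypothetical stand-in for the HTTP call; none of these names are the actual putio-automator API:

```python
import os

def download(path, fetch_range, remote_size,
             range_size=10 * 1024 * 1024, chunk_size=8 * 1024):
    """Resume-capable ranged download.

    fetch_range(start, end, chunk_size) stands in for an HTTP request with
    'Range: bytes=start-end' that yields the response body in chunks.
    """
    # Resume from wherever the previous attempt stopped.
    start = os.path.getsize(path) if os.path.exists(path) else 0
    with open(path, "ab") as f:
        while start < remote_size:
            end = min(start + range_size, remote_size) - 1  # inclusive end
            for chunk in fetch_range(start, end, chunk_size):
                f.write(chunk)
            f.flush()  # flush once per range, not per 8 KB chunk
            start = end + 1
    # Only report success (allowing remote deletion) if the sizes match.
    return os.path.getsize(path) == remote_size
```

On a retry after a partial failure, `os.path.getsize` picks up the bytes already on disk, so only the remainder is fetched.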
I also added the transfers cancel API call; it was missing.