cavaliergopher / grab

A download manager package for Go
BSD 3-Clause "New" or "Revised" License

Implement optional concurrent "Range" requests (refs #86) #102

Open justinfx opened 1 year ago

justinfx commented 1 year ago

This implements an optional feature that lets a Request download the payload in multiple chunks using "Range" requests, if the server supports them.

API updates

The Request struct gains a new field called RangeRequestMax. When set to a value > 0, it controls how many chunks are downloaded in parallel using "Range" requests, instead of a single request reading the full body.
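
A minimal usage sketch, assuming the field lands as described (the URL and destination path are placeholders):

```go
package main

import (
	"log"

	"github.com/cavaliergopher/grab/v3"
)

func main() {
	// Destination path and URL are placeholders.
	req, err := grab.NewRequest("/tmp/large-file.iso", "https://example.com/large-file.iso")
	if err != nil {
		log.Fatal(err)
	}

	// Download in up to 4 parallel "Range" chunks; leaving the field at its
	// zero value keeps the existing single-request behaviour.
	req.RangeRequestMax = 4

	resp := grab.DefaultClient.Do(req)
	if err := resp.Err(); err != nil {
		log.Fatal(err)
	}
	log.Println("saved to", resp.Filename)
}
```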

Implementation details

High level steps:

  1. RangeRequestMax is set to a value > 0 on the Request
  2. Ensure that a HEAD request is always performed, so we can verify support for "Range" and get the content length (see the sketch after this list)
  3. Don't start the synchronous GET request in the state machine, as it will be executed in parallel later (the HEAD request is the last request performed)
  4. Transition to openWriter
  5. Use an alternate transfer implementation, called transferRanges
  6. async copyFile works the same as before
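
A rough sketch of the step 2 pre-flight, assuming the check looks at Accept-Ranges and Content-Length; this is illustrative, not the PR's actual code:

```go
package rangedl // illustrative helper, not grab's actual code

import "net/http"

// supportsRanges issues the HEAD pre-flight: it reports whether the server
// advertises byte ranges and returns the content length needed to split the
// payload into chunks.
func supportsRanges(url string) (length int64, ok bool, err error) {
	resp, err := http.Head(url)
	if err != nil {
		return 0, false, err
	}
	resp.Body.Close()

	ok = resp.Header.Get("Accept-Ranges") == "bytes" && resp.ContentLength > 0
	return resp.ContentLength, ok, nil
}
```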

Given the way the state machine works, it seemed easier to launch the concurrent range requests during the copy phase rather than in the synchronous getRequest state, which would have required monitoring a list of requests.

A new transferer interface has been introduced so that a second implementation, transferRanges, can be used alongside the existing one.
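
The method set isn't spelled out here, but judging by the metrics mentioned below it presumably looks something like the following (the shape is an inference, not the PR's actual declaration):

```go
package grab // sketch only; the real method set may differ

// transferer is a guess at the new interface: both the existing
// single-stream transfer and the new transferRanges would satisfy it, so the
// copy phase can be driven without knowing which strategy is in use.
type transferer interface {
	Copy() (written int64, err error) // perform the download/copy
	N() int64                         // bytes transferred so far
	BPS() float64                     // current bytes-per-second estimate
}
```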

transferRanges implementation

This alternate implementation handles launching the concurrent range requests, doing the copy of the data, and tracking the metrics.

It seemed easier to just pass the HEAD Response to the private constructor, as most of the needed details are present on that struct.

transferRanges.Copy() starts a number of Range requests, each with its own offset and limit, in goroutines, per the value of RangeRequestMax passed in. Each goroutine writes directly to the open file writer using WriteAt (backed by the pwrite() syscall), so chunks of data are written at their offsets to the same file descriptor in parallel. Metrics are updated atomically to keep N() and BPS() working.
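
A hedged sketch of the per-chunk worker pattern described above; the function and its parameters are illustrative, not the PR's actual code:

```go
package rangedl // illustrative, not the PR's actual code

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"os"
)

// downloadChunk fetches one byte range and writes it at its offset in the
// shared output file. Because each worker uses WriteAt (pwrite under the
// hood), all workers can share one *os.File without coordinating a seek
// position.
func downloadChunk(ctx context.Context, url string, f *os.File, offset, length int64) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", offset, offset+length-1))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusPartialContent {
		return fmt.Errorf("server ignored Range request: %s", resp.Status)
	}

	// io.NewOffsetWriter (Go 1.20+) adapts WriteAt into a sequential writer
	// anchored at this chunk's offset, so io.Copy can stream the body.
	_, err = io.Copy(io.NewOffsetWriter(f, offset), resp.Body)
	return err
}
```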

Other details

I did also update the default setting in the Client when it comes to TCP_NODELAY. The Go standard library enables it on net.TCPConn by default, which disables Nagle's algorithm and targets latency-sensitive RPC workloads. In theory it makes more sense to leave Nagle's algorithm on by default in a library specific to downloading files over HTTP, where bulk throughput matters more than latency. It can still be controlled by the user supplying a custom HTTPClient.
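
For example, a caller who wants TCP_NODELAY back could wire up something like this; the dialer wrapper is illustrative, and only grab.NewClient and the HTTPClient field are part of the existing API:

```go
package rangedl // illustrative; only grab.NewClient and HTTPClient are existing API

import (
	"context"
	"net"
	"net/http"

	"github.com/cavaliergopher/grab/v3"
)

// newNoDelayClient returns a grab client whose connections re-enable
// TCP_NODELAY (disabling Nagle's algorithm again), overriding the changed
// default by supplying a custom HTTPClient.
func newNoDelayClient() *grab.Client {
	client := grab.NewClient()
	client.HTTPClient = &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
				conn, err := (&net.Dialer{}).DialContext(ctx, network, addr)
				if err != nil {
					return nil, err
				}
				if tcp, ok := conn.(*net.TCPConn); ok {
					_ = tcp.SetNoDelay(true) // favour latency over batching
				}
				return conn, nil
			},
		},
	}
	return client
}
```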

justinfx commented 1 year ago

I've realised that, when performing a "Range" request, the returned Response still contains the HTTPResponse from the HEAD request. I would be interested in feedback on this approach, since I wasn't sure of a better response to return, seeing as the parallel range requests run transparently and are written to the target file. Another option would be to clone the HEAD http response and modify it so it represents the method used by the Range requests (GET, ...).

ananthb commented 1 year ago

If multiple goroutines are writing to the output in parallel, will the output file have gaps in between partially written chunks? If so, how would resuming a partial download work?

justinfx commented 1 year ago

@ananthb I think it is possible in the current implementation for the resume to not be correct, given a situation where a later range concurrently finishes sooner than an earlier range and then the transfer is stopped. The reason is that, as each concurrent range completes writing, it atomically adds to the total bytes written. So if ranges 10, 20, and 40 finish but 30 does not, it would report 30 total bytes written, yet there would be a gap, and the file itself would look like it had 40 bytes written since it is sparse. Maybe concurrent range failures need to truncate back to before the earliest failed range?

justinfx commented 1 year ago

@ananthb I've just pushed 8b4f8d2 to address support for resume with ranged requests. It now truncates the file back to the end of the lowest successful range offset before a failure, to avoid any gaps. This means you may lose progress on some chunks that finished concurrently just after the failed range. I've updated the existing AutoResume test to also check resuming ranged requests that did not fail with a gap, and added an extra test that sets up a failure with a gap and asserts that the file is truncated (a case that then works as expected with a normal resume).
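
Purely as an illustration of the truncation rule (not the actual commit, and assuming equal-sized chunks for simplicity):

```go
package rangedl // illustrative, not the actual commit

import "os"

// truncateToContiguous shrinks the output file back to the end of the last
// contiguous run of successful chunks, so a failed chunk can never leave a
// sparse gap behind for a later resume to trip over. Chunks are assumed to
// be equal-sized and listed in offset order.
func truncateToContiguous(f *os.File, chunkSize int64, succeeded []bool) error {
	var safe int64
	for _, ok := range succeeded {
		if !ok {
			break
		}
		safe += chunkSize
	}
	return f.Truncate(safe)
}
```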

ananthb commented 1 year ago

Yep that makes sense @justinfx.

ananthb commented 1 year ago

What happens if grab crashes before it can truncate the file?

If you only wrote completed chunks to the file, you wouldn't need to truncate it in the first place. Incomplete chunks could be buffered either in memory or in temporary files on disk.

justinfx commented 1 year ago

@ananthb yeah, if the process crashed before it could truncate, it would leave a file that could be corrupt and then used for a resume. I definitely don't think we should buffer in memory, because that could have surprising resource usage implications on large files. The goal of using WriteAt was to be optimal: no buffering, and only a single file written without any post-process I/O. I admit that I wrote this feature based on my own use case, where I don't use the resume feature at all, so I didn't think through the edge cases of supporting it. A couple of ideas in terms of writing to files:

ananthb commented 1 year ago

Yeah, in-memory buffers alone won't be enough to cover, say, large chunk sizes or many parallel chunks. I've been toying with the idea of an in-memory buffer that spills over onto disk.

My basic idea to make resume work is that the output file should always be consistent and not have any "holes". Basically write only completed chunks in order to the output file.

I could use anonymous temporary files to buffer the in-progress chunks too.
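
A toy sketch of that ordering idea (illustrative only, not chonker or grab code): completed chunks arrive out of order but are flushed to the output only once every earlier chunk has been written. As written it buffers in memory, which is exactly the limitation the spill-to-disk idea would address.

```go
package rangedl // illustrative only, not chonker or grab code

import "io"

// chunk is a completed byte range identified by its index in offset order.
type chunk struct {
	index int
	data  []byte
}

// writeInOrder drains completed chunks and writes them to w strictly in
// index order, holding later chunks aside until the gap before them closes.
func writeInOrder(w io.Writer, done <-chan chunk) error {
	pending := make(map[int][]byte)
	next := 0
	for c := range done {
		pending[c.index] = c.data
		for data, ok := pending[next]; ok; data, ok = pending[next] {
			if _, err := w.Write(data); err != nil {
				return err
			}
			delete(pending, next)
			next++
		}
	}
	return nil
}
```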

justinfx commented 1 year ago

The current implementation splits a large download into chunks based on the number of parallel workers, so I wonder if your idea could manage to avoid large memory usage. You might have to buffer a lot before a gap closes. Maybe a single temp file that only moves on success or truncation is easier? But I'm interested to see what your approach produces!

ananthb commented 4 weeks ago

@justinfx I wrote a library called chonker that does Range GETs transparently. You can use it with grab like this: https://go.dev/play/p/LOlBnp24bXp.

justinfx commented 4 weeks ago

@ananthb nice one. It's been a while and I've moved on from this issue, having made use of it in a fork with this feature.