Open phs opened 11 months ago
Hi @phs, thanks for reaching out. I brought up your feature request for discussion with the team, and one suggestion they had was to try setting your `preferred_transfer_client` to `crt`, and adjusting the `target_bandwidth`, as documented here: https://awscli.amazonaws.com/v2/documentation/api/latest/topic/s3-config.html. You would need to install v2 of the CLI for access to these features.
(As noted in the documentation these features are currently considered experimental, but could be worth trying for your use case.)
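For reference, a sketch of what those settings could look like in `~/.aws/config` (the `100MB/s` value is purely illustrative, not a recommendation):

```ini
# Experimental CRT transfer client settings (AWS CLI v2 only)
[default]
s3 =
  preferred_transfer_client = crt
  target_bandwidth = 100MB/s
```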
Please let us know if that improves the performance, and if there are any more data points you can share on the transfer speed.
Greetings! It looks like this issue hasn’t been active in longer than five days. We encourage you to check if this is still an issue in the latest release. In the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or upvote with a reaction on the initial post to prevent automatic closure. If the issue is already closed, please feel free to open a new one.
Hello, yes. I'm excited to try the idea, I should be able to get to it today or tomorrow.
So the great news is my large objects are now downloading to `/dev/null` at roughly 700 MiB/s!
The bad news is that appears to be the case regardless of the `target_bandwidth` setting; my `1600MB/s` is apparently being ignored.
My chunk size is currently still 128 MiB, meaning my largest file has roughly 50 chunks. I'm going to see what happens if I drop it back down to 32 MiB. EDIT: With 32 MiB chunks, we get the same result. But that's still quite good!
Even at 700 MiB/s, in practice I'd struggle to achieve that write throughput on disk anyway (my aggregate throughput to EBS combined with instance storage caps out around 650 MiB/s), so the download is no longer the bottleneck and I think we're good here.
I do have one piece of feedback for the team responsible for the `crt` client. One of its requirements is to read/write a file in a filesystem, rather than a pipe. This requirement makes sense, since presumably the client is downloading chunks out of order and using something like `mmap` to write them directly to their target ranges when they arrive.
Aside from technical hurdles in implementation, I can imagine it's not clear to them that the user may want to keep all those chunks in memory (one risks blowing out RAM). Since the sizes involved in my use case permit it, and disk write throughput is precious, I definitely do want to write and hold all those chunks in RAM.
That's easy for me to do; I can use a `tmpfs` (perhaps with a `-o size=limit` option) to hold downloaded files, and hand them off to e.g. `zstd | tar` once they finish. The problem is I then need to wait for the downloads to finish before I can get at my data. My download+unpack process, which is now down to just under a minute (thank you!), could probably drop another 30% if I didn't have to wait for the first download to completely finish before starting the decompression. I can and will twiddle the count and sizes of my downloaded files to help pipeline that, but that is getting rather fiddly.
The ask for the `crt` team is to instead offer an option to hold downloaded chunks in memory (perhaps up to some configured limit) so they can once again stream chunks out to a pipe (in order, once they're available).
Describe the feature
Please give us a way to track and aggressively retry slower operations when using concurrent range requests to download single large objects ("multi-part download"), in the manner advised by the S3 user guide.
Use Case
In my EC2 instance, I'd like to use awscli to quickly fetch a small number (4) of objects (about 10 GiB in total size) from a single S3 bucket in the same region.
Ideally I'd like to saturate the instance's inbound network bandwidth (bursting up to 12.5 Gb/s) to get the job done in a few seconds, however a minute would do. This latency is on a critical path for bootstrapping the instance in an auto scaling scenario; other options for getting data onto the instance have been ruled out for independent reasons.
My objects have been uploaded using multipart upload and I've experimented with setting threshold and chunk size to 16, 32, 64 or 128 MiB. On download, I set the same parameters as well as max concurrency to values like 16, 32 or 64. On download I'm connecting to a regional endpoint.
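As a concrete illustration, one point in that sweep expressed as a `~/.aws/config` fragment (the specific values shown are just one of the combinations tried):

```ini
[default]
s3 =
  multipart_threshold = 64MB
  multipart_chunksize = 64MB
  max_concurrent_requests = 32
```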
What I find is my download proceeds quickly, typically reaching speeds between 150 and 250 MiB/s. That's good, but it's still nowhere near the instance burst bandwidth limit (1600 MiB/s), so the process is not limited by instance network throughput. Downloading to `/dev/null` produces the same result, which also rules out e.g. disk write throughput. The bottleneck appears to be either in the S3 client, or upstream in the service. On repeated attempts in a loop we do in fact see improvements, as my object chunks make their way into hotter caches.
Looking for ideas I went to the page linked above, and realized I had not yet considered aggressively retrying laggard requests as it suggests. If I watch the progress meter on download, it does indeed begin strong, and deteriorates over time as the client runs out of chunks to fetch while waiting for the slow ones. I suspect eagerly retrying slow connections might recoup 10-15% of the latency in my scenario. Since I don't seriously expect to ultimately saturate my instance's network, this would still be an interesting win.
Looking in the documentation for `awscli`, and ultimately at the source code for `botocore` and `s3transfer`, I could not find where I might set a "chunk request timeout" or the percentage of concurrent requests to retry.
Proposed Solution
The policy mentioned on the page seems reasonable to me:
Defining "slowest" might be tricky, but I'm interested in the multipart upload/"download" scenario where all chunks have the same, known size. Projected chunk download time perhaps?
How the policy above is expressed in configuration doesn't worry me particularly, so long as it can be quickly dropped into the config file like other tuning parameters. If this behavior appeared but the parameters were hard-wired, that would probably also be fine.
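To make the "projected chunk download time" idea concrete, here is a hypothetical sketch of how laggards could be identified. The function name, the median baseline, and the `factor` threshold are all my own assumptions for illustration, not anything from `s3transfer`:

```python
import statistics

def laggards(elapsed_by_chunk, bytes_done_by_chunk, chunk_size, factor=2.0):
    """Return ids of in-flight chunks whose projected total download time
    exceeds `factor` times the median projection across all chunks.

    Because every chunk has the same known size, a chunk that has run for
    `elapsed` seconds and received `done` bytes projects to finish in
    elapsed * chunk_size / done seconds overall.
    """
    projected = {}
    for cid, elapsed in elapsed_by_chunk.items():
        done = bytes_done_by_chunk[cid]
        projected[cid] = elapsed * chunk_size / done if done else float("inf")
    median = statistics.median(projected.values())
    return sorted(cid for cid, t in projected.items() if t > factor * median)

# Three in-flight 128 MiB chunks; chunk 2 is crawling along at ~5% done.
chunk = 128 * 1024 * 1024
elapsed = {0: 1.0, 1: 1.1, 2: 1.0}
done = {0: chunk // 2, 1: chunk // 2, 2: chunk // 20}
print(laggards(elapsed, done, chunk))  # -> [2]
```

A real client would re-issue the range request for each returned chunk id while leaving the original in flight, keeping whichever response completes first.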
Other Information
No response
Acknowledgements
CLI version used
aws-cli/1.29.75 Python/3.10.12 Linux/6.5.4-76060504-generic botocore/1.31.75
Environment details (OS name and version, etc.)
Linux