Issue:
The S3 client can deadlock if too many uploads are "stalled" (not providing data when the S3 client asks for it).
Mountpoint (which wraps aws-c-s3 with a filesystem-like API) had a user that opened 100+ files at once. The user wrote data to some of the later files they opened, and waited for those writes to complete. But aws-c-s3 was waiting on data from the first few files. Both sides were waiting on each other. It was a deadlock.
Description of changes:
Add an async aws_s3_meta_request_write() function.
Mountpoint should use this instead of async-streams. Async-streams were still "pull based": the S3 client decided when data should be provided. The new write() function is push-based: Mountpoint can send data whenever it has some, and "stalled" uploads will not deadlock the S3 client.
This workaround alone didn't help all that much: the S3 client still deadlocked at the same value of "N stalled uploads" when the workload also mixed in other types of non-upload meta requests.
Design:
The memory-ownership rules for this API evolved over time. Here are the stages:
1) At first, the user was required to keep their memory valid until the async write completed. (as of commit 5614546)
PRO: No locks held while memcpy'ing data into a part. Good memory usage: data is only copied when it's ready to send.
CON: The user had no way to release their memory early without canceling the meta-request and blocking their thread until the write completed, which risks deadlock if they block on a thread the meta-request needs to use.
2) Next, we had write() copy all memory immediately. (as of commit 5e4dd89)
PRO: The user does not need to keep their memory alive after the write() call returns.
CON: A huge write immediately forces a huge allocation. Writing 10GiB means at least 20GiB of total memory in use during the write() call (the caller's buffer plus the copy). Users would end up chunking large writes themselves, rather than letting aws-c-s3 handle it.
3) Finally, go back to approach 1, but guarantee that canceling a meta-request immediately completes any pending writes, so the user can always unblock and reclaim their memory.
PRO: Back to good memory usage.
CON: Need to hold a lock while memcpy'ing data into a part.
Some quick benchmarking showed similar performance from all 3 approaches (tested a 30GiB upload, and a simultaneous upload of 100 5GiB objects, calling write() in 256KiB chunks), so 🤷‍♀️. Going with approach #3, since it doesn't risk deadlock and doesn't risk enormous memory usage.
TODO:
Improve buffering of small writes (use buffer pool?)
Get async-write working with Content-Length (currently it only works when Content-Length is omitted)
Get async-write working with DEFAULT meta requests (currently only works with PUT)
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.