apache / arrow-rs

Official Rust implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0

Support native S3 conditional writes #6682

Closed. benesch closed this pull request 2 weeks ago.

benesch commented 2 weeks ago

Add support for PutMode::Create and copy_if_not_exists on native AWS S3, using the conditional write primitive that Amazon launched earlier this year [0].
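To make the PutMode::Create semantics concrete, here is a minimal in-memory sketch. It assumes that a create-only put is sent as a write carrying If-None-Match: * and that the store answers 412 Precondition Failed when the key already exists. Mode, PutError, ToyBucket and put are hypothetical names for illustration, not object_store's API.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified put modes (not object_store's PutMode).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    Overwrite,
    Create,
}

#[derive(Debug, PartialEq)]
enum PutError {
    AlreadyExists(String),
}

/// In-memory stand-in for a bucket that honours `If-None-Match: *`.
#[derive(Default)]
struct ToyBucket {
    objects: HashMap<String, Vec<u8>>,
}

impl ToyBucket {
    /// Simulated PutObject. Returns an HTTP-style status code:
    /// 412 (Precondition Failed) if `If-None-Match: *` was sent and the key exists,
    /// otherwise 200 after storing the body.
    fn put_object(&mut self, key: &str, body: &[u8], if_none_match_star: bool) -> u16 {
        if if_none_match_star && self.objects.contains_key(key) {
            return 412;
        }
        self.objects.insert(key.to_string(), body.to_vec());
        200
    }
}

/// Client side: Create adds the precondition; a 412 becomes AlreadyExists.
fn put(bucket: &mut ToyBucket, key: &str, body: &[u8], mode: Mode) -> Result<(), PutError> {
    let if_none_match_star = mode == Mode::Create;
    match bucket.put_object(key, body, if_none_match_star) {
        412 => Err(PutError::AlreadyExists(key.to_string())),
        _ => Ok(()),
    }
}

fn main() {
    let mut bucket = ToyBucket::default();
    println!("{:?}", put(&mut bucket, "a", b"v1", Mode::Create));
    // A second Create on the same key is rejected instead of overwriting.
    println!("{:?}", put(&mut bucket, "a", b"v2", Mode::Create));
    // Overwrite sends no precondition, so it replaces the object.
    println!("{:?}", put(&mut bucket, "a", b"v3", Mode::Overwrite));
}
```

The point of the primitive is that the existence check and the write happen atomically on the server, so two racing writers cannot both succeed with Create.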

The native S3 conditional write primitive is simpler than what other S3-like products (e.g., Cloudflare R2) offer, so this PR adds new modes to the s3_copy_if_not_exists and s3_conditional_put options that select the native S3-specific behavior.
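One way to picture the difference is as a choice of precondition header. This sketch assumes, as the description implies, that native S3 only offered the create-if-absent form (If-None-Match: *) when this PR was written, whereas stores like R2 also accept If-Match for ETag-guarded updates. Backend, Condition and precondition_header are hypothetical names, not object_store types.

```rust
/// Hypothetical backend kinds (not object_store types).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Backend {
    /// Stores such as R2 that honour both If-Match and If-None-Match.
    FullPreconditions,
    /// Native S3 as described in the PR: create-if-absent only.
    NativeS3,
}

#[derive(Debug, PartialEq)]
enum Condition {
    /// Write only if the key does not exist yet.
    Create,
    /// Write only if the current object's ETag matches `etag`.
    Update { etag: String },
}

/// Returns the (header name, value) to attach, or an error if the backend
/// cannot express the requested condition.
fn precondition_header(backend: Backend, cond: &Condition) -> Result<(&'static str, String), String> {
    match (backend, cond) {
        // Every backend in this sketch supports create-if-absent.
        (_, Condition::Create) => Ok(("If-None-Match", "*".to_string())),
        (Backend::FullPreconditions, Condition::Update { etag }) => Ok(("If-Match", etag.clone())),
        (Backend::NativeS3, Condition::Update { .. }) => {
            Err("conditional update not supported by this backend".to_string())
        }
    }
}

fn main() {
    let update = Condition::Update { etag: "\"abc\"".to_string() };
    for backend in [Backend::FullPreconditions, Backend::NativeS3] {
        println!("{:?}: {:?}", backend, precondition_header(backend, &update));
    }
}
```

Because the two backends differ in what they can express, a separate, explicitly selected mode is a natural fit rather than silently reusing the R2-style behavior.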

To maintain strict backwards compatibility (e.g. with older versions of LocalStack), the new behavior is not on by default. It must be explicitly requested by the end user.
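The opt-in rule can be sketched as a configuration value whose default keeps the old behavior. Everything here is hypothetical: the ConditionalPutMode enum, its variants, and the accepted strings "legacy" and "native-s3" are illustrative only and are not object_store's real configuration values.

```rust
use std::str::FromStr;

/// Hypothetical mode switch. The variant names and accepted strings are
/// illustrative only; they are NOT object_store's real configuration values.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
enum ConditionalPutMode {
    /// Whatever the store did before this PR; used when the user sets nothing.
    #[default]
    Legacy,
    /// Native S3 conditional writes; only ever selected explicitly.
    NativeS3,
}

impl FromStr for ConditionalPutMode {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim() {
            "legacy" => Ok(Self::Legacy),
            "native-s3" => Ok(Self::NativeS3),
            other => Err(format!("unknown conditional put mode: {other:?}")),
        }
    }
}

/// Resolves the effective mode: an absent setting falls back to the default,
/// so existing users (e.g. on older LocalStack) see no change in behavior.
fn resolve(user_value: Option<&str>) -> Result<ConditionalPutMode, String> {
    match user_value {
        None => Ok(ConditionalPutMode::default()),
        Some(v) => v.parse(),
    }
}

fn main() {
    println!("{:?}", resolve(None));
    println!("{:?}", resolve(Some("native-s3")));
    println!("{:?}", resolve(Some("bogus")));
}
```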

The implementation of PutMode::Create is straightforward. The implementation of copy_if_not_exists is more involved: it requires managing a multipart upload that copies the source object with the UploadPartCopy operation (which this crate's S3 client did not previously support) and is completed only if the destination object does not already exist.
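The copy_if_not_exists flow can be sketched with a toy in-memory model. ToyS3, Upload and every method name below are hypothetical stand-ins for the S3 operations CreateMultipartUpload, UploadPartCopy, CompleteMultipartUpload and AbortMultipartUpload, not this crate's client. The sketch assumes the final CompleteMultipartUpload carries If-None-Match: * and that a failed upload is aborted so it does not linger.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum S3Error {
    NoSuchKey,
    NoSuchUpload,
    PreconditionFailed,
}

#[derive(Debug, PartialEq)]
enum CopyError {
    NotFound(String),
    AlreadyExists(String),
    Other(String),
}

/// An in-progress multipart upload: target key plus the parts copied so far.
struct Upload {
    key: String,
    parts: Vec<Vec<u8>>,
}

/// Toy, in-memory model of the S3 operations involved; not a real client.
#[derive(Default)]
struct ToyS3 {
    objects: HashMap<String, Vec<u8>>,
    uploads: HashMap<u64, Upload>,
    next_upload_id: u64,
}

impl ToyS3 {
    fn create_multipart_upload(&mut self, key: &str) -> u64 {
        let id = self.next_upload_id;
        self.next_upload_id += 1;
        self.uploads.insert(id, Upload { key: key.to_string(), parts: Vec::new() });
        id
    }

    /// UploadPartCopy: the part's bytes come from an existing object, server side.
    fn upload_part_copy(&mut self, upload_id: u64, source: &str) -> Result<(), S3Error> {
        let data = self.objects.get(source).ok_or(S3Error::NoSuchKey)?.clone();
        let upload = self.uploads.get_mut(&upload_id).ok_or(S3Error::NoSuchUpload)?;
        upload.parts.push(data);
        Ok(())
    }

    /// CompleteMultipartUpload, optionally guarded by `If-None-Match: *`.
    fn complete_multipart_upload(&mut self, upload_id: u64, if_none_match_star: bool) -> Result<(), S3Error> {
        let key = &self.uploads.get(&upload_id).ok_or(S3Error::NoSuchUpload)?.key;
        if if_none_match_star && self.objects.contains_key(key) {
            // The upload stays open; the caller is expected to abort it.
            return Err(S3Error::PreconditionFailed);
        }
        let upload = self.uploads.remove(&upload_id).expect("checked above");
        self.objects.insert(upload.key, upload.parts.concat());
        Ok(())
    }

    fn abort_multipart_upload(&mut self, upload_id: u64) {
        self.uploads.remove(&upload_id);
    }
}

/// copy_if_not_exists built from a multipart upload whose only part is a server-side copy.
fn copy_if_not_exists(s3: &mut ToyS3, from: &str, to: &str) -> Result<(), CopyError> {
    let upload_id = s3.create_multipart_upload(to);
    let result = match s3.upload_part_copy(upload_id, from) {
        Ok(()) => s3.complete_multipart_upload(upload_id, true),
        Err(e) => Err(e),
    };
    if result.is_err() {
        // Clean up so no dangling multipart upload is left behind.
        s3.abort_multipart_upload(upload_id);
    }
    result.map_err(|e| match e {
        S3Error::NoSuchKey => CopyError::NotFound(from.to_string()),
        S3Error::PreconditionFailed => CopyError::AlreadyExists(to.to_string()),
        other => CopyError::Other(format!("{other:?}")),
    })
}

fn main() {
    let mut s3 = ToyS3::default();
    s3.objects.insert("src".to_string(), b"hello".to_vec());
    println!("{:?}", copy_if_not_exists(&mut s3, "src", "dst"));
    println!("{:?}", copy_if_not_exists(&mut s3, "src", "dst"));
}
```

Only the completion step is conditional, so the copied bytes become visible at the destination atomically, and only if nothing was written there first.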

To ensure test coverage, the object_store CI workflow now runs the AWS integration tests twice: once with conditional put disabled and once with it enabled.

Which issue does this PR close?

Fix #6285.

benesch commented 2 weeks ago

@tustvold are you the right person to review this? I saw you implemented quite a bit of the previous conditional put/get support.

benesch commented 2 weeks ago

> To maintain strict backwards compatibility (e.g. with older versions of LocalStack), the new behavior is not on by default. It must be explicitly requested by the end user.

This isn't ideal, but it seemed best for the short term to avoid breaking backwards compatibility for anyone who might be using an S3 implementation that doesn't support CAS. In the long term, I think the ideal would be to have native S3 CAS support enabled by default (i.e. unless explicitly overridden by the user).

benesch commented 2 weeks ago

@tustvold if you can trigger another CI run here, I think we'll be all green. I fixed a minor clippy failure and upgraded to a version of LocalStack that supports conditional put.

benesch commented 2 weeks ago

Whew, all tests here are green! @tustvold let me know if there's anything else here you'd like to see, but from my perspective this is ready to (squash) merge.

criccomini commented 1 week ago

Amazing. Any idea when this might go out? 🔥

tustvold commented 1 week ago

https://github.com/apache/arrow-rs/issues/6596 tracks the next release

benesch commented 1 week ago

Thanks very much for the review and merge, @tustvold!