delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0
7.47k stars 1.68k forks source link

[Feature Request] [S3DynamoDBLogStore] S3 Conditional Writes #3596

Open AlJohri opened 1 month ago

AlJohri commented 1 month ago

Feature request

Now that S3 supports conditional writes, it would be great to enable multi-cluster writes without the need for S3DynamoDBLogStore.

Which Delta project/connector is this regarding?

Overview

For years, S3 has lacked a Put-If-Absent API. With the launch of S3 Conditional writes, this now changes. From the launch notes:

Amazon S3 adds support for conditional writes that can check for the existence of an object before creating it. This capability can help you more easily prevent applications from overwriting any existing objects when uploading data. You can perform conditional writes using PutObject or CompleteMultipartUpload API requests in both general purpose and directory buckets.

Using conditional writes, you can simplify how distributed applications with multiple clients concurrently update data in parallel across shared datasets. Each client can conditionally write objects, making sure that it does not overwrite any objects already written by another client. This means you no longer need to build any client-side consensus mechanisms to coordinate updates or use additional API requests to check for the presence of an object before uploading data. Instead, you can reliably offload such validations to S3, enabling better performance and efficiency for large-scale analytics, distributed machine learning, and other highly parallelized workloads. To use conditional writes, you can add the HTTP if-none-match conditional header along with PutObject and CompleteMultipartUpload API requests.

Motivation

This should simplify the use of delta table on S3 by enabling multi-cluster writers out of the box without any additional dependencies.

Willingness to contribute

The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?

AlJohri commented 9 hours ago

Related Issues: