delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0

[Spark] Writing of UUID commits should not use put-if-absent semantics #3765

Closed by sumeet-db 1 month ago

sumeet-db commented 1 month ago

Which Delta project/connector is this regarding?

Spark

Description

This PR changes the coordinated-commits utilities so that UUID-based commit files are no longer written with put-if-absent semantics. Put-if-absent is unnecessary here: UUID-based commit file names are assumed to be globally unique, so no two concurrent writers will ever attempt to write the same commit file.
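A minimal local-filesystem sketch of the distinction, in Python and not taken from the Delta codebase: `write_file`, `write_versioned_commit`, and `write_uuid_commit` are hypothetical helpers, and the `<version>.<uuid>.json` file-name layout is an assumption for illustration. It contrasts a regular versioned commit, where concurrent writers race for the same `<version>.json` name and put-if-absent decides the winner, with a UUID-named commit, where a plain write is enough because no other writer targets that path.

```python
import os
import tempfile
import uuid


def write_file(path: str, data: bytes, put_if_absent: bool) -> None:
    """Write `data` to `path`, optionally with put-if-absent semantics.

    put_if_absent=True: O_CREAT | O_EXCL makes "create only if missing" one
    atomic step on a local file system; if the file already exists,
    os.open raises FileExistsError and nothing is written.
    put_if_absent=False: ordinary create-or-truncate write, no existence check.
    """
    flags = os.O_WRONLY | os.O_CREAT
    flags |= os.O_EXCL if put_if_absent else os.O_TRUNC
    fd = os.open(path, flags, 0o644)
    with os.fdopen(fd, "wb") as f:
        f.write(data)


def write_versioned_commit(log_dir: str, version: int, actions: bytes) -> str:
    # Every writer of this version targets the same file name, so
    # put-if-absent is what lets exactly one of them win the race.
    path = os.path.join(log_dir, f"{version:020d}.json")
    write_file(path, actions, put_if_absent=True)
    return path


def write_uuid_commit(log_dir: str, version: int, actions: bytes) -> str:
    # Hypothetical layout: <version>.<uuid>.json. The random UUID makes the
    # name unique per attempt, so no other writer targets this path and a
    # plain (overwriting) write is sufficient.
    path = os.path.join(log_dir, f"{version:020d}.{uuid.uuid4()}.json")
    write_file(path, actions, put_if_absent=False)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as log_dir:
        write_versioned_commit(log_dir, 1, b'{"commitInfo":{}}\n')
        try:
            write_versioned_commit(log_dir, 1, b'{"commitInfo":{}}\n')
        except FileExistsError:
            print("second writer of version 1 lost the race")
        a = write_uuid_commit(log_dir, 2, b"{}\n")
        b = write_uuid_commit(log_dir, 2, b"{}\n")
        print(a != b)  # True: two attempts at version 2 never collide
```

The point of the sketch is that put-if-absent only buys something when several writers can target the same name. On some storage systems it is also more expensive or needs extra coordination, so skipping it where names are already unique is a reasonable simplification.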

DynamoDBCommitCoordinator now also uses these utilities to write backfilled files.

How was this patch tested?

Existing tests are sufficient: the change only affects how a commit file is written to the underlying storage layer and does not change any logic in Delta Spark.

Does this PR introduce any user-facing changes?

No