apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0
2.16k stars, 855 forks

[core] Close writer after commit identifier of snapshot is strictly larger than last modified identifier #3499

Closed by tsreaper 1 month ago

tsreaper commented 1 month ago

Purpose

Currently, AbstractFileStoreWrite uses writerContainer.lastModifiedCommitIdentifier <= latestCommittedIdentifier to check that a writer has no more records waiting to be committed.

However, a single commit identifier may produce multiple snapshots (for example, one APPEND snapshot and one COMPACT snapshot). If the commit is slow, the writer might be closed and re-opened between the APPEND snapshot and the COMPACT snapshot, which can cause commit conflicts later.

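To ensure that all snapshots of this identifier are committed, the correct condition is writerContainer.lastModifiedCommitIdentifier < latestCommittedIdentifier. Once a strictly larger identifier has been committed, the commit has moved past the writer's last modified identifier, so none of that identifier's snapshots can still be pending.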
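The difference between the two conditions can be sketched with a simplified model. This is only an illustration, not the actual AbstractFileStoreWrite code: the class WriterCloseCheck and the methods canCloseOld and canCloseFixed are hypothetical names, and the identifier values are made up.

```java
// Hypothetical, simplified model of the close check; not Paimon's real implementation.
public class WriterCloseCheck {

    /** Old condition: also true while other snapshots of the same identifier may be pending. */
    public static boolean canCloseOld(
            long lastModifiedCommitIdentifier, long latestCommittedIdentifier) {
        return lastModifiedCommitIdentifier <= latestCommittedIdentifier;
    }

    /** Fixed condition: true only after a strictly larger identifier has been committed. */
    public static boolean canCloseFixed(
            long lastModifiedCommitIdentifier, long latestCommittedIdentifier) {
        return lastModifiedCommitIdentifier < latestCommittedIdentifier;
    }

    public static void main(String[] args) {
        // The writer last modified data under commit identifier 5.
        long lastModified = 5L;

        // Assumed scenario: the APPEND snapshot of identifier 5 is committed,
        // but the COMPACT snapshot of identifier 5 has not been committed yet.
        long latestCommitted = 5L;

        // The old check would already allow the writer to be closed.
        System.out.println(canCloseOld(lastModified, latestCommitted));   // prints "true"
        // The fixed check keeps the writer open.
        System.out.println(canCloseFixed(lastModified, latestCommitted)); // prints "false"

        // After identifier 6 is committed, the fixed check allows closing.
        System.out.println(canCloseFixed(lastModified, 6L));              // prints "true"
    }
}
```

The trade-off is that a writer stays open one commit longer than strictly necessary, in exchange for never being closed in the middle of an identifier's snapshots.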

Tests

API and Format

Not affected.

Documentation

No new feature.