apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[core] Remove replace branch to reduce IO #3618

Closed JingsongLi closed 5 days ago

JingsongLi commented 5 days ago

Purpose

The replace_branch operation adds expensive IO overhead to most operations on the normal code path. On HDFS, each call is a NameNode access. On object storage, each call is a separately billed request and takes 10 ms or more. This is very costly.

This overhead is difficult to accept. If replace_branch is not useful enough to justify it, we should remove the operation.
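To make the cost concrete, here is a rough back-of-the-envelope sketch. It only does the arithmetic and does not model what replace_branch actually does internally. The function name, the one-extra-IO-per-operation default, and the operation count are hypothetical; only the 10 ms figure comes from the description above, where it is a lower bound.

```python
def extra_latency_ms(num_operations: int,
                     ios_per_operation: int = 1,
                     io_latency_ms: float = 10.0) -> float:
    """Estimate the total latency added by extra IO calls.

    Assumptions (hypothetical, for illustration only):
    - each operation triggers `ios_per_operation` extra IO calls
    - each IO call costs `io_latency_ms` (10 ms is the lower bound
      quoted for object storage in this PR description)
    - the calls run sequentially, so the latencies simply add up
    """
    return num_operations * ios_per_operation * io_latency_ms


if __name__ == "__main__":
    # Hypothetical workload: 1000 operations, one extra IO each.
    total_ms = extra_latency_ms(1000)
    print(f"extra latency: {total_ms / 1000:.1f} s")
```

On object storage the same count also maps to billed requests, so the overhead shows up in both latency and cost.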

Detail is here: https://lists.apache.org/thread/q46clxx38fz7n1xw0sgscmcslo3qrp5c

Tests

API and Format

Documentation

schnappi17 commented 5 days ago

+1 for this.