apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0
2.43k stars, 954 forks

[core][format] Optimize manifest reading performance, add pushdown for manifest and ORC. #4497

Open ranxianglei opened 1 week ago

ranxianglei commented 1 week ago

Purpose

Optimize manifest reading performance and format object creation performance. In actual tests, the total time spent on manifests drops to under 3 ms (there is still room to optimize it to under 1 ms). With the metadata cache enabled, ORC pushdown enabled, and the metadata format changed to ORC, this can serve high-concurrency (QPS above 10,000), low-latency (overall RT under 50 ms) scenarios.
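
For readers unfamiliar with the term, the sketch below illustrates the general idea of pushdown on manifest-like metadata: a filter is evaluated against per-file min/max statistics so that files which cannot match are skipped before they are read. It is a generic illustration, not the code in this PR; `ManifestPushdownSketch`, `FileEntry`, `minKey`, `maxKey`, and `prune` are hypothetical names invented for this example and are not Paimon or ORC APIs.

```java
import java.util.ArrayList;
import java.util.List;

public class ManifestPushdownSketch {

    // Hypothetical stand-in for a manifest entry: one data file plus
    // the min/max statistics of a single key column.
    static final class FileEntry {
        final String fileName;
        final long minKey;
        final long maxKey;

        FileEntry(String fileName, long minKey, long maxKey) {
            this.fileName = fileName;
            this.minKey = minKey;
            this.maxKey = maxKey;
        }
    }

    // Keep only the entries whose [minKey, maxKey] range can contain the key.
    // Entries outside the range are dropped without opening the data file.
    static List<FileEntry> prune(List<FileEntry> entries, long key) {
        List<FileEntry> kept = new ArrayList<>();
        for (FileEntry e : entries) {
            if (key >= e.minKey && key <= e.maxKey) {
                kept.add(e);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<FileEntry> entries = new ArrayList<>();
        entries.add(new FileEntry("data-0.orc", 0, 99));
        entries.add(new FileEntry("data-1.orc", 100, 199));
        entries.add(new FileEntry("data-2.orc", 200, 299));

        List<FileEntry> kept = prune(entries, 150);
        // prints "1 of 3 files kept"
        System.out.println(kept.size() + " of " + entries.size() + " files kept");
    }
}
```

In this toy setup a point lookup touches one file instead of three. The same principle is what makes statistics-based filtering attractive for high-QPS, low-latency reads.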

Linked issue: close #xxx

Tests

API and Format

Documentation

ranxianglei commented 1 week ago

This PR goes together with https://github.com/apache/paimon/pull/4231

ranxianglei commented 3 days ago

Note: The cache code for manifest and file format has been withdrawn from this PR and will be submitted in the next PR, so this PR alone cannot yet achieve the performance described under Purpose.