apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[Feature] Do not report statistics for append-only tables or predicates without keys #2185

Open FangYongs opened 8 months ago

FangYongs commented 8 months ago

Motivation

Some OLAP queries in TPC-H run faster when statistics are not reported for append-only tables or for predicates without keys. We can add an option that disables statistics reporting in these cases to improve OLAP latency. This is a subtask of #1945.
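
To make the proposed behavior concrete, here is a minimal sketch of the decision logic in Python. It is not Paimon code: the names `ScanContext`, `TableStats`, `report_stats`, and the `skip_enabled` flag are all hypothetical, and reading "predicates without keys" as "filters that reference no primary-key column" is an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class TableStats:
    """Hypothetical stand-in for the statistics a source reports to the planner."""
    row_count: int


@dataclass(frozen=True)
class ScanContext:
    """Hypothetical description of a scan; not a Paimon type."""
    append_only: bool                 # table has no primary key
    key_fields: FrozenSet[str]        # primary-key columns (empty if append-only)
    predicate_fields: FrozenSet[str]  # columns referenced by the pushed-down filter


def report_stats(ctx: ScanContext, stats: TableStats,
                 skip_enabled: bool) -> Optional[TableStats]:
    """Return the statistics to report, or None to report nothing.

    Assumption: a predicate "without keys" is one that touches no key column.
    """
    if not skip_enabled:
        # Option off: always report statistics.
        return stats
    filters_on_key = bool(ctx.predicate_fields & ctx.key_fields)
    if ctx.append_only or not filters_on_key:
        # Option on: suppress statistics for the two cases named in this issue.
        return None
    return stats
```

With the flag off the statistics are always returned, so any such option could default to off without changing behavior for existing users.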

Solution

No response

Anything else?

No response

schnappi17 commented 8 months ago

@FangYongs Please assign it to me, I'll take it, thanks~

JingsongLi commented 8 months ago

Hi @FangYongs, do you have any performance benchmarks?

FangYongs commented 8 months ago

@JingsongLi Yes. We are currently using TPC-H to benchmark Flink with Paimon, and we found that statistics on append-only tables can cause some queries to regress. When we turn off statistics individually for these queries, their latencies are lower. So we would like to add an option in Paimon that supports this behavior uniformly.

JingsongLi commented 8 months ago

@FangYongs I see. Are there any benchmark results? How large is the performance regression?