apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[Feature] Do not report statistics for append-only tables or predicates without keys #2185

Open FangYongs opened 1 year ago

FangYongs commented 1 year ago

Search before asking

Motivation

Some OLAP queries in TPC-H run faster when they are executed without statistics for append-only tables or for predicates that do not involve keys. We can add an option to stop reporting statistics in these cases and so reduce OLAP query latency. This is a subtask of #1945.
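The requested option could gate statistics reporting roughly as follows. This is a minimal, hypothetical Java sketch, not Paimon's implementation: the class `StatsReportDecision`, the `TableInfo` and `Stats` records, the `skipOption` flag, and the reading of "predicates without keys" as "filters that reference no primary-key column" are all assumptions made for illustration. An empty result stands in for "no statistics reported", which is assumed to leave the planner with its default estimates.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch: decide whether a table source reports statistics to the planner. */
public final class StatsReportDecision {

    /** Hypothetical stand-in for a table; an empty key list means append-only. */
    public record TableInfo(List<String> primaryKeys) {
        public boolean isAppendOnly() {
            return primaryKeys.isEmpty();
        }
    }

    /** Hypothetical stand-in for the statistics a source could report. */
    public record Stats(long rowCount) {}

    /**
     * Returns the statistics unless the (hypothetical) skip option is enabled and
     * the table is append-only or no filtered field is a primary-key column.
     */
    public static Optional<Stats> reportStatistics(
            TableInfo table, Set<String> predicateFields, boolean skipOption, Stats stats) {
        if (skipOption) {
            // True if at least one filtered field is a primary-key column.
            boolean touchesKey = predicateFields.stream().anyMatch(table.primaryKeys()::contains);
            if (table.isAppendOnly() || !touchesKey) {
                return Optional.empty(); // report nothing; the planner uses its defaults
            }
        }
        return Optional.of(stats);
    }

    public static void main(String[] args) {
        Stats stats = new Stats(1_000_000L);
        TableInfo appendOnly = new TableInfo(List.of());
        TableInfo pkTable = new TableInfo(List.of("id"));

        // Append-only table with the option on: statistics are skipped.
        System.out.println(reportStatistics(appendOnly, Set.of("price"), true, stats));
        // Primary-key table filtered only on a non-key column: also skipped.
        System.out.println(reportStatistics(pkTable, Set.of("price"), true, stats));
        // Primary-key table filtered on its key: statistics are reported.
        System.out.println(reportStatistics(pkTable, Set.of("id"), true, stats));
        // Option off: statistics are always reported.
        System.out.println(reportStatistics(appendOnly, Set.of(), false, stats));
    }
}
```

Keeping the check behind a single flag matches the issue's goal of one option that applies the behavior uniformly, instead of disabling statistics query by query.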

Solution

No response

Anything else?

No response

Are you willing to submit a PR?

schnappi17 commented 1 year ago

@FangYongs Please assign it to me; I'll take it. Thanks~

JingsongLi commented 1 year ago

Hi @FangYongs, do you have any performance benchmarks?

FangYongs commented 1 year ago

@JingsongLi Yes. We are currently using TPC-H to benchmark the performance of Flink and Paimon, and we found that statistics on append-only tables can cause some queries to regress. When we turn off statistics for these queries individually, their latencies are lower. So we would like to add an option in Paimon that supports this behavior uniformly.

JingsongLi commented 1 year ago

@FangYongs I see. Are there any benchmark results? How large is the performance regression?