apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[core] Support delete stats in result of scan plan. #4506

Closed wwj6591812 closed 3 days ago

wwj6591812 commented 1 week ago

Purpose

In my company's production environment, we use a Flink session cluster for OLAP scans of Paimon tables, and we found that the JobManager's memory usage is consistently high. We plan to optimize this in two ways: (1) Delete stats in DataSplit. (2) When data skipping is applied, cut unused stats in ManifestEntry.

This PR is for (1).
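
To illustrate the idea behind (1), here is a minimal sketch of dropping per-file stats from a split before it is kept in JobManager memory. `FileMetaSketch`, `SplitSketch`, `withoutStats`, `dropStats` and `statsBytes` are hypothetical names made up for this example; they are not Paimon's real classes or the API added by this PR.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for per-file metadata carried inside a split.
final class FileMetaSketch {
    final String fileName;
    final long rowCount;
    // Assumed to hold serialized column stats (min/max/null counts).
    final byte[] valueStats;

    FileMetaSketch(String fileName, long rowCount, byte[] valueStats) {
        this.fileName = fileName;
        this.rowCount = rowCount;
        this.valueStats = valueStats;
    }

    // Returns a copy that keeps what a reader needs but carries no stats.
    FileMetaSketch withoutStats() {
        return new FileMetaSketch(fileName, rowCount, new byte[0]);
    }
}

// Hypothetical stand-in for a split returned by the scan plan.
final class SplitSketch {
    final List<FileMetaSketch> files;

    SplitSketch(List<FileMetaSketch> files) {
        this.files = Collections.unmodifiableList(new ArrayList<>(files));
    }

    // Builds a new split whose files have empty stats; the original is untouched.
    SplitSketch dropStats() {
        List<FileMetaSketch> slim = new ArrayList<>(files.size());
        for (FileMetaSketch file : files) {
            slim.add(file.withoutStats());
        }
        return new SplitSketch(slim);
    }

    // Total bytes of stats held by this split, as a rough measure of the saving.
    long statsBytes() {
        long total = 0;
        for (FileMetaSketch file : files) {
            total += file.valueStats.length;
        }
        return total;
    }
}
```

In this sketch the stats are dropped after planning, so any filtering that relies on them is assumed to have already happened; only file names and row counts remain in the split.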

Linked issue: close #xxx

Tests

API and Format

Documentation

wwj6591812 commented 6 days ago

@JingsongLi Hi, please CC. Thanks.

wwj6591812 commented 4 days ago

@JingsongLi I have addressed the comments. Thanks for the review.