apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[Feature] Support specifying Hive environment parameters when altering a table #4507

Open GangYang-HX opened 1 week ago

GangYang-HX commented 1 week ago

Search before asking

Motivation

When Paimon schema evolution adds fields, it calls Hive's alterTable method, which by default also updates the Hive table's statistics. Paimon uses its own statistics rather than Hive's, so this update is unnecessary. For Paimon tables that have many fields and are altered frequently, being able to turn it off would avoid a sharp increase in HMS memory usage.

Solution

When alterTable is triggered, call alter_table_with_environmentContext instead of alter_table. Whether table stats are updated can then be controlled by setting the DO_NOT_UPDATE_STATS parameter in the EnvironmentContext object.
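
A minimal sketch of the proposed call path, not Paimon's actual implementation. To keep it runnable without Hive on the classpath, EnvironmentContext and MetaStoreClient below are small stand-ins that only borrow the names of Hive's EnvironmentContext and IMetaStoreClient, and the table is reduced to a plain String. The skipStatsUpdate flag and the alterTable helper are hypothetical; the value "true" for DO_NOT_UPDATE_STATS is assumed to match what HMS checks.

```java
import java.util.HashMap;
import java.util.Map;

public class AlterTableStatsSketch {

    // Key named in the issue; assumed to match Hive's StatsSetupConst.DO_NOT_UPDATE_STATS.
    static final String DO_NOT_UPDATE_STATS = "DO_NOT_UPDATE_STATS";

    /** Stand-in for Hive's EnvironmentContext: a bag of string properties. */
    static class EnvironmentContext {
        private final Map<String, String> properties = new HashMap<>();

        void putToProperties(String key, String value) {
            properties.put(key, value);
        }

        Map<String, String> getProperties() {
            return properties;
        }
    }

    /** Stand-in for the one metastore client method the issue proposes to call. */
    interface MetaStoreClient {
        void alter_table_with_environmentContext(
                String dbName, String tableName, String newTable, EnvironmentContext context);
    }

    /** Hypothetical helper: alters the table and optionally asks HMS to skip the stats update. */
    static void alterTable(
            MetaStoreClient client,
            String dbName,
            String tableName,
            String newTable,
            boolean skipStatsUpdate) {
        EnvironmentContext context = new EnvironmentContext();
        if (skipStatsUpdate) {
            // Assumption: HMS skips the stats update when this property is "true".
            context.putToProperties(DO_NOT_UPDATE_STATS, "true");
        }
        client.alter_table_with_environmentContext(dbName, tableName, newTable, context);
    }

    /** Fake client that records the last context it received, for demonstration only. */
    static class RecordingClient implements MetaStoreClient {
        EnvironmentContext lastContext;

        @Override
        public void alter_table_with_environmentContext(
                String dbName, String tableName, String newTable, EnvironmentContext context) {
            lastContext = context;
        }
    }

    public static void main(String[] args) {
        RecordingClient client = new RecordingClient();
        alterTable(client, "default", "orders", "orders_with_new_column", true);
        // Show which environment parameters would reach the metastore.
        System.out.println(client.lastContext.getProperties());
    }
}
```

Passing an empty context when the flag is off keeps a single call path and leaves the default stats behavior unchanged.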

Anything else?

No response

Are you willing to submit a PR?