apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[Hive] Avoid excessive HMS memory usage when executing AlterTable for a Paimon table containing a large number of fields #4549

Open GangYang-HX opened 2 days ago

GangYang-HX commented 2 days ago

Purpose

When metastore=hive is set and a Paimon table has many fields (1600+), executing AlterTable calls the MetaStoreUtils.updateTableStatsSlow() method by default to update the table statistics, which causes a sharp increase in HMS memory usage. Paimon maintains its own statistics and does not rely on Hive's statistics, so this update is unnecessary.

This PR therefore provides an option to control whether Hive table statistics are updated, which avoids the excessive HMS memory usage.
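The idea can be illustrated with a minimal, dependency-free sketch. It is not the code in this PR. The option name `hive.alter-table.update-stats` and the class `HmsStatsGuard` are hypothetical, and the use of a `DO_NOT_UPDATE_STATS` table parameter (modeled on the key Hive exposes through `StatsSetupConst`) is an assumption about how the statistics update could be skipped. The default keeps the current behavior, in which statistics are updated.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: models the option check with plain maps instead of Hive classes. */
final class HmsStatsGuard {
    // Hypothetical catalog option name, invented for this sketch.
    static final String UPDATE_STATS_OPTION = "hive.alter-table.update-stats";

    // Assumed to mirror Hive's StatsSetupConst.DO_NOT_UPDATE_STATS key.
    static final String DO_NOT_UPDATE_STATS = "DO_NOT_UPDATE_STATS";

    /**
     * Returns a copy of the table parameters to send with the alter-table call.
     * When the option is "false", the copy carries the flag asking HMS to skip
     * the statistics update; otherwise the parameters are returned unchanged.
     */
    static Map<String, String> prepareTableParameters(
            Map<String, String> catalogOptions, Map<String, String> tableParameters) {
        Map<String, String> result = new HashMap<>(tableParameters);
        boolean updateStats =
                Boolean.parseBoolean(catalogOptions.getOrDefault(UPDATE_STATS_OPTION, "true"));
        if (!updateStats) {
            result.put(DO_NOT_UPDATE_STATS, "true");
        }
        return result;
    }
}
```

In a real change, the resulting parameters would be applied to the Hive table object before the alter call in HiveCatalog#alterTableToHms. Where exactly the flag is set is left open here.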

Linked issue: Issue-4507

API and Format

org.apache.paimon.hive.HiveCatalog#alterTableToHms

xuzifu666 commented 2 days ago

Please change the prefix in your title from 【Hive】 to [Hive] and fix the code style; you can run 'mvn spotless:apply' to format it.

GangYang-HX commented 2 days ago

Thank you. All the tests pass locally, but errors are still being reported now. They appear to be unrelated to my changes.