apache / amoro

Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
https://amoro.apache.org/
Apache License 2.0
874 stars, 290 forks

[AMORO-3289] Avoid calling getMixedTablePartitionSpecById in the scan loop #3290

Open 7hong opened 1 month ago

7hong commented 1 month ago

Why are the changes needed?

Closes #3289.

How was this patch tested?

In my environment, there is a table with many partitions waiting to be merged, and each plan was very time-consuming. After this optimization, the planning time is significantly reduced.


7hong commented 4 weeks ago

@zhoujinsong @majin1102 Do you have time to review this? Thanks.

baiyangtx commented 2 weeks ago


Does the old code trigger a reload of TableMetadata?

If no reload of TableMetadata is triggered, the new code should perform the same as the old code.

Iceberg's TableMetadata also returns the specs Map object from memory.


7hong commented 2 weeks ago

@baiyangtx Yes, the old code refreshes the TableMetadata on each call. This is especially noticeable in an environment with Kerberos enabled, where calling the getUGI method enters a synchronous blocking state.
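
To illustrate the idea, here is a minimal sketch of hoisting the spec lookup out of the scan loop. It is not the code in this patch: `FakeTable`, `loadSpecs`, and `refreshCount` are hypothetical stand-ins, and the sketch only assumes that each lookup triggers one metadata refresh, as described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a table whose spec lookup refreshes metadata on every call.
class FakeTable {
    private final Map<Integer, String> specs = new HashMap<>();
    int refreshCount = 0; // counts simulated metadata refreshes

    FakeTable() {
        specs.put(0, "unpartitioned");
        specs.put(1, "day(ts)");
    }

    // Simulates the expensive path: every call refreshes metadata before returning the specs.
    Map<Integer, String> loadSpecs() {
        refreshCount++;
        return new HashMap<>(specs);
    }
}

class SpecLookupSketch {
    // Before: the lookup sits inside the loop, so there is one refresh per scanned file.
    static List<String> resolvePerFile(FakeTable table, int[] fileSpecIds) {
        List<String> out = new ArrayList<>();
        for (int specId : fileSpecIds) {
            out.add(table.loadSpecs().get(specId));
        }
        return out;
    }

    // After: the specs map is loaded once and reused for every file in the loop.
    static List<String> resolveHoisted(FakeTable table, int[] fileSpecIds) {
        Map<Integer, String> specs = table.loadSpecs();
        List<String> out = new ArrayList<>();
        for (int specId : fileSpecIds) {
            out.add(specs.get(specId));
        }
        return out;
    }

    public static void main(String[] args) {
        int[] ids = {0, 1, 1, 0, 1};
        FakeTable perFile = new FakeTable();
        FakeTable hoisted = new FakeTable();
        // Both variants resolve the same specs; only the number of refreshes differs.
        System.out.println(resolvePerFile(perFile, ids).equals(resolveHoisted(hoisted, ids)));
        System.out.println(perFile.refreshCount + " refreshes vs " + hoisted.refreshCount);
    }
}
```

In this sketch the number of refreshes grows with the number of files in the first variant and stays at one in the second, which is where the saving would come from if each refresh is slow or blocks.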