apache / amoro

Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
https://amoro.apache.org/
Apache License 2.0

[Improvement]: Avoid calling getMixedTablePartitionSpecById in the loop #3289

Open 7hong opened 1 month ago



What would you like to be improved?

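In the Optimizing Plan phase, we need to obtain each file's PartitionSpec. However, calling getMixedTablePartitionSpecById inside the per-file loop is expensive and unnecessary: every file in the scan resolves its spec ID against the same table, so the lookup does not need to be repeated for each file.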

https://github.com/apache/amoro/blob/678b43c85347eb69d61e4f7ca016cb63d2ae56e4/amoro-ams/src/main/java/org/apache/amoro/server/optimizing/plan/OptimizingEvaluator.java#L119-L124

Especially in environments where Kerberos is enabled, repeatedly calling org.apache.iceberg.BaseTable#specs can incur lock overhead.
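To make the cost concrete, here is a minimal sketch in plain Java, assuming (as described above) that each per-file lookup ends up calling the table's specs() method. SpecStub, DataFileStub and CountingTable are hypothetical stand-ins for Iceberg's PartitionSpec, a data file and a table; they are not Amoro or Iceberg classes. The sketch only counts specs() calls for the per-file pattern versus a single lookup hoisted out of the loop.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins: SpecStub plays the role of a PartitionSpec,
// CountingTable plays the role of a table whose specs() lookup is expensive.
public class PerFileSpecLookup {

  record SpecStub(int specId) {}

  record DataFileStub(String path, int specId) {}

  static class CountingTable {
    private final Map<Integer, SpecStub> specs = new HashMap<>();
    int specsCalls = 0;

    CountingTable(int... specIds) {
      for (int id : specIds) {
        specs.put(id, new SpecStub(id));
      }
    }

    // Each call is counted; in the real system this is where the
    // expensive work (e.g. lock overhead) would be paid.
    Map<Integer, SpecStub> specs() {
      specsCalls++;
      return specs;
    }
  }

  // Pattern described in the issue: one specs() lookup per file.
  static int perFileLookups(CountingTable table, List<DataFileStub> files) {
    for (DataFileStub f : files) {
      SpecStub spec = table.specs().get(f.specId());
      // ... plan the file using spec ...
    }
    return table.specsCalls;
  }

  // Hoisted variant: one specs() lookup for the whole scan.
  static int hoistedLookup(CountingTable table, List<DataFileStub> files) {
    Map<Integer, SpecStub> specsById = table.specs();
    for (DataFileStub f : files) {
      SpecStub spec = specsById.get(f.specId());
      // ... plan the file using spec ...
    }
    return table.specsCalls;
  }

  public static void main(String[] args) {
    List<DataFileStub> files = List.of(
        new DataFileStub("a", 0), new DataFileStub("b", 0), new DataFileStub("c", 1));
    System.out.println(perFileLookups(new CountingTable(0, 1), files)); // 3
    System.out.println(hoistedLookup(new CountingTable(0, 1), files)); // 1
  }
}
```

With three files the per-file pattern makes three specs() calls while the hoisted version makes one; the gap grows linearly with the number of files in the plan.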

How should we improve?

I propose moving the code that obtains the PartitionSpec into the TableFileScanHelper interface, so the planner can look up the spec through the helper:

...
  PartitionSpec partitionSpec = tableFileScanHelper.getSpec(fileScanResult.file().specId());
...
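A minimal sketch of what the proposed getSpec method could look like, assuming the helper may resolve the table's specs once and reuse them for the whole scan. FileScanHelper, CachingScanHelper and SpecStub are hypothetical, simplified stand-ins for TableFileScanHelper and PartitionSpec; the real interface has other methods that are omitted here, and the loader Supplier stands in for whatever call actually fetches the specs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of the proposal: the scan helper exposes getSpec(specId),
// backed by a specs map that is resolved at most once per helper instance.
public class ScanHelperSpecCache {

  record SpecStub(int specId) {}

  // Hypothetical trimmed-down helper interface; only the proposed method is shown.
  interface FileScanHelper {
    SpecStub getSpec(int specId);
  }

  static class CachingScanHelper implements FileScanHelper {
    private final Supplier<Map<Integer, SpecStub>> specsLoader;
    private Map<Integer, SpecStub> specsById; // loaded lazily, then reused

    CachingScanHelper(Supplier<Map<Integer, SpecStub>> specsLoader) {
      this.specsLoader = specsLoader;
    }

    @Override
    public SpecStub getSpec(int specId) {
      if (specsById == null) {
        specsById = specsLoader.get(); // the only call to the expensive loader
      }
      SpecStub spec = specsById.get(specId);
      if (spec == null) {
        throw new IllegalArgumentException("Unknown spec id: " + specId);
      }
      return spec;
    }
  }

  public static void main(String[] args) {
    int[] loads = {0};
    Supplier<Map<Integer, SpecStub>> loader = () -> {
      loads[0]++;
      Map<Integer, SpecStub> m = new HashMap<>();
      m.put(0, new SpecStub(0));
      m.put(1, new SpecStub(1));
      return m;
    };
    FileScanHelper helper = new CachingScanHelper(loader);
    for (int specId : new int[] {0, 0, 1, 1, 0}) {
      helper.getSpec(specId);
    }
    System.out.println(loads[0]); // 1
  }
}
```

Loading lazily means a helper that never needs a spec never pays for the lookup. The sketch is not thread-safe; if one helper instance were shared across planner threads, the specs map would need to be published safely, for example by resolving it eagerly in the constructor.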


Subtasks

No response
