apache / amoro

Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
https://amoro.apache.org/
Apache License 2.0

[Improvement]: Improve the efficiency for RESTAPI of optimizing-processes #3066

klion26 closed this issue 1 month ago

klion26 commented 3 months ago

What would you like to be improved?

As the title says, the endpoint is catalogs/{catalog}/dbs/{db}/tables/{table}/optimizing-processes

How should we improve?

The background and a possible solution are described in the doc.

klion26 commented 1 month ago
  1. Remove the ServerCatalog#tableExists(db, table) call in TableController#getOptimizingProcesses, because the table-loading logic that follows already covers this check.
  2. Push the offset and limit in mapper#selectOptimizingProcesses (called from MixedAndIcebergTableDescriptor#getOptimizingProcessesInfo) down to the database. This limits the number of rows retrieved from the database.
  3. Add format info when calling this REST API, so that MixedAndIceberg can avoid loading the table from the external catalog (this is to be confirmed).
klion26 commented 1 month ago

The initial idea is to implement 1 and 2 above first, and then implement 3 together with the front end. What do you think about this? Thanks @zhoujinsong @majin1102

zhoujinsong commented 1 month ago

Thanks for pushing this improvement forward! @klion26

remove the ServerCatalog#tableExists(db, table) in TableController#getOptimizingProcesses because we can reuse the table load logic in the following logic

I am okay with this change. We will load the table in the following steps, so we can check this in that phase.
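
For illustration only, the difference between checking existence and then loading versus loading once can be sketched as below. This is a minimal sketch, not Amoro code: FakeCatalog, its lookups counter, and both helper methods are hypothetical stand-ins, and the real ServerCatalog may behave differently.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

final class LoadOnceSketch {

  // Hypothetical in-memory catalog that counts every lookup it serves.
  static final class FakeCatalog {
    private final Map<String, String> tables;
    final AtomicInteger lookups = new AtomicInteger();

    FakeCatalog(Map<String, String> tables) {
      this.tables = tables;
    }

    boolean tableExists(String db, String table) {
      lookups.incrementAndGet();
      return tables.containsKey(db + "." + table);
    }

    // Loading already fails for a missing table, so it doubles as the existence check.
    String loadTable(String db, String table) {
      lookups.incrementAndGet();
      String found = tables.get(db + "." + table);
      if (found == null) {
        throw new IllegalArgumentException("no such table: " + db + "." + table);
      }
      return found;
    }
  }

  // Before: two catalog lookups per request.
  static String existsThenLoad(FakeCatalog catalog, String db, String table) {
    if (!catalog.tableExists(db, table)) {
      throw new IllegalArgumentException("no such table: " + db + "." + table);
    }
    return catalog.loadTable(db, table);
  }

  // After: one lookup; a missing table still surfaces as an exception.
  static String loadOnly(FakeCatalog catalog, String db, String table) {
    return catalog.loadTable(db, table);
  }
}
```

In this sketch a missing table is reported the same way in both variants, which is the property the controller would need to preserve when the explicit check is removed.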

push the offset and limit in mapper#selectOptimizingProcesses(MixedAndIcebergTableDescriptor#getOptimizingProcessesInfo) to db (this can limit the items retrieved from db)

Yes, we should limit the result set returned by the database, as we may have a large number of history records.
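
As a rough illustration of the pushdown, the sketch below contrasts slicing a fully fetched list in memory with appending the paging clause to the query. It is an assumption-laden example, not the actual mapper: the class, method names, and sample SQL are hypothetical, and the LIMIT ? OFFSET ? syntax is the MySQL/PostgreSQL form (other databases such as Derby use a different clause).

```java
import java.util.ArrayList;
import java.util.List;

final class PaginationSketch {

  // Before: every history row is fetched, then the requested page is cut out in memory.
  static <T> List<T> pageInMemory(List<T> allRows, int offset, int limit) {
    int from = Math.min(Math.max(offset, 0), allRows.size());
    int to = from + Math.min(Math.max(limit, 0), allRows.size() - from);
    return new ArrayList<>(allRows.subList(from, to));
  }

  // After: the database applies the window, so only `limit` rows leave it.
  // The two placeholders are bound to limit and offset by the caller.
  static String pagedQuery(String baseSql) {
    return baseSql + " LIMIT ? OFFSET ?";
  }

  public static void main(String[] args) {
    // Hypothetical table and column names, used only to show the final statement.
    String base = "SELECT * FROM process_history WHERE table_id = ? ORDER BY process_id DESC";
    System.out.println(pagedQuery(base));
  }
}
```

A stable ORDER BY matters once paging moves into SQL, because without it the database is free to return rows in a different order between pages.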

add format info when calling this rest API, so that MixedAndIceberg can avoid loading the table from external catalog(this is to be confirmed)

I am not sure whether we still need to load the table. If we do, we can get the table format from the loaded table.