apache / amoro

Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
https://amoro.apache.org/
Apache License 2.0

[AMORO-2893] optimize table page loading #2914

Open klion26 opened 4 weeks ago

klion26 commented 4 weeks ago

Why are the changes needed?

Close #2893 .

Optimize the loading process when a user opens the optimizer page.

Brief change log

-

How was this patch tested?

Documentation

klion26 commented 4 weeks ago

Currently, this is a draft version, just to enable a better discussion. The open questions are:

  1. Do we need to include the table format in what we return to the front end? Right now the front end doesn't seem to use it.
  2. If we don't need to include the table format, which way do we prefer to add the info that is not in table_runtime?
     2.1. Select the info from the DB for the tables retrieved in the previous step.
     2.2. Retrieve the info from TableManager (for example, add a function TableManager#getTableRuntime(int tableId)), which requires maintaining a tableId-to-tableRuntime mapping in memory.

I'm leaning towards option 2.2, without returning the table format to the front end, because it reduces the number of DB requests by one.
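To make option 2.2 concrete, here is a minimal sketch of an in-memory tableId-to-runtime lookup. The class names `TableRuntimeSketch` and `TableRuntimeRegistry` and their fields are hypothetical stand-ins for illustration, not Amoro's actual classes. Only the `getTableRuntime(int tableId)` signature comes from the proposal above.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the runtime object; the fields are illustrative only.
final class TableRuntimeSketch {
    final int tableId;
    final String status;

    TableRuntimeSketch(int tableId, String status) {
        this.tableId = tableId;
        this.status = status;
    }
}

// Hypothetical registry that keeps the tableId -> runtime mapping in memory,
// so a lookup for a page of tables needs no extra DB request.
final class TableRuntimeRegistry {
    private final Map<Integer, TableRuntimeSketch> runtimes = new ConcurrentHashMap<>();

    // Called when a table is loaded or created.
    void register(TableRuntimeSketch runtime) {
        runtimes.put(runtime.tableId, runtime);
    }

    // Called when a table is dropped, so the map does not keep stale entries.
    void unregister(int tableId) {
        runtimes.remove(tableId);
    }

    // Mirrors the proposed TableManager#getTableRuntime(int tableId).
    Optional<TableRuntimeSketch> getTableRuntime(int tableId) {
        return Optional.ofNullable(runtimes.get(tableId));
    }
}
```

The cost of this approach is that the map has to be kept in sync with table creation and deletion, which is the in-memory bookkeeping mentioned in 2.2.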

Please let me know what you think about this, thanks. @zhoujinsong @majin1102

baiyangtx commented 2 weeks ago

Now that you have queried the TableRuntimeBean object from the database, why not directly perform sorting or paging operations on this object later?

majin1102 commented 2 weeks ago

I think decoupling from TableService and TableManager is a good idea. We may separate the dashboard server from AMS in the future. TableFormat seems unnecessary on this page for now.

klion26 commented 2 weeks ago

@baiyangtx thanks for the review. The bottleneck here is not the cost of the sort; it is that we retrieve too much data from the DB. The solution here limits the number of rows retrieved from the DB (the sort in SQL is needed because we want to show the 'running' status before the 'idle' status in the frontend).
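As a rough illustration of pushing both the sort and the row limit into the query, a sketch follows. It assumes hypothetical column names `table_id` and `optimizing_status` (only the table name `table_runtime` and the 'running' and 'idle' statuses come from this thread) and a database that accepts `LIMIT ? OFFSET ?`. The class `TableRuntimePageQuery` is illustrative and is not the code in this PR.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class TableRuntimePageQuery {
    // Hypothetical columns. Rows whose status is 'idle' sort last, so 'running'
    // tables come first; table_id breaks ties so paging is stable.
    static final String SQL =
        "SELECT table_id, optimizing_status FROM table_runtime "
            + "ORDER BY CASE WHEN optimizing_status = 'idle' THEN 1 ELSE 0 END, table_id "
            + "LIMIT ? OFFSET ?";

    // Converts a 1-based page number into a row offset.
    static int offsetOf(int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        return (page - 1) * pageSize;
    }

    // Fetches one page only, so at most pageSize rows leave the database.
    static List<String> fetchPage(Connection conn, int page, int pageSize) throws SQLException {
        List<String> rows = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            ps.setInt(1, pageSize);
            ps.setInt(2, offsetOf(page, pageSize));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    rows.add(rs.getInt("table_id") + ":" + rs.getString("optimizing_status"));
                }
            }
        }
        return rows;
    }
}
```

Sorting in memory after the fetch would still require loading every row first, which is the cost this change tries to avoid.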

klion26 commented 5 days ago

@majin1102 sorry for the late reply. I've drafted a version in which TableRuntimeMeta is used for retrieving from the DB and TableRuntime is used as the data transfer object in AMS. Could you please take a look and see whether this is the right way when you're free? Thanks.