apache / amoro

Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
https://amoro.apache.org/
Apache License 2.0
762 stars 267 forks source link

[Improvement]: There are slow SQL issues when querying the optimizing process #2869

Open XBaith opened 1 month ago

XBaith commented 1 month ago

Search before asking

What would you like to be improved?

Improve query performance When I execute the below SQL statement, it cost over 1min:

SELECT a.process_id, a.table_id, a.catalog_name, a.db_name, a.table_name, a.target_snapshot_id,
       a.target_change_snapshot_id, a.status, a.optimizing_type, a.plan_time, a.end_time,
       a.fail_reason, a.summary, a.from_sequence, a.to_sequence 
FROM table_optimizing_process a
INNER JOIN table_identifier b ON a.table_id = b.table_id
WHERE a.catalog_name = 'glue' 
  AND a.db_name = 'xxx' 
  AND a.table_name = 'deprecated_table_version_1716357896'
  AND b.catalog_name = 'glue' 
  AND b.db_name = 'xxx' 
  AND b.table_name = 'deprecated_table_version_1716357896'
ORDER BY process_id DESC;

The execute plan is:

id                       |estRows  |task     |access object                                                                    |operator info                                                                                                                                                                                                                          |
-------------------------+---------+---------+---------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Sort_10                  |88.57    |root     |                                                                                 |bts_arctic.table_optimizing_process.process_id:desc                                                                                                                                                                                    |
└─HashJoin_28            |88.57    |root     |                                                                                 |inner join, equal:[eq(bts_arctic.table_identifier.table_id, bts_arctic.table_optimizing_process.table_id)]                                                                                                                             |
  ├─Point_Get_29(Build)  |1.00     |root     |table:table_identifier, index:table_name_index(catalog_name, db_name, table_name)|                                                                                                                                                                                                                                       |
  └─TableReader_32(Probe)|88.57    |root     |                                                                                 |data:Selection_31                                                                                                                                                                                                                      |
    └─Selection_31       |88.57    |cop[tikv]|                                                                                 |eq(bts_arctic.table_optimizing_process.catalog_name, "glue"), eq(bts_arctic.table_optimizing_process.db_name, "xxx"), eq(bts_arctic.table_optimizing_process.table_name, "deprecated_table_version_1716357896")|
      └─TableFullScan_30 |132215.00|cop[tikv]|table:a                                                                          |keep order:false                                                                                                                                                                                                                       |

How should we improve?

Add indexes to avoid full table scans

Are you willing to submit PR?

Subtasks

No response

Code of Conduct