[X] I have searched in the issues and found no similar issues.
What would you like to be improved?
metastore: hive
catalog: UnifiedCatalog
method: listTables
problem:
When listTables is called on a mixed-hive or Iceberg catalog, it first calls the HMS getAllTables method to retrieve all table names, then calls getTableObjectsByName to fetch the Table objects for those tables, and decides whether each one is a mixed-hive or Iceberg table by checking its properties or getSd().
Paimon's logic is to first call getAllTables to retrieve all table names, then call getTable on each table to fetch its Table object and determine whether it is a Paimon table.
As a result, if the Unified Catalog supports mixed-hive, Iceberg, and Paimon at the same time, a single listTables call invokes getAllTables three times, getTableObjectsByName twice (a relatively heavy operation), and getTable once for every table.
Besides being called when the frontend displays the table list, listTables is also called by the logic that synchronizes with the external catalog (every 3 minutes by default).
How should we improve?
When the metastore is Hive, we can optimize this by calling getAllTables and getTableObjectsByName once each to retrieve all tables together with their types.
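The single-pass idea could look roughly like the sketch below. It is only an illustration: `HmsTable`, `Metastore`, and `SinglePassLister` are hypothetical stand-ins, not the real HMS client or Amoro classes, and the detector predicates are supplied by the caller, so no particular table property is assumed here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical, simplified stand-in for the HMS Table object (only what the sketch needs).
final class HmsTable {
  final String name;
  final Map<String, String> parameters;

  HmsTable(String name, Map<String, String> parameters) {
    this.name = name;
    this.parameters = parameters;
  }
}

// Hypothetical stand-in for the two HMS client calls named in this issue.
interface Metastore {
  List<String> getAllTables(String database);

  List<HmsTable> getTableObjectsByName(String database, List<String> names);
}

final class SinglePassLister {
  // Detectors are checked in registration order; the first match decides the format.
  private final Map<String, Predicate<HmsTable>> detectors = new LinkedHashMap<>();

  SinglePassLister register(String format, Predicate<HmsTable> detector) {
    detectors.put(format, detector);
    return this;
  }

  // Issues exactly one getAllTables call and one getTableObjectsByName call,
  // then classifies every returned table in memory.
  Map<String, String> listTablesWithFormat(Metastore hms, String database) {
    List<String> names = hms.getAllTables(database);
    Map<String, String> formatByTable = new LinkedHashMap<>();
    for (HmsTable table : hms.getTableObjectsByName(database, names)) {
      for (Map.Entry<String, Predicate<HmsTable>> entry : detectors.entrySet()) {
        if (entry.getValue().test(table)) {
          formatByTable.put(table.name, entry.getKey());
          break;
        }
      }
      // Tables matched by no detector (e.g. plain Hive tables) are left out.
    }
    return formatByTable;
  }
}
```

A caller would register one detector per supported format, for example a check on the table properties for Iceberg and a check on getSd() for mixed-hive. The exact markers each format uses are not spelled out in this issue, so they are left to the implementation.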
Define an interface that supports listing all tables and their formats.
MixedCatalog implements this interface.
MixedHiveCatalog implements this interface.
When UnifiedCatalog::listTables is called, we first check whether any of the supported FormatCatalogs implements this interface. If one does, we use the table list it returns instead of calling listTables on each type of FormatCatalog.
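A minimal sketch of that dispatch, under stated assumptions: `TableAndFormat`, `SimpleFormatCatalog`, `FormatAwareTableLister`, and `UnifiedListing` are hypothetical names chosen for illustration, and `SimpleFormatCatalog` is a reduced stand-in rather than the real FormatCatalog interface.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical value type: a table name plus the format it was detected as.
final class TableAndFormat {
  final String name;
  final String format;

  TableAndFormat(String name, String format) {
    this.name = name;
    this.format = format;
  }
}

// Reduced stand-in for a per-format catalog (not the real FormatCatalog interface).
interface SimpleFormatCatalog {
  String format();

  List<String> listTables(String database);
}

// The proposed optional capability: list every table together with its format.
interface FormatAwareTableLister {
  List<TableAndFormat> listTablesWithFormat(String database);
}

final class UnifiedListing {
  private UnifiedListing() {}

  static List<TableAndFormat> listTables(List<SimpleFormatCatalog> catalogs, String database) {
    // Fast path: the first catalog that implements the capability answers for all formats.
    for (SimpleFormatCatalog catalog : catalogs) {
      if (catalog instanceof FormatAwareTableLister) {
        return ((FormatAwareTableLister) catalog).listTablesWithFormat(database);
      }
    }
    // Fallback: the current behavior, one listTables call per format catalog.
    List<TableAndFormat> all = new ArrayList<>();
    for (SimpleFormatCatalog catalog : catalogs) {
      for (String name : catalog.listTables(database)) {
        all.add(new TableAndFormat(name, catalog.format()));
      }
    }
    return all;
  }
}
```

Keeping the capability in a separate interface means format catalogs that cannot classify tables cheaply do not have to change, and the fallback preserves today's behavior when no catalog implements it.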
Are you willing to submit PR?
Subtasks
No response
Code of Conduct