apache / amoro

Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
https://amoro.apache.org/
Apache License 2.0

[Improvement]: Optimize excessive database access in CommonUnifiedCatalog #3330

Closed: zhangwl9 closed this issue 4 hours ago

zhangwl9 commented 1 week ago

What would you like to be improved?

Currently, some databaseExists checks in CommonUnifiedCatalog are redundant because the subsequent logic (such as CommonUnifiedCatalog#createDatabase and CommonUnifiedCatalog#dropDatabase) already covers them. Each databaseExists check also makes a call to the external catalog (e.g. Hive Metastore), which may be time-consuming.

How should we improve?

Remove redundant 'databaseExists' checks in CommonUnifiedCatalog to improve performance.
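
To illustrate the cost being discussed, here is a minimal sketch, not Amoro's actual code. `CountingCatalog` and `UnifiedCatalogSketch` are hypothetical names, `IllegalStateException` stands in for whatever "already exists" error the real catalog raises, and the sketch assumes the underlying create call fails on its own when the database already exists.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for an external catalog such as Hive Metastore.
// It counts every call so the number of round trips can be compared.
class CountingCatalog {
    final Set<String> databases = new HashSet<>();
    int remoteCalls = 0;

    boolean databaseExists(String name) {
        remoteCalls++;
        return databases.contains(name);
    }

    // Assumption: the create call itself rejects a duplicate database.
    void createDatabase(String name) {
        remoteCalls++;
        if (!databases.add(name)) {
            throw new IllegalStateException("Database already exists: " + name);
        }
    }
}

class UnifiedCatalogSketch {
    // Before: explicit pre-check plus the create call, two round trips.
    static void createWithPreCheck(CountingCatalog catalog, String name) {
        if (catalog.databaseExists(name)) {
            throw new IllegalStateException("Database already exists: " + name);
        }
        catalog.createDatabase(name);
    }

    // After: rely on the error raised by the create call, one round trip.
    static void createWithoutPreCheck(CountingCatalog catalog, String name) {
        catalog.createDatabase(name);
    }
}
```

Both variants reject a duplicate database; the second simply lets the create call report it, which saves one call to the external catalog per operation.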

klion26 commented 4 days ago

Thanks for filing the issue. It seems the databaseExists calls in createDatabase and dropDatabase are made redundant by the subsequent logic, but the databaseExists check in listDatabase cannot be removed directly. We first need to make sure FormatCatalog#listTables respects its Javadoc; then the databaseExists check can be removed.

zhangwl9 commented 1 day ago

> Thanks for filing the issue. It seems the databaseExists calls in createDatabase and dropDatabase are made redundant by the subsequent logic, but the databaseExists check in listDatabase cannot be removed directly. We first need to make sure FormatCatalog#listTables respects its Javadoc; then the databaseExists check can be removed.

We added exception handling for NoSuchDatabaseException to the 'HudiHadoopCatalog#listTables' and 'MixedCatalog#listTables' methods, which are called via 'FormatCatalog#listTables', so the 'databaseExists' check in 'CommonUnifiedCatalog#listTables' is no longer needed.
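
As a rough sketch of that idea, and not the actual patch: the classes below are hypothetical, the `NoSuchDatabaseException` here is a local stand-in for the real one, and `NoSuchElementException` is an assumed placeholder for whatever low-level error the underlying store throws.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Local stand-in; the real exception class is not shown in this thread.
class NoSuchDatabaseException extends RuntimeException {
    NoSuchDatabaseException(String database, Throwable cause) {
        super("Database does not exist: " + database, cause);
    }
}

// Hypothetical low-level store that fails with a generic error on a missing database.
class BackingStoreSketch {
    private final Map<String, List<String>> tablesByDatabase = new HashMap<>();

    void addDatabase(String database, List<String> tables) {
        tablesByDatabase.put(database, tables);
    }

    List<String> tablesOf(String database) {
        List<String> tables = tablesByDatabase.get(database);
        if (tables == null) {
            throw new NoSuchElementException(database);
        }
        return tables;
    }
}

// Hypothetical format catalog: listTables translates the low-level error,
// so the missing-database case is reported by listTables itself.
class FormatCatalogSketch {
    private final BackingStoreSketch store;

    FormatCatalogSketch(BackingStoreSketch store) {
        this.store = store;
    }

    List<String> listTables(String database) {
        try {
            return store.tablesOf(database);
        } catch (NoSuchElementException e) {
            throw new NoSuchDatabaseException(database, e);
        }
    }
}

class UnifiedListTablesSketch {
    // No databaseExists pre-check: the delegate already signals a missing database.
    static List<String> listTables(FormatCatalogSketch catalog, String database) {
        return catalog.listTables(database);
    }
}
```

With the translation inside `listTables`, the caller gets the same exception it would have thrown after a failed pre-check, without the extra lookup.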

zhangwl9 commented 1 day ago

@majin1102 @czy006 could you please take a look at this when you're free, thanks