apache / gravitino

World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
https://gravitino.apache.org
Apache License 2.0

[Bug report] list-table api is very slow when table quantity is very large #4089

Open mygrsun opened 3 days ago

mygrsun commented 3 days ago

Version

main branch

Describe what's wrong

In my tests, I found that list-table takes 300s when a schema has 5000 tables. I analyzed the code and added some logs, and found that the cause is the call to the getTableObjectsByName interface. list-table uses getTableObjectsByName, and this metastore interface is very slow.

image

Error message and/or stacktrace

I added logs at 3 positions.

image

The result is:

image

How to reproduce

Add 5000 tables to one schema, then call list-table on that schema.

Additional context

No response

mygrsun commented 3 days ago

I found that getTableObjectsByName is used mainly to filter managed and external tables, as well as to filter out Iceberg tables. If I don't filter on managed and external tables, what is the impact here? What additional types of tables would be returned?

mygrsun commented 3 days ago

Can you provide the direct query time for HMS without Gravitino?

We have tested it. When I execute "show tables" in Hive Beeline and Spark, it is very fast. I guess HiveServer2 doesn't use the getTableObjectsByName interface, because 'show tables' just returns table names.
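To make the difference concrete, here is a minimal sketch contrasting a names-only listing with a listing that also fetches one full table object per name. `FakeMetastore`, `ListTablesComparison`, and the `objectsLoaded` counter are hypothetical stand-ins written for this illustration; they are not the real Hive client, and the sketch models only the number of objects loaded, not actual latency.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a metastore; NOT the real Hive client.
class FakeMetastore {
    // table name -> table parameters
    private final Map<String, Map<String, String>> tables = new LinkedHashMap<>();
    // Counts how many full table objects have been materialized so far.
    int objectsLoaded = 0;

    void addTable(String name, Map<String, String> params) {
        tables.put(name, params);
    }

    // Cheap path: returns names only, which is all "show tables" needs.
    List<String> getAllTables() {
        return new ArrayList<>(tables.keySet());
    }

    // Expensive path: materializes one full object per requested name.
    List<Map<String, String>> getTableObjectsByName(List<String> names) {
        List<Map<String, String>> result = new ArrayList<>();
        for (String name : names) {
            Map<String, String> params = tables.get(name);
            if (params != null) {
                objectsLoaded++;
                result.add(params);
            }
        }
        return result;
    }
}

class ListTablesComparison {
    // Builds a store with tableCount tables; every second one is marked as Iceberg
    // via an assumed "table_type" parameter.
    static FakeMetastore buildStore(int tableCount) {
        FakeMetastore store = new FakeMetastore();
        for (int i = 0; i < tableCount; i++) {
            Map<String, String> params = (i % 2 == 0)
                ? Collections.singletonMap("table_type", "ICEBERG")
                : Collections.<String, String>emptyMap();
            store.addTable("t" + i, params);
        }
        return store;
    }

    public static void main(String[] args) {
        FakeMetastore store = buildStore(5000);
        List<String> names = store.getAllTables();
        System.out.println("names only: " + names.size()
            + " names, objects loaded = " + store.objectsLoaded);
        store.getTableObjectsByName(names);
        System.out.println("with objects: objects loaded = " + store.objectsLoaded);
    }
}
```

The names-only path never touches the per-table objects, which would be consistent with 'show tables' staying fast even for a schema with thousands of tables.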

mchades commented 3 days ago

time1 and time2 do not seem to appear in the picture?

mygrsun commented 3 days ago

time1 and time2 do not seem to appear in the picture?

Sorry, I will send you a new one.

mygrsun commented 3 days ago

time1 and time2 do not seem to appear in the picture?

image

mygrsun commented 1 day ago

image

I have tried the listTableNamesByFilter interface to filter Iceberg tables. It is a feasible approach, but I did not handle filtering of managed and external tables; I don't know the point of filtering managed and external tables.

So, please check this approach. If it is acceptable, I can submit a PR.
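A minimal sketch of the idea, under stated assumptions: list all table names, ask the metastore for the names of Iceberg tables through a parameter filter, and subtract the second list from the first, so no full table objects are fetched. `NameOnlyClient`, `InMemoryClient`, and `NonIcebergLister` are hypothetical names for this illustration. The two method signatures mirror the Hive calls mentioned in this thread. The filter string assumes the `hive_filter_field_params__<key>` convention for table-parameter filters and assumes Iceberg tables carry a `table_type` parameter set to `ICEBERG`. The in-memory client understands only this one filter form and treats `like` without wildcards as exact equality. This is not the code from the screenshots above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical subset of a metastore client; method names mirror the Hive calls in this thread.
interface NameOnlyClient {
    List<String> getAllTables(String dbName);
    List<String> listTableNamesByFilter(String dbName, String filter, short maxTables);
}

// In-memory stand-in that exists only to make the sketch runnable.
class InMemoryClient implements NameOnlyClient {
    // Supports exactly one filter shape: hive_filter_field_params__<key> like "<value>"
    private static final Pattern PARAM_FILTER =
        Pattern.compile("hive_filter_field_params__(\\w+) like \"([^\"]*)\"");
    private final Map<String, Map<String, String>> tables = new LinkedHashMap<>();

    void addTable(String name, Map<String, String> params) {
        tables.put(name, params);
    }

    @Override
    public List<String> getAllTables(String dbName) {
        return new ArrayList<>(tables.keySet());
    }

    @Override
    public List<String> listTableNamesByFilter(String dbName, String filter, short maxTables) {
        Matcher m = PARAM_FILTER.matcher(filter.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("unsupported filter: " + filter);
        }
        String key = m.group(1);
        String value = m.group(2);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : tables.entrySet()) {
            // Assumption: a negative maxTables means "no limit".
            if (maxTables >= 0 && out.size() >= maxTables) {
                break;
            }
            if (value.equals(e.getValue().get(key))) {
                out.add(e.getKey());
            }
        }
        return out;
    }
}

class NonIcebergLister {
    // Assumed filter for Iceberg tables; verify against your metastore version.
    static final String ICEBERG_FILTER = "hive_filter_field_params__table_type like \"ICEBERG\"";

    // Returns all table names except those matched by the Iceberg filter,
    // using only name-level calls.
    static List<String> listNonIcebergTables(NameOnlyClient client, String dbName) {
        List<String> all = new ArrayList<>(client.getAllTables(dbName));
        List<String> iceberg = client.listTableNamesByFilter(dbName, ICEBERG_FILTER, (short) -1);
        all.removeAll(new HashSet<>(iceberg));
        return all;
    }

    public static void main(String[] args) {
        InMemoryClient client = new InMemoryClient();
        client.addTable("orders", Collections.<String, String>emptyMap());
        client.addTable("events", Collections.singletonMap("table_type", "ICEBERG"));
        client.addTable("users", Collections.<String, String>emptyMap());
        System.out.println(listNonIcebergTables(client, "db1")); // prints [orders, users]
    }
}
```

Because the filter is just a string, the same two-call pattern could in principle exclude other table kinds by changing or combining filters, without going back to per-table object fetches.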

mchades commented 1 day ago

image

I have tried the listTableNamesByFilter interface to filter Iceberg tables. It is a feasible approach, but I did not handle filtering of managed and external tables; I don't know the point of filtering managed and external tables.

So, please check this approach. If it is acceptable, I can submit a PR.

Great! I think we can go with this approach. WDYT? @jerryshao @FANNG1

FANNG1 commented 1 day ago

Great! I think we can go with this approach. WDYT? @jerryshao @FANNG1

I think it's OK, because this method seems extensible and works for more than just filtering Iceberg tables.