apache / amoro

Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
https://amoro.apache.org/
Apache License 2.0
875 stars 292 forks source link

[Improvement]: Needs to sort the external Iceberg table list #1716

Closed HuangFru closed 11 months ago

HuangFru commented 1 year ago

Search before asking

What would you like to be improved?

Currently, an external Iceberg catalog's table list is returned by the Iceberg catalog's listTables method which can not guaranteed order. For example, I have 500 hundred tables named 'mock_table_1' to 'mock_table_500', but the table list in AMS shows like this:

image

It's not very aesthetically pleasing and inconvenient.

How should we improve?

Sorting it by the lexicographical order of the table names.

Are you willing to submit PR?

Subtasks

No response

Code of Conduct

zhoujinsong commented 1 year ago

Sorting by table name can indeed help users find the tables they are interested in more quickly. At the same time, we may also perform this sorting operation on the database list.

zhoujinsong commented 1 year ago

Given that this improvement is very small and meaningful, I suggest opening it up to the new contributors. Therefore, I plan to label it as good first issue.

HuangFru commented 1 year ago

Cause the table_identifier table has UNIQUE KEY table_name_index (catalog_name,db_name,table_name), and we select by 'where catalog = ? and database =?', so it's ordered when selecting from the database.

chenyuzhi459 commented 1 year ago

Sorting by table name can indeed help users find the tables they are interested in more quickly. At the same time, we may also perform this sorting operation on the database list.

I would like to provide this implementation, but considering that amoro has multiple types of ServerCatalog(iceberg/internal/mix-iceberg/paimon), maybe it would be better to wait for the table-list returned by catalog and sort it uniformly in memory, so that all tables returned by the catalog can be sorted.

@HuangFru

HuangFru commented 1 year ago

Sorting by table name can indeed help users find the tables they are interested in more quickly. At the same time, we may also perform this sorting operation on the database list.

I would like to provide this implementation, but considering that amoro has multiple types of ServerCatalog(iceberg/internal/mix-iceberg/paimon), maybe it would be better to wait for the table-list returned by catalog and sort it uniformly in memory, so that all tables returned by the catalog can be sorted.

@HuangFru

Sorry for the late reply, there seems to be a problem with my email reminder. The internal catalog is selected from the database and is ordered. You only need to deal with the external catalog.

chenyuzhi459 commented 1 year ago

Sorting by table name can indeed help users find the tables they are interested in more quickly. At the same time, we may also perform this sorting operation on the database list.

I would like to provide this implementation, but considering that amoro has multiple types of ServerCatalog(iceberg/internal/mix-iceberg/paimon), maybe it would be better to wait for the table-list returned by catalog and sort it uniformly in memory, so that all tables returned by the catalog can be sorted. @HuangFru

Sorry for the late reply, there seems to be a problem with my email reminder. The internal catalog is selected from the database and is ordered. You only need to deal with the external catalog.

However, According to the response code logic of the rest api /catalogs/{catalog}/databases/{db}/tables, the table list returned from TableService will be manually merged with the hive table returned by the MixedHiveCatalogImpl query, so it is not guaranteed that the table list returned is ordered finally. Is it better to sort the table list from the external catalog and sort the results from the interface at the same time?

@HuangFru

HuangFru commented 1 year ago

Sorting by table name can indeed help users find the tables they are interested in more quickly. At the same time, we may also perform this sorting operation on the database list.

I would like to provide this implementation, but considering that amoro has multiple types of ServerCatalog(iceberg/internal/mix-iceberg/paimon), maybe it would be better to wait for the table-list returned by catalog and sort it uniformly in memory, so that all tables returned by the catalog can be sorted. @HuangFru

Sorry for the late reply, there seems to be a problem with my email reminder. The internal catalog is selected from the database and is ordered. You only need to deal with the external catalog.

However, According to the response code logic of the rest api /catalogs/{catalog}/databases/{db}/tables, the table list returned from TableService will be manually merged with the hive table returned by the MixedHiveCatalogImpl query, so it is not guaranteed that the table list returned is ordered finally. Is it better to sort the table list from the external catalog and sort the results from the interface at the same time?

@HuangFru

Then I think we only need to sort the final results of the interface. After all, this only serves the display effect of the table list.