datastrato / gravitino

World's most powerful data catalog service, providing a high-performance, geo-distributed and federated metadata lake.
https://datastrato.ai/docs/
Apache License 2.0
399 stars 166 forks

[Bug report] hive catalog include iceberg table? #3403

Open mygrsun opened 2 weeks ago

mygrsun commented 2 weeks ago

Version

main branch

Describe what's wrong

A schema in the Hive catalog contains the Iceberg table: [image]

But the Iceberg catalog doesn't contain the Hive table: [image]

Error message and/or stacktrace

empty

How to reproduce

Use beeline to create a Hive table and an Iceberg table in the same database.

Additional context

No response

jerryshao commented 2 weeks ago

@mchades Can you please take a look. From a cursory glance, I feel that Hive catalog should filter out non-hive table when fetching from HMS, WDYT?

jerryshao commented 2 weeks ago

@mygrsun would you like to try fixing it?

mchades commented 2 weeks ago

> @mchades Can you please take a look. From a cursory glance, I feel that Hive catalog should filter out non-hive table when fetching from HMS, WDYT?

Doesn't a table in HMS belong to Hive? How do we distinguish whether a table in HMS belongs to Hive or Iceberg? If it is distinguished by the values of the InputFormat and OutputFormat properties, then what kind of table should an Iceberg table created through Hive be?

jerryshao commented 2 weeks ago

There is a reserved property or something similar that distinguishes whether a table is Hive or Iceberg. For Hudi and others, I think they should also have a flag to differentiate them.

mchades commented 2 weeks ago

If I directly show tables in Hive, can I also see the Iceberg table?

jerryshao commented 2 weeks ago

I guess you can; give it a try. You can probably list the Iceberg table from Hive, but you can't list the Hive table from the Iceberg catalog.

FANNG1 commented 2 weeks ago

The Iceberg catalog uses a specific table parameter, table_type, to check whether a table is an Iceberg table:


      List<String> tableNames = clients.run(client -> client.getAllTables(database));
      List<TableIdentifier> tableIdentifiers;

      if (listAllTables) {
        tableIdentifiers =
            tableNames.stream()
                .map(t -> TableIdentifier.of(namespace, t))
                .collect(Collectors.toList());
      } else {
        List<Table> tableObjects =
            clients.run(client -> client.getTableObjectsByName(database, tableNames));
        tableIdentifiers =
            tableObjects.stream()
                .filter(
                    table ->
                        table.getParameters() != null
                            && BaseMetastoreTableOperations.ICEBERG_TABLE_TYPE_VALUE
                                .equalsIgnoreCase(
                                    table
                                        .getParameters()
                                        .get(BaseMetastoreTableOperations.TABLE_TYPE_PROP)))
                .map(table -> TableIdentifier.of(namespace, table.getTableName()))
                .collect(Collectors.toList());
      }
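
For the Hive catalog side, the same check could be inverted to drop Iceberg tables from the listing. The following is only a minimal sketch, not Gravitino's implementation: `HmsTable` and `HiveTableFilter` are hypothetical names, an HMS table is modeled as a plain name plus parameter map so the example runs without Hive dependencies, and it assumes the two Iceberg constants used above resolve to `table_type` and `iceberg`.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical stand-in for an HMS table: a name plus its parameter map. */
final class HmsTable {
  final String name;
  final Map<String, String> parameters; // may be null, which the filter must tolerate

  HmsTable(String name, Map<String, String> parameters) {
    this.name = name;
    this.parameters = parameters;
  }
}

/** Hypothetical helper that keeps only tables not marked as Iceberg. */
final class HiveTableFilter {
  // Assumed values of BaseMetastoreTableOperations.TABLE_TYPE_PROP and
  // BaseMetastoreTableOperations.ICEBERG_TABLE_TYPE_VALUE.
  static final String TABLE_TYPE_PROP = "table_type";
  static final String ICEBERG_TABLE_TYPE_VALUE = "iceberg";

  /** Same predicate as the Iceberg snippet: null-safe and case-insensitive. */
  static boolean isIcebergTable(HmsTable table) {
    return table.parameters != null
        && ICEBERG_TABLE_TYPE_VALUE.equalsIgnoreCase(table.parameters.get(TABLE_TYPE_PROP));
  }

  /** Returns the names of all tables that are not flagged as Iceberg, in input order. */
  static List<String> listNonIcebergTables(List<HmsTable> tables) {
    return tables.stream()
        .filter(table -> !isIcebergTable(table))
        .map(table -> table.name)
        .collect(Collectors.toList());
  }
}
```

A table with no parameters at all is treated as a Hive table here, which mirrors how the Iceberg check above skips such tables.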

mchades commented 1 week ago

@mygrsun How do you distinguish between Hive tables and Iceberg tables, and what behavior do you expect?

mygrsun commented 1 week ago

> @mygrsun How do you distinguish between Hive tables and Iceberg tables, and what behavior do you expect?

We want separate lists for Iceberg tables and Hive tables. I think the approach FANNG1 described is OK.

mchades commented 5 days ago

@mygrsun do you want to fix this?

mygrsun commented 4 days ago

> @mygrsun do you want to fix this?

Yes, I plan to fix it.

mchades commented 4 days ago

> > @mygrsun do you want to fix this?
>
> Yes, I plan to fix it.

Great! Can your fix make it into the 0.5.1 release? We plan to release it this week.

mygrsun commented 4 days ago

Please check my design.

The goal is to support both listing all tables and listing only Hive tables, excluding Iceberg tables.

My design adds a property to the catalog properties that controls whether to list all tables or only Hive tables without Iceberg tables. The property name is list-table-with-iceberg:

public static final String LIST_TABLE_WITH_ICEBERG = "list-table-with-iceberg";

Do you think this is OK? @FANNG1 @mchades

mchades commented 4 days ago

> Please check my design.
>
> The goal is to support both listing all tables and listing only Hive tables, excluding Iceberg tables.
>
> My design adds a property to the catalog properties that controls whether to list all tables or only Hive tables without Iceberg tables. The property name is list-table-with-iceberg: public static final String LIST_TABLE_WITH_ICEBERG = "list-table-with-iceberg";

I saw that the Iceberg community has also encountered similar issues before. It is worth noting that when there are too many tables, filtering tables may cause performance issues.

So I think we should add a list-all-tables property with a default value of true in the Hive catalog. This is consistent with the behavior of the Hive client, and users can set it to false when they need to filter. WDYT? @mygrsun @FANNG1 @jerryshao
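
The proposed behavior can be sketched as follows. This is an illustration under stated assumptions, not the actual Gravitino change: `ListAllTablesOption` and its methods are hypothetical, the expensive HMS metadata call is modeled as a plain function from table names to parameter maps, and the `table_type` / `iceberg` marker is assumed from the Iceberg snippet earlier in the thread. The point it shows is that with the default of true only names are returned and the per-table metadata fetch is skipped, so the filtering cost is paid only when the property is set to false.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical sketch of a list-all-tables catalog property that defaults to true. */
final class ListAllTablesOption {
  static final String LIST_ALL_TABLES = "list-all-tables";
  static final boolean DEFAULT_LIST_ALL_TABLES = true;

  /** Reads the property from the catalog properties, falling back to the default. */
  static boolean listAllTables(Map<String, String> catalogProperties) {
    String raw = catalogProperties.get(LIST_ALL_TABLES);
    return raw == null ? DEFAULT_LIST_ALL_TABLES : Boolean.parseBoolean(raw.trim());
  }

  /** Assumed Iceberg marker: table parameter table_type equal to "iceberg", ignoring case. */
  static boolean isIceberg(Map<String, String> parameters) {
    return parameters != null && "iceberg".equalsIgnoreCase(parameters.get("table_type"));
  }

  /**
   * Lists table names. fetchParameters stands in for the costly HMS call that
   * loads table objects; it is invoked only when filtering is requested.
   */
  static List<String> listTables(
      Map<String, String> catalogProperties,
      List<String> allNames,
      Function<List<String>, Map<String, Map<String, String>>> fetchParameters) {
    if (listAllTables(catalogProperties)) {
      return allNames; // names only, matching what the Hive client shows
    }
    Map<String, Map<String, String>> parametersByName = fetchParameters.apply(allNames);
    return allNames.stream()
        .filter(name -> !isIceberg(parametersByName.get(name)))
        .collect(Collectors.toList());
  }
}
```

Note that `Boolean.parseBoolean` treats any value other than "true" (ignoring case) as false, so a real implementation might want stricter validation of the property value.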

mygrsun commented 4 days ago

> > Please check my design. The goal is to support both listing all tables and listing only Hive tables, excluding Iceberg tables. My design adds a property to the catalog properties that controls whether to list all tables or only Hive tables without Iceberg tables. The property name is list-table-with-iceberg: public static final String LIST_TABLE_WITH_ICEBERG = "list-table-with-iceberg";
>
> I saw that the Iceberg community has also encountered similar issues before. It is worth noting that when there are too many tables, filtering tables may cause performance issues.
>
> So I think we should add a list-all-tables property with a default value of true in the Hive catalog. This is consistent with the behavior of the Hive client, and users can set it to false when they need to filter. WDYT? @mygrsun @FANNG1 @jerryshao

I think that's okay.

mygrsun commented 4 days ago

> > > @mygrsun do you want to fix this?
> >
> > Yes, I plan to fix it.
>
> Great! Can your fix make it into the 0.5.1 release? We plan to release it this week.

Yes, I can.