delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0

[BUG][FLINK] Not able to see table schema in Trino for a table I created/registered in the Hive metastore from Flink, and vice versa #1971

Open galadrielwithlaptop opened 1 year ago

galadrielwithlaptop commented 1 year ago

Bug

Describe the problem

I am not able to see the table schema in Trino for a table I created/registered in the Hive metastore from Flink, and vice versa. I'm not sure whether this is supposed to work this way.

Flink Version: 1.16.0

I added the following dependencies to the server classpath:

Flink SQL> ADD JAR '/opt/flink-webssh/lib/delta-flink-3.0.0rc1.jar';
Flink SQL> ADD JAR '/opt/flink-webssh/lib/delta-standalone_2.12-3.0.0rc1.jar';
Flink SQL> ADD JAR '/opt/flink-webssh/lib/shapeless_2.12-2.3.4.jar';
Flink SQL> ADD JAR '/opt/flink-webssh/lib/parquet-hadoop-bundle-1.12.2.jar';
Flink SQL> ADD JAR '/opt/flink-webssh/lib/flink-parquet-1.16.0.jar';

Created the Delta catalog:

CREATE CATALOG delta_catalog WITH ('type' = 'delta-catalog', 'catalog-type' = 'hive');

Created the Delta table:

CREATE TABLE flightsintervaldata1 (arrivalAirportCandidatesCount INT, estArrivalHour INT) PARTITIONED BY (estArrivalHour) WITH ('connector' = 'delta', 'table-path' = 'abfs://container@storage_account.dfs.core.windows.net/delta-output');

When reading this table from the Trino side: [screenshot]

In the SQL DB, I checked the table parameters:

For the Delta table created in Trino: [screenshot]
For the Delta table created in Flink: [screenshot]
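To compare the two tables' parameters directly, a query over the metastore's backing tables can list each table's parameters next to its storage location. The sketch below is a minimal illustration, assuming a simplified subset of the standard Hive Metastore relational schema (`TBLS`, `SDS`, `TABLE_PARAMS`; real schemas have many more columns and scope table names by database). It runs against an in-memory SQLite stand-in; the helper names and the inserted row are hypothetical, chosen only to mirror the layout described later in this thread (path under `flink.table-path`, no storage location).

```python
import sqlite3

# Simplified subset of the Hive Metastore schema; real HMS tables have many more
# columns (e.g. TBLS.DB_ID scopes table names to a database).
HMS_SUBSET_DDL = """
CREATE TABLE SDS (SD_ID INTEGER PRIMARY KEY, LOCATION TEXT);
CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, TBL_NAME TEXT, SD_ID INTEGER);
CREATE TABLE TABLE_PARAMS (TBL_ID INTEGER, PARAM_KEY TEXT, PARAM_VALUE TEXT);
"""

# Storage location plus every table parameter for a given table name.
PARAMS_QUERY = """
SELECT s.LOCATION, p.PARAM_KEY, p.PARAM_VALUE
FROM TBLS t
LEFT JOIN SDS s ON t.SD_ID = s.SD_ID
LEFT JOIN TABLE_PARAMS p ON t.TBL_ID = p.TBL_ID
WHERE t.TBL_NAME = ?
ORDER BY p.PARAM_KEY
"""


def table_params(conn, table_name):
    """Return (location, {param_key: param_value}) for one table."""
    rows = conn.execute(PARAMS_QUERY, (table_name,)).fetchall()
    location = rows[0][0] if rows else None
    # LEFT JOIN yields a NULL key when a table has no parameters; skip it.
    params = {key: value for _, key, value in rows if key is not None}
    return location, params


def build_demo_metastore():
    """In-memory stand-in holding one hypothetical Flink-registered Delta table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(HMS_SUBSET_DDL)
    conn.execute("INSERT INTO SDS VALUES (1, NULL)")  # hypothetical: no location set
    conn.execute("INSERT INTO TBLS VALUES (10, 'flightsintervaldata1', 1)")
    conn.execute(
        "INSERT INTO TABLE_PARAMS VALUES (10, 'flink.table-path', ?)",
        ("abfs://container@storage_account.dfs.core.windows.net/delta-output",),
    )
    return conn


if __name__ == "__main__":
    location, params = table_params(build_demo_metastore(), "flightsintervaldata1")
    print("location:", location)
    for key, value in params.items():
        print(f"{key} = {value}")
```

Against a real metastore database only the SELECT is relevant, and the parameter placeholder syntax depends on the database driver.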

Is this intended by the Delta community, or am I missing some parameters at table creation?

kristoffSC commented 1 year ago

Hi @galadrielwithlaptop, I don't have experience with Trino, so I might be missing something. However, when a Delta table is created using the Delta Catalog, the schema information is stored only in `_delta_log`; no schema information is stored in the metastore. The location of `_delta_log` is stored in the metastore under the `flink.table-path` key.

So when you call `DESCRIBE` on the table from Flink using the Delta Catalog, the catalog looks for `_delta_log` under `flink.table-path`.
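The lookup described above can be sketched in a few lines: take the table root from `flink.table-path`, then read the schema from the most recent `metaData` action in the JSON commit files under `_delta_log`. Per the Delta transaction log protocol, each commit is a zero-padded `<version>.json` file of newline-delimited actions, and `metaData.schemaString` holds the schema as a JSON-encoded struct. This is a minimal sketch under stated assumptions, not the connector's actual code: the function names are hypothetical, it reads only local paths (an `abfs://` location would need a filesystem client), and it ignores checkpoint files, which a real Delta reader must also handle.

```python
import json
import os

TABLE_PATH_KEY = "flink.table-path"  # metastore key named in the comment above


def resolve_table_path(params):
    """Return the Delta table root stored in the metastore parameters."""
    try:
        return params[TABLE_PATH_KEY]
    except KeyError:
        raise LookupError(f"metastore entry has no {TABLE_PATH_KEY!r}") from None


def read_latest_schema(table_root):
    """Return (fields, partition_columns) from the newest metaData action.

    Simplified: scans only JSON commits (zero-padded names sort in version
    order) and ignores checkpoint files.
    """
    log_dir = os.path.join(table_root, "_delta_log")
    commits = sorted(f for f in os.listdir(log_dir) if f.endswith(".json"))
    latest = None
    for name in commits:
        with open(os.path.join(log_dir, name), encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                action = json.loads(line)
                if "metaData" in action:
                    latest = action["metaData"]  # later commits override earlier ones
    if latest is None:
        raise LookupError(f"no metaData action found under {log_dir}")
    schema = json.loads(latest["schemaString"])
    fields = [(f["name"], f["type"]) for f in schema["fields"]]
    return fields, latest.get("partitionColumns", [])


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "_delta_log"))
        schema = {"type": "struct", "fields": [
            {"name": "arrivalAirportCandidatesCount", "type": "integer", "nullable": True, "metadata": {}},
            {"name": "estArrivalHour", "type": "integer", "nullable": True, "metadata": {}},
        ]}
        # Hypothetical minimal commit: only a metaData action (real commits carry more).
        meta = {"metaData": {"schemaString": json.dumps(schema), "partitionColumns": ["estArrivalHour"]}}
        with open(os.path.join(root, "_delta_log", "00000000000000000000.json"), "w", encoding="utf-8") as fh:
            fh.write(json.dumps(meta) + "\n")
        print(read_latest_schema(resolve_table_path({TABLE_PATH_KEY: root})))
```

Because the schema only exists in the log, any engine that cannot find the table root has nothing to read it from, which matches what the reporter sees in Trino.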

This might be a missing feature, since in the design doc (https://docs.google.com/document/d/1L31_CDgVZjy5N4qe7VdDftKn0YAvwSg6Zh2PxQqpIuM) we stated:

SHOULD: Delta table definitions in the metastore created by Flink Delta Catalog should work with other engines. Although the first version may not have this feature, all design decisions must support this feature in order to add it in future releases. e.g. flink.path and flink.connector will be stored in the Hive Metastore … but we want path to be stored in the HMS TLDR the table definition in the metastore doesn’t always work between both Spark and others …

I think having the Delta Connector also store the table location under the metastore's `location` key could help here.
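The proposal above can be illustrated with a hypothetical, simplified model of a metastore entry: today the path lives only under `flink.table-path`, so an engine that resolves tables through the standard location finds nothing; writing the path to both places lets both lookups agree. None of these names come from the Delta connector or the Hive Metastore API, and `MetastoreEntry.location` merely stands in for the standard table location.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

FLINK_PATH_KEY = "flink.table-path"  # key named earlier in this thread


@dataclass
class MetastoreEntry:
    """Hypothetical, simplified view of one table entry in the metastore."""
    location: Optional[str] = None  # stand-in for the standard table location
    parameters: Dict[str, str] = field(default_factory=dict)


def register_current(path: str) -> MetastoreEntry:
    """Current behaviour as described: path stored only under flink.table-path."""
    return MetastoreEntry(parameters={FLINK_PATH_KEY: path})


def register_proposed(path: str) -> MetastoreEntry:
    """Proposed behaviour: keep flink.table-path and also set the standard location."""
    return MetastoreEntry(location=path, parameters={FLINK_PATH_KEY: path})


def flink_catalog_path(entry: MetastoreEntry) -> Optional[str]:
    """How the Flink Delta Catalog finds the table root (per this thread)."""
    return entry.parameters.get(FLINK_PATH_KEY)


def location_based_path(entry: MetastoreEntry) -> Optional[str]:
    """Stand-in for an engine that only consults the standard location."""
    return entry.location


if __name__ == "__main__":
    path = "abfs://container@storage_account.dfs.core.windows.net/delta-output"
    for name, entry in (("current", register_current(path)),
                        ("proposed", register_proposed(path))):
        print(name, flink_catalog_path(entry), location_based_path(entry))
```

Keeping `flink.table-path` alongside the standard location would presumably preserve compatibility with tables already registered by the Flink catalog. Whether the location alone is enough for Trino to recognize the table is not established in this thread.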

kristoffSC commented 1 year ago

@tdas @scottsand-db WDYT? ^