Open galadrielwithlaptop opened 1 year ago
Hi @galadrielwithlaptop
I don't have experience with Trino, so I might be missing something; however, when creating a Delta table using the Delta Catalog, the schema information is stored only in `_delta_log`. No schema information is stored in the metastore. The location of `_delta_log` is stored in the metastore under the `flink.table-path` key.
So when you call `DESCRIBE TABLE` from Flink using the Delta Catalog, the catalog looks for `_delta_log` under `flink.table-path`.
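To make the "schema lives only in `_delta_log`" point concrete, here is a minimal, hypothetical Python sketch of how a table schema can be recovered from the Delta log alone. It only handles uncompacted JSON commit files with `metaData` actions; the real Delta log protocol also has checkpoints, protocol actions, etc.

```python
import json
import tempfile
from pathlib import Path

def read_delta_schema(table_path):
    """Scan _delta_log commit files in order and return the schema from
    the most recent metaData action. Simplified sketch: ignores
    checkpoints, protocol versions, and removed/overwritten metadata."""
    log_dir = Path(table_path) / "_delta_log"
    schema = None
    for commit in sorted(log_dir.glob("*.json")):
        for line in commit.read_text().splitlines():
            action = json.loads(line)
            if "metaData" in action:
                # schemaString is itself a JSON-encoded struct
                schema = json.loads(action["metaData"]["schemaString"])
    return schema

# Demo with a fabricated single-commit log:
tmp = Path(tempfile.mkdtemp())
(tmp / "_delta_log").mkdir()
meta = {"metaData": {"schemaString": json.dumps({
    "type": "struct",
    "fields": [{"name": "estArrivalHour", "type": "integer",
                "nullable": True, "metadata": {}}]})}}
(tmp / "_delta_log" / "00000000000000000000.json").write_text(json.dumps(meta))
print(read_delta_schema(tmp)["fields"][0]["name"])  # prints: estArrivalHour
```

The point of the sketch: any engine pointed at the `flink.table-path` location can reconstruct the schema, but an engine that only inspects metastore columns sees nothing.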
It might be a missing feature, since in the [design doc](https://docs.google.com/document/d/1L31_CDgVZjy5N4qe7VdDftKn0YAvwSg6Zh2PxQqpIuM) we stated:
> SHOULD: Delta table definitions in the metastore created by the Flink Delta Catalog should work with other engines. Although the first version may not have this feature, all design decisions must support this feature in order to add it in future releases. E.g. `flink.path` and `flink.connector` will be stored in the Hive Metastore … but we want `path` to be stored in the HMS. TL;DR: the table definition in the metastore doesn't always work between both Spark and others …
I'm thinking that making the Delta Connector store the table location also under the `location` key in the metastore could help here.
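A small Python sketch of the idea (the parameter names besides `flink.table-path` and `location`, and the lookup behavior of other engines, are assumptions for illustration):

```python
# Hypothetical metastore table parameters: what the Flink Delta Catalog
# stores today vs. the proposal to duplicate the path under `location`.
current_params = {"flink.table-path": "abfs://.../delta-output"}
proposed_params = {**current_params,
                   "location": current_params["flink.table-path"]}

def resolve_location(params):
    """How a non-Flink engine might resolve the table location: it
    only understands the conventional `location` key, not the
    Flink-specific one."""
    return params.get("location")

print(resolve_location(current_params))   # prints: None
print(resolve_location(proposed_params))  # prints: abfs://.../delta-output
```

Under this assumption, engines that ignore `flink.table-path` would still find the Delta log via `location`, without Flink losing its own key.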
@tdas @scottsand-db WDYT? ^
Bug
Describe the problem
I am not able to see the table schema in Trino for a table I created/registered in the Hive metastore from Flink, and vice versa. I'm not sure if it is supposed to work like this?
Flink Version: 1.16.0
I added the following dependencies to the server classpath:

```sql
Flink SQL> ADD JAR '/opt/flink-webssh/lib/delta-flink-3.0.0rc1.jar';
Flink SQL> ADD JAR '/opt/flink-webssh/lib/delta-standalone_2.12-3.0.0rc1.jar';
Flink SQL> ADD JAR '/opt/flink-webssh/lib/shapeless_2.12-2.3.4.jar';
Flink SQL> ADD JAR '/opt/flink-webssh/lib/parquet-hadoop-bundle-1.12.2.jar';
Flink SQL> ADD JAR '/opt/flink-webssh/lib/flink-parquet-1.16.0.jar';
```
Created Delta catalog:

```sql
CREATE CATALOG delta_catalog WITH ('type' = 'delta-catalog', 'catalog-type' = 'hive');
```
Created Delta table (note the quoting around the `table-path` value):

```sql
CREATE TABLE flightsintervaldata1 (arrivalAirportCandidatesCount INT, estArrivalHour INT)
PARTITIONED BY (estArrivalHour)
WITH ('connector' = 'delta', 'table-path' = 'abfs://container@storage_account.dfs.core.windows.net/delta-output');
```
When reading this table from the Trino side:
In the SQL DB, I tried to check the table parameters: for the Delta table created in Trino, and for the Delta table created in Flink:
Is this intended by the Delta community, or am I missing some parameters during table creation?