delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0

[Feature Request] Correct schema metadata to be synced to Hive Metastore for delta tables #1746

Open sirsha-chatterjee opened 1 year ago

sirsha-chatterjee commented 1 year ago

Feature request

Overview

Delta table schemas are currently stored in HMS (Hive Metastore) as a single array column:

col array<string>

Motivation

Currently, when Delta tables are created using the Delta jar, their schemas are not properly synced to HMS. As a result, Hive users cannot discover the tables' actual columns through the metastore.

Steps to reproduce:

spark-sql --packages io.delta:delta-core_2.12:2.2.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
CREATE TABLE IF NOT EXISTS delta_table_dwh.company_name (
  id INT,
  cname STRING
) USING DELTA;

HMS:

SELECT column_name, type_name
FROM COLUMNS_V2
WHERE CD_ID IN (
    SELECT CD_ID
    FROM SDS
    WHERE SD_ID = (
        SELECT SD_ID
        FROM TBLS
        WHERE tbl_name = 'company_name'
    )
)
ORDER BY column_name ASC;

Output:

+-------------+---------------+
| column_name | type_name     |
+-------------+---------------+
| col         | array<string> |
+-------------+---------------+

Expected Output:

+-------------+---------------+
| column_name | type_name     |
+-------------+---------------+
| cname       | string        |
| id          | int           |
+-------------+---------------+
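The HMS lookup above matches on tbl_name alone, which could return more than one SD_ID if several databases contain a table called company_name. Below is a minimal sketch of the same lookup scoped by database name through joins on DBS, TBLS, SDS and COLUMNS_V2. It runs against an in-memory SQLite mock rather than a real metastore backend: the mock tables carry only the columns the query needs, and the second database (`other_db`) and all row IDs are hypothetical, added purely for illustration.

```python
import sqlite3

# Simplified stand-ins for the HMS backing tables (assumption: only the
# columns needed for this lookup; a real metastore schema has many more).
SCHEMA = """
CREATE TABLE DBS (DB_ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, DB_ID INTEGER, SD_ID INTEGER, TBL_NAME TEXT);
CREATE TABLE SDS (SD_ID INTEGER PRIMARY KEY, CD_ID INTEGER);
CREATE TABLE COLUMNS_V2 (CD_ID INTEGER, COLUMN_NAME TEXT, TYPE_NAME TEXT, INTEGER_IDX INTEGER);
"""

# Same lookup as in the issue, but joined through DBS so that the
# database name disambiguates tables that share a name.
QUERY = """
SELECT c.COLUMN_NAME, c.TYPE_NAME
FROM DBS d
JOIN TBLS t ON t.DB_ID = d.DB_ID
JOIN SDS s ON s.SD_ID = t.SD_ID
JOIN COLUMNS_V2 c ON c.CD_ID = s.CD_ID
WHERE d.NAME = ? AND t.TBL_NAME = ?
ORDER BY c.COLUMN_NAME ASC
"""


def hms_columns(conn, db_name, table_name):
    """Return (column_name, type_name) rows for one table in one database."""
    return conn.execute(QUERY, (db_name, table_name)).fetchall()


def build_mock_hms():
    """Create an in-memory mock with two same-named tables in different databases."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO DBS VALUES (?, ?)",
        [(1, "delta_table_dwh"), (2, "other_db")],  # other_db is hypothetical
    )
    conn.executemany(
        "INSERT INTO TBLS VALUES (?, ?, ?, ?)",
        [(10, 1, 100, "company_name"), (11, 2, 101, "company_name")],
    )
    conn.executemany("INSERT INTO SDS VALUES (?, ?)", [(100, 1000), (101, 1001)])
    conn.executemany(
        "INSERT INTO COLUMNS_V2 VALUES (?, ?, ?, ?)",
        [
            (1000, "col", "array<string>", 0),  # the single column reported in this issue
            (1001, "id", "int", 0),             # hypothetical table with a real schema
            (1001, "cname", "string", 1),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = build_mock_hms()
    for column_name, type_name in hms_columns(conn, "delta_table_dwh", "company_name"):
        print(column_name, type_name)
```

Against a real metastore the same SELECT should carry over with the backend's own parameter placeholder style, assuming the standard DBS/TBLS/SDS/COLUMNS_V2 layout.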

Further details

Willingness to contribute

The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?

allisonport-db commented 1 year ago

Linking this to https://github.com/delta-io/delta/issues/1478

hurcy commented 1 year ago

@sirsha-chatterjee I want this feature too! I hope I can contribute this feature.