delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0

[Spark] Support create external table for clustered table #3251

Closed. zedtang closed this 2 weeks ago

zedtang commented 2 weeks ago

Which Delta project/connector is this regarding?

Spark

Description

Support creating a clustered table from an external location that already has a clustered table. We follow the same semantics as partitioned tables:

| External location already has clustered/partitioned table: Create clustered/partitioned table | partitioned | clustered |
| --- | --- | --- |
| schema not specified, cluster/partitioned by not specified | success | success |
| schema specified, cluster by/partitioned by not specified | throw DELTA_CREATE_TABLE_WITH_DIFFERENT_PARTITIONING | throw DELTA_CREATE_TABLE_WITH_DIFFERENT_CLUSTERING |
| schema specified, cluster by/partitioned by different column | throw DELTA_CREATE_TABLE_WITH_DIFFERENT_PARTITIONING | throw DELTA_CREATE_TABLE_WITH_DIFFERENT_CLUSTERING |
| schema specified, cluster by/partitioned by same column | success | success |

| External location already has non-clustered/non-partitioned table: Create clustered/partitioned table | partitioned | clustered |
| --- | --- | --- |
| schema specified, cluster by/partitioned by specified | throw DELTA_CREATE_TABLE_WITH_DIFFERENT_PARTITIONING | throw DELTA_CREATE_TABLE_WITH_DIFFERENT_CLUSTERING |
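
The scenarios above can be read as a single rule: creating the table with neither a schema nor a clause adopts whatever is already at the location, and otherwise the requested columns must match the existing ones. The sketch below models that reading in plain Python; it is not the Delta implementation. The function name `expected_outcome` and its parameters are hypothetical, and its behavior for combinations not listed in the scenarios is an assumption that follows from the generalized rule rather than from this PR.

```python
from typing import Optional, Sequence

# Error class names taken from the scenarios in this PR description.
ERROR_CLASS = {
    "partitioned": "DELTA_CREATE_TABLE_WITH_DIFFERENT_PARTITIONING",
    "clustered": "DELTA_CREATE_TABLE_WITH_DIFFERENT_CLUSTERING",
}


def expected_outcome(
    kind: str,
    existing_columns: Sequence[str],
    schema_specified: bool,
    requested_columns: Optional[Sequence[str]],
) -> str:
    """Return "success" or the error class expected for one scenario.

    kind: "partitioned" or "clustered" (which column of the scenario list).
    existing_columns: columns the table at the external location is already
        partitioned/clustered by; empty means it is neither.
    schema_specified: whether CREATE TABLE lists a schema.
    requested_columns: columns in the PARTITIONED BY / CLUSTER BY clause,
        or None when the clause is omitted.
    """
    # Nothing specified: the existing table metadata is used as-is.
    if not schema_specified and requested_columns is None:
        return "success"
    # Otherwise the requested columns must match the existing ones.
    # ASSUMPTION: an omitted clause is treated as "no columns" here, which
    # reproduces the listed scenarios; unlisted combinations are a guess.
    requested = list(requested_columns or [])
    if requested == list(existing_columns):
        return "success"
    return ERROR_CLASS[kind]


if __name__ == "__main__":
    # Location already holds a table clustered by "id".
    print(expected_outcome("clustered", ["id"], False, None))
    print(expected_outcome("clustered", ["id"], True, None))
    print(expected_outcome("clustered", ["id"], True, ["name"]))
    print(expected_outcome("clustered", ["id"], True, ["id"]))
    # Location holds a non-clustered table.
    print(expected_outcome("clustered", [], True, ["id"]))
```

Calling the function with `kind="partitioned"` gives the same success/failure pattern with the partitioning error class, which mirrors the statement that clustered tables follow the same semantics as partitioned tables.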

How was this patch tested?

Added new unit tests to cover all scenarios above.

Does this PR introduce any user-facing changes?

No.