delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0

[Feature Request] Incremental clustering using ZCube #2449

Open zedtang opened 10 months ago

zedtang commented 10 months ago

Feature request

Which Delta project/connector is this regarding?

Overview

Uber issue: https://github.com/delta-io/delta/issues/1874

According to the design doc, this issue tracks the support for incremental clustering using ZCube.

Willingness to contribute

The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?

zedtang commented 10 months ago

One suggestion from code review is to use an Enum to represent the different modes (compaction / zorder by / clustering): https://github.com/delta-io/delta/pull/2461/files#r1446717782
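A rough, language-neutral sketch of that suggestion follows. It is written in Python for brevity, whereas the Delta codebase itself is Scala. The names `OptimizeMode` and `reorders_data` are hypothetical and are not taken from the PR. The sketch only illustrates replacing separate flags with a single enumerated value.

```python
from enum import Enum


class OptimizeMode(Enum):
    """Hypothetical enumeration of the three modes named in the review comment."""
    COMPACTION = "compaction"
    ZORDER_BY = "zorder by"
    CLUSTERING = "clustering"


def reorders_data(mode: OptimizeMode) -> bool:
    """Return True for modes that sort data along a curve (assumed semantics).

    Branching on one enum value avoids having to combine several booleans
    such as "is z-order" and "is clustering" at every call site.
    """
    return mode in (OptimizeMode.ZORDER_BY, OptimizeMode.CLUSTERING)
```

With a single value, an invalid combination (for example "z-order and clustering at once") cannot be expressed, and a new mode needs only one new member.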

zedtang commented 10 months ago

Suggestion from code review: assert in tests that the clustered table uses the "hilbert" curve: https://github.com/delta-io/delta/pull/2461/files#r1446793103
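One possible shape for such a check, sketched in Python under loud assumptions: it supposes each data file carries a string-to-string tag map, and that the curve name is stored under a key called `ZCUBE_ZORDER_CURVE`. Both the key name and the helper `assert_all_files_use_curve` are hypothetical here. The real test in the Delta codebase would be Scala and may look quite different.

```python
from typing import Dict, Iterable

# Hypothetical tag key; the actual name used by Delta may differ.
CURVE_TAG = "ZCUBE_ZORDER_CURVE"


def assert_all_files_use_curve(
    file_tags: Iterable[Dict[str, str]], expected: str = "hilbert"
) -> None:
    """Fail if any file's tag map does not record the expected curve."""
    for index, tags in enumerate(file_tags):
        actual = tags.get(CURVE_TAG)  # None when the tag is missing
        assert actual == expected, (
            f"file {index}: expected curve {expected!r}, got {actual!r}"
        )
```

A test could collect the tag maps of the files written by a clustering run and pass them to this helper. A missing tag is treated as a failure rather than silently skipped.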