delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, and with APIs
https://delta.io
Apache License 2.0

[Spark] Support OPTIMIZE tbl FULL for clustered table #3793

Closed: dabao521 closed this 3 weeks ago

dabao521 commented 1 month ago

Which Delta project/connector is this regarding?

Spark

Description

  1. Add new SQL syntax: OPTIMIZE tbl FULL.
  2. Implement OPTIMIZE tbl FULL to re-cluster all data in the table.
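To illustrate what the new syntax adds, here is a toy recognizer for a statement of the form OPTIMIZE <table> [FULL]. It is a hypothetical sketch only: Delta's real parser is ANTLR-based and also accepts clauses such as WHERE and ZORDER BY, which this regex ignores. The names parse_optimize and OptimizeCommand are invented for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy grammar: OPTIMIZE <table-name> [FULL] [;]
# This is NOT Delta's actual grammar; other OPTIMIZE clauses are not handled.
_OPTIMIZE_RE = re.compile(
    r"^\s*OPTIMIZE\s+(?P<table>[A-Za-z_][\w.]*)(?:\s+(?P<full>FULL))?\s*;?\s*$",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class OptimizeCommand:
    table: str
    full: bool  # True: re-cluster all data; False: incremental OPTIMIZE


def parse_optimize(sql: str) -> Optional[OptimizeCommand]:
    """Return the parsed command, or None if the statement does not match."""
    m = _OPTIMIZE_RE.match(sql)
    if m is None:
        return None
    # The optional FULL keyword is the only thing that flips the mode.
    return OptimizeCommand(table=m.group("table"), full=m.group("full") is not None)


print(parse_optimize("OPTIMIZE tbl FULL"))
```

The keyword is modeled as an optional trailing flag, so a plain OPTIMIZE tbl keeps its existing incremental meaning.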

How was this patch tested?

New unit tests were added.

Does this PR introduce any user-facing changes?

Yes. Previously, a clustered table would not re-cluster data that had already been clustered on different clustering keys. With OPTIMIZE tbl FULL, that data is re-clustered on the new keys.
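The behavior change can be modeled with a small simulation. This is a hypothetical sketch, not Delta's implementation: it assumes each data file records the keys it was last clustered by, and uses a simplified incremental rule (only never-clustered files are picked). DataFile, select_files, and optimize are invented names. The scenario changes the clustering keys from (a) to (b) and compares the two modes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DataFile:
    path: str
    # Keys this file was last clustered by; None means never clustered.
    clustered_by: Optional[Tuple[str, ...]]


def select_files(files: List[DataFile], full: bool) -> List[DataFile]:
    """Pick the files an OPTIMIZE run would re-cluster in this toy model."""
    if full:
        # OPTIMIZE tbl FULL: every file is re-clustered, whatever its history.
        return list(files)
    # Simplified incremental rule (assumption): only never-clustered files.
    # Files clustered by *other* keys are skipped, the gap FULL closes.
    return [f for f in files if f.clustered_by is None]


def optimize(files: List[DataFile], keys: Tuple[str, ...], full: bool) -> List[DataFile]:
    """Return the table's files after OPTIMIZE; chosen files get the current keys.

    Toy simplification: files are re-tagged in place, whereas a real
    OPTIMIZE rewrites data into new files.
    """
    chosen = {f.path for f in select_files(files, full)}
    return [DataFile(f.path, keys) if f.path in chosen else f for f in files]


# Table was clustered by (a); f3 was written after the keys changed to (b).
files = [DataFile("f1", ("a",)), DataFile("f2", ("a",)), DataFile("f3", None)]
new_keys = ("b",)

after_incremental = optimize(files, new_keys, full=False)
after_full = optimize(files, new_keys, full=True)
print([f.clustered_by for f in after_incremental])
print([f.clustered_by for f in after_full])
```

In this model the incremental run leaves f1 and f2 on the old key (a), while the FULL run brings every file onto (b).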