Feature request
Overview
Currently, it is not possible to force a reclustering of all files in a Delta table, because running OPTIMIZE {table} FULL is not supported (see the Databricks Delta docs).
Motivation
I have personally not been able to see the benefits of Delta clustering. When I look at the statistics of a Delta table, it seems that data that should be clustered together is spread across multiple files. Running a partial optimization does not redistribute existing clusters across files.
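One way to make the observation above concrete is to compare the per-file min/max statistics recorded in the transaction log. The following is only a rough sketch, assuming the table's JSON commits are still present under _delta_log, that each add action carries a stats string with minValues and maxValues, and that checkpoints can be ignored. The function names and the column argument are hypothetical and not part of any Delta API.

```python
import json
from pathlib import Path


def file_ranges(log_lines, column):
    """Collect (path, min, max) for `column` from Delta log `add` actions."""
    ranges = {}
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        action = json.loads(line)
        if "remove" in action:
            # The file is no longer part of the table.
            ranges.pop(action["remove"]["path"], None)
        elif "add" in action and action["add"].get("stats"):
            stats = json.loads(action["add"]["stats"])
            lo = stats.get("minValues", {}).get(column)
            hi = stats.get("maxValues", {}).get(column)
            if lo is not None and hi is not None:
                ranges[action["add"]["path"]] = (lo, hi)
    return [(path, lo, hi) for path, (lo, hi) in ranges.items()]


def count_overlapping_pairs(ranges):
    """Count pairs of files whose [min, max] ranges intersect."""
    pairs = 0
    for i, (_, lo_a, hi_a) in enumerate(ranges):
        for _, lo_b, hi_b in ranges[i + 1:]:
            if lo_a <= hi_b and lo_b <= hi_a:
                pairs += 1
    return pairs


def read_log(table_path):
    """Yield every action line from the JSON commits, in commit order."""
    log_dir = Path(table_path) / "_delta_log"
    for commit in sorted(log_dir.glob("*.json")):
        yield from commit.read_text().splitlines()
```

For a locally accessible table, count_overlapping_pairs(file_ranges(read_log(table_path), column)) gives the number of file pairs whose value ranges for that column overlap. A well-clustered table should show few overlapping pairs on its clustering column, so a high count that stays high after a partial OPTIMIZE would be consistent with the behavior described here.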
Further details
Willingness to contribute
The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?
[ ] Yes. I can contribute this feature independently.
[ ] Yes. I would be willing to contribute this feature with guidance from the Delta Lake community.
[x] No. I cannot contribute this feature at this time.