Open wudanzy opened 2 months ago
@dennyglee Hi from ActionIQ, once the design doc has some comments and is updated, could we get someone from Delta org to take a look?
Sorry for missing this @MasterDDT - will review this shortly!
@dennyglee was anybody able to review ^^ doc?
Feature request
Which Delta project/connector is this regarding?
Overview
Implement bucketing in Delta Lake to speed up aggregations and joins.
Motivation
Delta Lake currently does not support bucketing. This leads to inefficiency in two kinds of use cases: aggregations and joins.
Bucketing was proposed in Spark to solve the above problems (see original JIRA and design), and Spark has supported it for several years. Delta Lake, however, does not support bucketing. Delta Lake has developed Z-ordering and liquid clustering, but both features target data skipping, so neither helps avoid unnecessary shuffles in aggregations and joins.
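To make the shuffle argument concrete, here is a toy sketch in plain Python (not Spark or Delta code). It assumes both tables are written with the same bucket count and the same hash on the key, which is the precondition that lets an engine join or aggregate bucket by bucket without redistributing rows. The helper names (`bucket_id`, `write_bucketed`, `bucketed_join`, `bucketed_count`) and the modulo hash are made up for illustration; Spark's own bucketing goes through `DataFrameWriter.bucketBy` and uses a different hash function.

```python
NUM_BUCKETS = 4  # hypothetical bucket count, fixed at write time


def bucket_id(key, num_buckets=NUM_BUCKETS):
    # Toy hash for illustration only; real engines use a stronger hash.
    return key % num_buckets


def write_bucketed(rows, num_buckets=NUM_BUCKETS):
    """Group (key, value) rows into buckets once, at write time."""
    buckets = [[] for _ in range(num_buckets)]
    for key, value in rows:
        buckets[bucket_id(key, num_buckets)].append((key, value))
    return buckets


def bucketed_join(left_buckets, right_buckets):
    """Join bucket i of the left table only against bucket i of the right.

    Equal keys always land in the same bucket index on both sides,
    so no rows need to move between buckets (no shuffle).
    """
    out = []
    for left, right in zip(left_buckets, right_buckets):
        index = {}
        for key, value in right:
            index.setdefault(key, []).append(value)
        for key, value in left:
            for other in index.get(key, []):
                out.append((key, value, other))
    return out


def bucketed_count(buckets):
    """Per-key counts; each key lives in exactly one bucket,
    so every bucket can be aggregated independently."""
    counts = {}
    for bucket in buckets:
        for key, _ in bucket:
            counts[key] = counts.get(key, 0) + 1
    return counts


orders = write_bucketed([(1, "o1"), (2, "o2"), (5, "o3"), (1, "o4")])
users = write_bucketed([(1, "alice"), (2, "bob"), (3, "carol")])
joined = sorted(bucketed_join(orders, users))
counts = bucketed_count(orders)
```

Data skipping (Z-ordering, liquid clustering) reduces how many files are read, but it does not give this co-location guarantee, which is why it cannot replace bucketing for shuffle avoidance.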
Further details
The design is here.
Willingness to contribute
The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?