Closed: lkrishna-cs closed this issue 2 years ago
This is unfortunately an Apache Spark limitation, not a Delta Lake limitation. Apache Spark currently supports only Dynamic Partition Pruning, not Dynamic File Pruning. This kind of query optimization is the responsibility of the processing engine, not the data format.
@tdas in that case, this issue should be raised against Spark, if I am not wrong. I also feel that this is an important enhancement in terms of performance. Please correct me if I am wrong. Since Databricks has this, is it still proprietary to them even today?
Feature request
Overview
We have many use cases that require fast query access to Delta Lake, which doesn't seem to scale well for this. Reading through the Databricks documentation below, it looks like the dynamic file pruning feature is what we want to leverage. However, this isn't available in open-source Delta Lake.
https://www.databricks.com/blog/2020/04/30/faster-sql-queries-on-delta-lake-with-dynamic-file-pruning.html
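To make the request concrete, here is a minimal sketch in plain Python of what file-level pruning means conceptually. It is not Spark, Delta Lake, or Databricks code. `FileStats`, `prune_files`, and the file names are hypothetical, and the sketch assumes per-file min/max statistics on the join column are available and that the join keys only become known at run time.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file min/max statistics for a single column."""
    path: str
    min_key: int
    max_key: int


def prune_files(files: Iterable[FileStats], join_keys: Iterable[int]) -> List[FileStats]:
    """Keep only the files whose [min_key, max_key] range could contain a join key."""
    keys = set(join_keys)
    kept = []
    for f in files:
        # A file is skipped when no run-time key falls inside its value range.
        if any(f.min_key <= k <= f.max_key for k in keys):
            kept.append(f)
    return kept


# Illustrative data only: three files with non-overlapping key ranges.
files = [
    FileStats("part-0000.parquet", 0, 99),
    FileStats("part-0001.parquet", 100, 199),
    FileStats("part-0002.parquet", 200, 299),
]

# Keys obtained at run time from the small, filtered side of a join.
join_keys = [105, 150]

surviving = prune_files(files, join_keys)
print([f.path for f in surviving])  # ['part-0001.parquet']
```

With partition pruning alone, all three files would be read if they sit in the same partition. The sketch shows why skipping at file granularity matters for low latency reads.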
Motivation
This feature would greatly help us use native Delta Lake capabilities rather than looking at other options for low latency reads.
Willingness to contribute
The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?