delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0

[Spark] Don't apply partition-like data filters to ineligible columns #3872

Open chirag-s-db opened 1 week ago

chirag-s-db commented 1 week ago

Which Delta project/connector is this regarding?

Spark

Description

Currently, we attempt to rewrite partition-like data filters even when they reference columns that aren't skipping-eligible (for example, array- or map-type columns). The rewrite then throws an analysis exception because the referenced columns aren't found in the stats. This PR adds the missing match statement so that partition-like data filters referencing these columns are no longer rewritten.
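
As a rough illustration of the guard described above, here is a toy Python model. It is not the Delta code: `Filter`, `ELIGIBLE_TYPES`, `is_skipping_eligible`, and `stats_columns_for_rewrite` are hypothetical names, and the set of eligible types is an assumption chosen for the example. The sketch only shows the shape of the check: if any referenced column has no stats, skip the rewrite instead of producing a predicate over stats columns that don't exist.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Assumption for this sketch: only these type names have min/max stats collected.
ELIGIBLE_TYPES = {"int", "long", "double", "date", "timestamp"}


@dataclass(frozen=True)
class Filter:
    """Toy stand-in for a data filter: its SQL text and the columns it references."""
    sql: str
    columns: Tuple[str, ...]


def is_skipping_eligible(schema: Dict[str, str], column: str) -> bool:
    """A column is eligible only if it exists and its type has stats (per the assumption above)."""
    return schema.get(column) in ELIGIBLE_TYPES


def stats_columns_for_rewrite(schema: Dict[str, str], f: Filter) -> Optional[List[str]]:
    """Return the stats columns a rewritten filter would read, or None to skip the rewrite."""
    # The guard: bail out if any referenced column is not skipping-eligible,
    # because the rewritten filter would reference stats that are not there.
    if not all(is_skipping_eligible(schema, c) for c in f.columns):
        return None
    return [f"{kind}.{c}" for c in f.columns for kind in ("minValues", "maxValues")]


schema = {"id": "long", "tags": "array<string>"}
print(stats_columns_for_rewrite(schema, Filter("id % 10 = 0", ("id",))))
print(stats_columns_for_rewrite(schema, Filter("size(tags) > 0", ("tags",))))
```

In this model the filter on `id` is rewritten, while the filter on the array column `tags` is left alone and would be evaluated as an ordinary data filter.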

This case was originally overlooked because partitioning isn't allowed on non-atomic types (or string types), so the corresponding logic for partition filters never needed this additional match. As a result, partition-like data filters behaved differently from partition filters here.

How was this patch tested?

See test changes.

Does this PR introduce any user-facing changes?

No.