delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0
6.96k stars, 1.58k forks

[Spark] Avoid parsing in translateFilterForColumnMapping #3001

Closed by tomvanbussel 2 weeks ago

tomvanbussel commented 2 weeks ago

Which Delta project/connector is this regarding?

Spark

Description

This PR changes how Column Mapping is applied to the filters that are pushed down to the Parquet reader. Before this change, we parsed the identifiers in those filters before replacing them. This could cause some queries to fail, because the identifiers in pushed-down filters are not guaranteed to be quoted. With this PR we skip the parsing and instead match the unparsed identifier directly. If an identifier is not quoted as expected, we simply ignore the predicate. This matches how ParquetFilters (used by ParquetFileFormat) processes the identifiers in pushed-down predicates.
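
As a rough illustration of the approach described above, here is a minimal, self-contained sketch. It is not the code from this PR: the `Filter` case classes are simplified stand-ins for Spark's data source filters, and `quoteIfNeeded`, `translate`, and the physical names `col-1` and `col-2` are hypothetical. It assumes a pushed-down identifier is backtick-quoted only when it contains characters other than letters, digits, and underscores.

```scala
object ColumnMappingSketch {
  // Simplified stand-ins for pushed-down filters (assumption, not Spark's classes).
  sealed trait Filter
  final case class EqualTo(attribute: String, value: Any) extends Filter
  final case class GreaterThan(attribute: String, value: Any) extends Filter
  final case class And(left: Filter, right: Filter) extends Filter

  // Hypothetical helper: the spelling we expect a column name to have
  // inside a pushed-down filter (quoted only when it has special characters).
  def quoteIfNeeded(name: String): String =
    if (name.matches("[a-zA-Z0-9_]+")) name
    else "`" + name.replace("`", "``") + "`"

  // Rewrites logical names to physical names without parsing the identifier.
  // Returns None when an identifier does not match an expected spelling,
  // so the caller can ignore that predicate instead of failing the query.
  def translate(
      filter: Filter,
      logicalToPhysical: Map[String, String]): Option[Filter] = {
    // Key the lookup by the expected (unparsed) spelling of each column.
    val byPushedName = logicalToPhysical.map {
      case (logical, physical) => quoteIfNeeded(logical) -> physical
    }

    def go(f: Filter): Option[Filter] = f match {
      case EqualTo(attr, v) => byPushedName.get(attr).map(EqualTo(_, v))
      case GreaterThan(attr, v) => byPushedName.get(attr).map(GreaterThan(_, v))
      // Conservative choice for this sketch: keep an And only if both sides translate.
      case And(l, r) => for (tl <- go(l); tr <- go(r)) yield And(tl, tr)
    }

    go(filter)
  }
}
```

With a mapping such as `Map("a.b" -> "col-1", "id" -> "col-2")`, a filter on `` `a.b` `` is rewritten to use `col-1`, while a filter on the unquoted spelling `a.b` yields `None` and would be dropped from the pushed-down set.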

How was this patch tested?

Existing tests

Does this PR introduce any user-facing changes?

No