delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0

[Kernel][Expression] - Performance Optimization for LIKE expression evaluation #3129

Open krishnanravi opened 1 month ago

krishnanravi commented 1 month ago

Bug

Which Delta project/connector is this regarding?

Describe the problem

The LIKE expression implementation can be optimized to convert the LIKE pattern to a Java regex only once when the pattern input is a literal. In the current implementation, the Java regex conversion happens for every row in the input.

Steps to reproduce

For example, evaluating the expression col("c1") LIKE 'a%' invokes LikeExpressionEvaluator.escapeLikeRegex(...) for every single row in the column vector for c1.

Observed results

LikeExpressionEvaluator.escapeLikeRegex(...) is invoked for every single row in the column vector that the expression is evaluated against, even when the pattern is static.

Expected results

LikeExpressionEvaluator.escapeLikeRegex(...) is invoked only once for the entire column vector when the pattern is static.
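
A minimal sketch of the expected behavior, to make the difference concrete. This is not the Delta Kernel code: the class LikeSketch, the helper likeToRegex, and both evaluation methods are hypothetical names, plain String arrays stand in for column vectors, and null inputs are simplified to false rather than SQL NULL. The first method converts and compiles the pattern once per batch, the second shows the per-row conversion that is only needed when the pattern itself varies by row.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; names and structure are assumptions,
// not the actual LikeExpressionEvaluator implementation.
final class LikeSketch {

    // Illustrative LIKE-to-regex conversion: '%' matches any sequence,
    // '_' matches one character, and the escape character makes the
    // following character literal. All other characters are quoted.
    static String likeToRegex(String like, char escape) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < like.length(); i++) {
            char c = like.charAt(i);
            if (c == escape && i + 1 < like.length()) {
                regex.append(Pattern.quote(String.valueOf(like.charAt(++i))));
            } else if (c == '%') {
                regex.append(".*");
            } else if (c == '_') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return regex.toString();
    }

    // Literal pattern: convert and compile once, then reuse for every row.
    static boolean[] evalLiteralPattern(String[] rows, String like, char escape) {
        Pattern compiled = Pattern.compile(likeToRegex(like, escape), Pattern.DOTALL);
        boolean[] out = new boolean[rows.length];
        for (int i = 0; i < rows.length; i++) {
            // Simplification: null input yields false instead of SQL NULL.
            out[i] = rows[i] != null && compiled.matcher(rows[i]).matches();
        }
        return out;
    }

    // Non-literal pattern: the pattern can differ per row, so the
    // conversion has to stay inside the loop.
    static boolean[] evalPerRowPattern(String[] rows, String[] likes, char escape) {
        boolean[] out = new boolean[rows.length];
        for (int i = 0; i < rows.length; i++) {
            if (rows[i] == null || likes[i] == null) {
                continue; // simplification: treated as false
            }
            Pattern p = Pattern.compile(likeToRegex(likes[i], escape), Pattern.DOTALL);
            out[i] = p.matcher(rows[i]).matches();
        }
        return out;
    }
}
```

With this shape, col("c1") LIKE 'a%' would go through evalLiteralPattern, so the regex conversion cost is paid once per column vector instead of once per row. How the evaluator detects that the pattern child is a literal is left out here and would depend on the Kernel expression classes.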

Environment information

Willingness to contribute

The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?

krishnanravi commented 1 month ago

@vkorukanti I would like to follow up on the PR to get this done. I also think this should be marked as an enhancement.