Fixes an issue in CatalystUtil that causes the where clauses in some GroupBys to not be applied.
Some notes from @piyushn-stripe's investigation: when certain select expressions exist in a GroupBy (such as the CAST(get_json_object(... expression in the unit test added in this PR), the CatalystUtil initialize code hit the case ProjectExec(projectList, childPlan) branch instead of case ProjectExec(projectList, fp @ FilterExec(condition, child)). As a result, the where clauses were never applied.
To fix this, we added code from the WholeStageCodegenExec branch into the ProjectExec(projectList, childPlan) branch, so that a FilterExec wrapped in a WholeStageCodegenExec is handled there as well.
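The branch selection described above can be illustrated with a minimal sketch. It does not use Spark or the real CatalystUtil code: the plan classes below are simplified stand-ins with the same names, and conditionBefore / conditionAfter are hypothetical helpers that only report which filter condition a branch would pick up. The sketch assumes the problematic plans have the shape ProjectExec over WholeStageCodegenExec over FilterExec, as suggested by the new case quoted in the Test Plan.

```scala
object BranchSketch {
  // Simplified stand-ins for Spark physical plan nodes (assumption: not the real classes).
  sealed trait Plan
  case object Scan extends Plan
  final case class FilterExec(condition: String, child: Plan) extends Plan
  final case class ProjectExec(projectList: Seq[String], child: Plan) extends Plan
  final case class WholeStageCodegenExec(child: Plan) extends Plan

  // Before the fix: a FilterExec hidden inside WholeStageCodegenExec falls into
  // the generic ProjectExec branch, so no condition is picked up.
  def conditionBefore(plan: Plan): Option[String] = plan match {
    case ProjectExec(_, FilterExec(condition, _)) => Some(condition)
    case ProjectExec(_, _)                        => None
    case _                                        => None
  }

  // After the fix: the generic ProjectExec branch inspects its child and
  // handles a FilterExec wrapped in WholeStageCodegenExec.
  def conditionAfter(plan: Plan): Option[String] = plan match {
    case ProjectExec(_, FilterExec(condition, _)) => Some(condition)
    case ProjectExec(_, childPlan) =>
      childPlan match {
        case WholeStageCodegenExec(FilterExec(condition, _)) => Some(condition)
        case _                                               => None
      }
    case _ => None
  }

  def main(args: Array[String]): Unit = {
    val wrapped =
      ProjectExec(Seq("x"), WholeStageCodegenExec(FilterExec("x > 0", Scan)))
    println(conditionBefore(wrapped)) // None: the where clause is dropped
    println(conditionAfter(wrapped))  // Some(x > 0): the where clause is found
  }
}
```

Plans where the FilterExec sits directly under the ProjectExec behave the same in both versions; only the wrapped shape changes.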
Why / Goal
We observed that our Flink app (Chronon on Flink) was not filtering out events and the where clause in some GroupBys wasn't being applied at all. We narrowed it down to an issue in CatalystUtil and were able to reproduce it with a unit test.
Test Plan
[X] Added Unit Tests
[X] Covered by existing CI (to some extent; there are a number of existing CatalystUtil tests)
[ ] Integration tested
We temporarily modified the code here to throw an error in the new case whc @ WholeStageCodegenExec(fp @ FilterExec(condition, child)) => branch, and verified that the only GroupBys reaching it were the ones that weren't filtering correctly. We have hundreds of GroupBys defined, so this gives us confidence that, at least on our side, this is a safe change.