ozgune opened this issue 8 years ago.
@robin900 -- the reason for these different code paths is historical. multi_join_order.c holds the main logic for join order planning. multi_logical_planner.c may contain join-order-related logic for OUTER JOINs.
The third code path for IsJoinClause relates to lateral joins, and it's only enabled through a config flag. For fixing compound join conditions, we probably don't need to touch this function.
I'm noting #26 and #264 as issues related to this one.
IsJoinClause is called from three different locations.
Whether a clause counts as a join clause or as an equi-join clause sometimes has the same consequences and sometimes different ones. We need to operate on equi-joins when dealing with join sequence selection and join pruning; we can relax the equi-join requirement everywhere else.
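To make the two questions in the comment above concrete, here is a minimal C sketch that keeps them apart: "does this clause connect two different tables?" versus "is it additionally an equality?". It is not Citus code. The struct and the functions clause_is_join and clause_is_equi_join are hypothetical, and each side of a clause is reduced to the single range-table index (varno) it references, whereas the real planner walks PostgreSQL expression trees and compares operator OIDs, not operator names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for a PostgreSQL OpExpr:
 * each side is reduced to the range-table index (varno) it references,
 * with 0 meaning "references no table" (for example, a constant). */
typedef struct SimpleClause
{
	int leftVarno;
	int rightVarno;
	const char *opName; /* operator name, e.g. "=", "<", "<@" */
} SimpleClause;

/* Question 1: does the clause connect two different tables?
 * No operator test here; a.tstz < b.tstz qualifies. */
static bool
clause_is_join(const SimpleClause *clause)
{
	assert(clause != NULL);
	return clause->leftVarno != 0 &&
		   clause->rightVarno != 0 &&
		   clause->leftVarno != clause->rightVarno;
}

/* Question 2: is it a join clause that also uses equality? In this sketch,
 * only this stricter form would feed join sequence selection and pruning. */
static bool
clause_is_equi_join(const SimpleClause *clause)
{
	return clause_is_join(clause) && strcmp(clause->opName, "=") == 0;
}
```

With a and b as varno 1 and 2, a.dist_column = b.dist_column passes both checks, while a.tstz < b.tstz passes only clause_is_join, so callers that do not need equality could use the looser check.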
Testing is important here. Basic tests involving just two tables would not find any issues, because the likely problems are in join order selection, which a two-table join barely exercises. Complex join queries must be used.
We recently ran into an issue related to compound join conditions (#58) in a private Slack channel. @robin900 looked deeper into the issue and noted that fixing the issue is tricky due to multiple code paths that do join planning. I'm copy/pasting his notes from the Slack chat below.
The two most common patterns for us, when there is a compound join condition, are:
1) a JOIN b on a.dist_column = b.dist_column AND a.some_tstz <@ b.some_tstzrange
2) a JOIN b on a.dist_column = b.dist_column AND a.tstz < b.tstz
We also have a rare third pattern, but we can work around the ctid problem:
3) a JOIN b on a.dist_column = b.dist_column AND a.ctid < b.ctid
My understanding, after reviewing the Citus code, is that it would take significant refactoring to evaluate the list of clauses all at once, instead of calling IsJoinClause() in series on a clauseList.
I note that the dummy join
a.fk_account_id = b.fk_account_id AND b.created_at = lower(a.during)
works merely because IsJoinClause returns true for both clauses: they use = and their left and right expressions reference different tables (different varno).
Reading the Citus code, I see confusion about what question the function IsJoinClause answers, and that confusion is the source of my own. I see two, perhaps three, uses of the function:
1) ApplicableJoinClauses, where it's just checking whether the left table and right table in each clause are applicable, and it's not clear that an equality operator is necessary.
2) IsJoinClause for lateral join optimization: https://github.com/citusdata/citus/blob/aa15043b0905a1fdba7f649d4986204c5efdf1e2/src/backend/distributed/planner/multi_logical_optimizer.c#L3431
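As a rough illustration of evaluating the clause list all at once instead of calling IsJoinClause() on each clause in series, the following C sketch first applies an ApplicableJoinClauses-style test (does one side belong to the already-joined tables and the other to the candidate table, with no operator check), then looks across all applicable clauses for at least one equality on distribution columns. Everything here is hypothetical: the struct, the onDistColumns flag, and the functions clause_is_applicable and classify_compound_join are illustrative names, not Citus's API, and real planner code works on PostgreSQL expression trees rather than bare varno integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical simplified clause: each side is reduced to its varno. */
typedef struct JoinClauseSketch
{
	int leftVarno;
	int rightVarno;
	const char *opName;  /* e.g. "=", "<", "<@" */
	bool onDistColumns;  /* both sides are the tables' distribution columns */
} JoinClauseSketch;

/* ApplicableJoinClauses-style check: one side references a table that is
 * already joined, the other references the candidate table. No operator
 * test is involved. */
static bool
clause_is_applicable(const JoinClauseSketch *c, const int *leftTables,
					 size_t leftCount, int rightTable)
{
	bool leftInSet = false;
	bool rightInSet = false;

	for (size_t i = 0; i < leftCount; i++)
	{
		if (leftTables[i] == c->leftVarno)
			leftInSet = true;
		if (leftTables[i] == c->rightVarno)
			rightInSet = true;
	}
	return (leftInSet && c->rightVarno == rightTable) ||
		   (rightInSet && c->leftVarno == rightTable);
}

/* Look at the compound condition as a whole: count applicable clauses and
 * report whether any of them is an equality on distribution columns. Other
 * applicable clauses (such as a.tstz < b.tstz) do not disqualify the join. */
static size_t
classify_compound_join(const JoinClauseSketch *clauses, size_t count,
					   const int *leftTables, size_t leftCount, int rightTable,
					   bool *hasDistEquiJoin)
{
	size_t applicable = 0;

	assert(hasDistEquiJoin != NULL);
	*hasDistEquiJoin = false;
	for (size_t i = 0; i < count; i++)
	{
		const JoinClauseSketch *c = &clauses[i];

		if (!clause_is_applicable(c, leftTables, leftCount, rightTable))
			continue;
		applicable++;
		if (c->onDistColumns && strcmp(c->opName, "=") == 0)
			*hasDistEquiJoin = true;
	}
	return applicable;
}
```

For pattern 2 above, with a and b as varno 1 and 2, both clauses are applicable and hasDistEquiJoin is set, so in this sketch the join could still be treated as an equi-join on dist_column while the range comparison stays part of the join condition. Whether the real planner should split the conditions this way is exactly the design question raised in this thread.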