Open msirek opened 2 years ago
We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!
Is your feature request related to a problem? Please describe. We currently have a query rewrite which splits up a join with ORed equality join predicates into a union of separate joins in order to utilize equijoin for performance. See SplitDisjunctionOfJoinTerms #74303
If one of the tables in the join is very large, or is itself a sequence of joins, as will be enabled through #82964, it may improve performance if we could avoid scanning that relation more than once.
Describe the solution you'd like Identify when we have a query of the form:
... and replace that expression with a new N_Way_Hash_Join expression which makes n build tables for each separate set of
right_table
columns inpred_1
,pred_2
, etc. The DISTINCT ON expression may still be required if the join itself isn't taught how to dedupright_table
rowids. This optimization is intended for vectorized hash join. The idea is to evaluate one columnar batch of rows fromleft_table
at a time on each of the build tables and union the results together. If this doesn't happen concurrently, an optimization could be to skip joining the rows in the batch which are already qualified from joining with a different build table. Note that ifleft_table
is a sequence of joins, thenleft_table.<rowid_or_primary_key_columns>
is a unique identifier created to tag each row in the relation instead of a rowid.Another flavor of n-way join involves ANDed predicates on different tables, for example a star join case with a large fact table. A build table could be made for each of the dimension tables, and a hash join performed using a single scan of the fact table. The join of one columnar batch with one build table could mark which left table rows in the batch are qualified and only join those rows with the next build table.
These rewrites may depend on using a "broadcast" hash join where the smaller build tables are duplicated on each processor so the large left table rows are co-located with all potential matches from the build tables. If the build tables are large, this duplication could be expensive, making the original UNION ALL plan cheaper.
Describe alternatives you've considered Using a materialized CTE for the left table could also prevent duplication of work, but may require more memory or spilling of rows to disk if the input is too large.
Additional context
Jira issue: CRDB-16783