Vectorized n-way join - Githubissues

msirek commented 2 years ago

Is your feature request related to a problem? Please describe. We currently have a query rewrite which splits up a join with ORed equality join predicates into a union of separate joins in order to utilize equijoin for performance. See SplitDisjunctionOfJoinTerms #74303

If one of the tables in the join is very large, or is itself a sequence of joins, as will be enabled through #82964, it may improve performance if we could avoid scanning that relation more than once.

Describe the solution you'd like Identify when we have a query of the form:

SELECT DISTINCT ON (left_table.<rowid_or_primary_key_columns>, 
                    right_table.<rowid_or_primary_key_columns>)
dt.*
FROM   (
        SELECT     *
        FROM       left_table
                   INNER JOIN right_table
                   ON         pred_1
                   UNION ALL
        SELECT     *
        FROM       left_table
                   INNER JOIN right_table
                   ON         pred_2
                   UNION ALL
...
       ) dt;

... and replace that expression with a new N_Way_Hash_Join expression which makes n build tables for each separate set of right_table columns in pred_1, pred_2, etc. The DISTINCT ON expression may still be required if the join itself isn't taught how to dedup right_table rowids. This optimization is intended for vectorized hash join. The idea is to evaluate one columnar batch of rows from left_table at a time on each of the build tables and union the results together. If this doesn't happen concurrently, an optimization could be to skip joining the rows in the batch which are already qualified from joining with a different build table. Note that if left_table is a sequence of joins, then left_table.<rowid_or_primary_key_columns> is a unique identifier created to tag each row in the relation instead of a rowid.

Another flavor of n-way join involves ANDed predicates on different tables, for example a star join case with a large fact table. A build table could be made for each of the dimension tables, and a hash join performed using a single scan of the fact table. The join of one columnar batch with one build table could mark which left table rows in the batch are qualified and only join those rows with the next build table.

These rewrites may depend on using a "broadcast" hash join where the smaller build tables are duplicated on each processor so the large left table rows are co-located with all potential matches from the build tables. If the build tables are large, this duplication could be expensive, making the original UNION ALL plan cheaper.

Describe alternatives you've considered Using a materialized CTE for the left table could also prevent duplication of work, but may require more memory or spilling of rows to disk if the input is too large.

Additional context

Jira issue: CRDB-16783

github-actions[bot] commented 10 months ago

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!

michae2 commented 5 months ago

cockroachdb / cockroach

Vectorized n-way join #82993