cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
29.9k stars 3.78k forks source link

opt: create several alternative plans when inverted index spans are intersected #56868

Open rytaft opened 3 years ago

rytaft commented 3 years ago

Consider the query SELECT k FROM t WHERE a @> '{1}' AND a @> '{2}', where a is an Array column with an inverted index.

Prior to #56732, the optimizer created two alternative plans for this query: (plan 1) A zig-zag join using the inverted index on a (plan 2) A scan of the inverted index on a constrained by the span a @> '{1}', followed by an index join with the filter a @> '{2}'

After #56732, the optimizer still creates plan 1, but it no longer creates plan 2. Instead, it creates the following plan: (plan 3) A scan of the inverted index on a constrained by the two spans a @> '{1}' and a @> '{2}', followed by an invertedFilter operator to intersect the results

Ideally, we should also create plan 2, so that the coster can choose between all 3 alternatives. There are cases where plan 3 might be better (e.g., if k is the primary key, we can avoid doing an index join), but there are also cases where plan 2 might be better (e.g., if k is not the primary key, we need to do the index join anyway, so we might as well scan less data in the index).

We can also create a better version of plan 2, which considers both spans a @> '{1}' and a @> '{2}', and only scans whichever one is more selective.

Jira issue: CRDB-2898

sumeerbhola commented 3 years ago

but there are also cases where plan 2 might be better (e.g., if k is not the primary key, we need to do the index join anyway, so we might as well scan less data in the index).

If the intersection is going to be small after the inverted filter, in plan 3, it will save in the index join cost. Doing lots of PK lookups in the primary index tends to be expensive (based on what I've seen in geospatial queries). If the intersection contains most of a @> '{1}', plan 2 could be cheaper. The inverted histogram stats may be helpful in estimating the intersection cardinality -- or maybe not, we won't know what PKs are backing the cardinality of {1}, {2}.

rytaft commented 3 years ago

That's a good point... I'm not sure we'll be able to do anything much more sophisticated than using the minimum cardinality of the two spans to estimate the cardinality of the intersection. But even that might give us some insight into which plan is better.

github-actions[bot] commented 1 year ago

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!

rytaft commented 9 months ago

This is still relevant