Open rytaft opened 3 years ago
but there are also cases where plan 2 might be better (e.g., if k is not the primary key, we need to do the index join anyway, so we might as well scan less data in the index).
If the intersection is going to be small after the inverted filter, in plan 3, it will save in the index join cost. Doing lots of PK lookups in the primary index tends to be expensive (based on what I've seen in geospatial queries). If the intersection contains most of a @> '{1}'
, plan 2 could be cheaper. The inverted histogram stats may be helpful in estimating the intersection cardinality -- or maybe not, we won't know what PKs are backing the cardinality of {1}
, {2}
.
That's a good point... I'm not sure we'll be able to do anything much more sophisticated than using the minimum cardinality of the two spans to estimate the cardinality of the intersection. But even that might give us some insight into which plan is better.
We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!
This is still relevant
Consider the query
SELECT k FROM t WHERE a @> '{1}' AND a @> '{2}'
, wherea
is an Array column with an inverted index.Prior to #56732, the optimizer created two alternative plans for this query: (plan 1) A zig-zag join using the inverted index on
a
(plan 2) A scan of the inverted index ona
constrained by the spana @> '{1}'
, followed by an index join with the filtera @> '{2}'
After #56732, the optimizer still creates plan 1, but it no longer creates plan 2. Instead, it creates the following plan: (plan 3) A scan of the inverted index on
a
constrained by the two spansa @> '{1}'
anda @> '{2}'
, followed by aninvertedFilter
operator to intersect the resultsIdeally, we should also create plan 2, so that the coster can choose between all 3 alternatives. There are cases where plan 3 might be better (e.g., if
k
is the primary key, we can avoid doing an index join), but there are also cases where plan 2 might be better (e.g., ifk
is not the primary key, we need to do the index join anyway, so we might as well scan less data in the index).We can also create a better version of plan 2, which considers both spans
a @> '{1}'
anda @> '{2}'
, and only scans whichever one is more selective.Jira issue: CRDB-2898