Open rytaft opened 3 years ago
Is any work needed for this? I tested out an old suite of tests compiled for hash-sharded indexes yesterday and was pleased to find that we now support all desired optimizations.
Here's an example of the behavior I believe this issue is talking about at work:
demo@127.0.0.1:26257/movr> create table computed (b bool as (i % 2 = 0) stored, i int, primary key (b, i));
CREATE TABLE
demo@127.0.0.1:26257/movr> explain select * from computed where i = 15;
info
------------------------------------
distribution: local
vectorized: true
• scan
missing stats
table: computed@primary
spans: [/false/15 - /false/15]
(7 rows)
demo@127.0.0.1:26257/movr> explain select * from computed where i = 14;
info
----------------------------------
distribution: local
vectorized: true
• scan
missing stats
table: computed@primary
spans: [/true/14 - /true/14]
(7 rows)
Yea, I'm hoping that this will "just work" for the most important cases. At the very least, I want to add test cases for some multi-region-specific examples. One other thing that may be possible is to infer a range/set of values rather than a constant. Using your example from above:
demo@127.0.0.1:26257/movr> create table computed (b bool as (i % 2 = 0) stored, i int, primary key (b, i));
CREATE TABLE
Time: 3ms total (execution 3ms / network 0ms)
demo@127.0.0.1:26257/movr> explain select * from computed where i = 15;
info
------------------------------------
distribution: local
vectorized: true
• scan
missing stats
table: computed@primary
spans: [/false/15 - /false/15]
(7 rows)
Time: 11ms total (execution 10ms / network 1ms)
demo@127.0.0.1:26257/movr> explain select * from computed where i = 15 OR i = 13;
info
----------------------------------
distribution: full
vectorized: true
• filter
│ filter: (i = 15) OR (i = 13)
│
└── • scan
missing stats
table: computed@primary
spans: FULL SCAN
(10 rows)
Time: 1ms total (execution 0ms / network 0ms)
The second case is unconstrained, but we should be able to constrain it two two spans. I'm not sure how important of a use case this is, but if we can avoid fanning out to all regions in cases like this I think it would be valuable.
The second case is unconstrained, but we should be able to constrain it two two spans. I'm not sure how important of a use case this is, but if we can avoid fanning out to all regions in cases like this I think it would be valuable.
I agree that supporting this would be valuable.
At the very least, I want to add test cases for some multi-region-specific examples.
Adding test cases for all of these different queries and doing so using multi-region schemas sounds like a great idea!
It should be relatively straightforward to infer a range/set of values rather than a constant, if this is deemed a priority. It does somewhat complicate the code, though, and it's not clear that this is an important use case, so I'm not going to try to get it into 21.1. #61304 shows that the most important use case already "just works".
We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!
still relevant
As described in #57682, we will enable customers to specify a computed column expression for the region based on other columns in their schema. If the columns that the computed column depends on are constrained by a query, we should be able to constrain the computed column as well and ensure local scans for many queries.
For example, consider the example of TPC-C. The schema might be specified as
CREATE TABLE warehouse (w_id INT PRIMARY KEY, ...) REGIONAL BY ROW AS (CASE WHEN w_id < 5 THEN 'us-east1' ELSE 'us-west2' END)
. As described in #57718, thePRIMARY KEY
designation will be syntactic sugar forPRIMARY KEY (region, w_id), UNIQUE WITHOUT INDEX (w_id)
. If a user submits the querySELECT * FROM warehouse WHERE w_id = 3
, the optimizer should be able to infer an additional filter,region = 'us-east1'
, allowing the primary index to be constrained.This issue covers the work to ensure that the optimizer infers the additional filter on the computed column when all of its dependent columns are constrained.
Epic CRDB-13964
Jira issue: CRDB-3492