cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
30.05k stars 3.8k forks source link

opt: locality optimized lookup join doesn't get planned when expected #100896

Open yuzefovich opened 1 year ago

yuzefovich commented 1 year ago

Run on master with

cockroach demo movr --global --nodes 9 --multitenant=false --insecure

and then convert a couple of tables into multi-region

ALTER DATABASE movr PRIMARY REGION "us-east1";
ALTER DATABASE movr ADD REGION "europe-west1";
ALTER DATABASE movr ADD REGION "us-west1";

ALTER TABLE user_promo_codes ADD COLUMN region crdb_internal_region AS (
  CASE WHEN city = 'amsterdam' THEN 'europe-west1'
       WHEN city = 'paris' THEN 'europe-west1'
       WHEN city = 'rome' THEN 'europe-west1'
       WHEN city = 'new york' THEN 'us-east1'
       WHEN city = 'boston' THEN 'us-east1'
       WHEN city = 'washington dc' THEN 'us-east1'
       WHEN city = 'san francisco' THEN 'us-west1'
       WHEN city = 'seattle' THEN 'us-west1'
       WHEN city = 'los angeles' THEN 'us-west1'
       ELSE 'us-east1'
  END
) STORED;
ALTER TABLE user_promo_codes ALTER COLUMN region SET NOT NULL;
ALTER TABLE user_promo_codes SET LOCALITY REGIONAL BY ROW AS "region";

ALTER TABLE users ADD COLUMN region crdb_internal_region AS (
  CASE WHEN city = 'amsterdam' THEN 'europe-west1'
       WHEN city = 'paris' THEN 'europe-west1'
       WHEN city = 'rome' THEN 'europe-west1'
       WHEN city = 'new york' THEN 'us-east1'
       WHEN city = 'boston' THEN 'us-east1'
       WHEN city = 'washington dc' THEN 'us-east1'
       WHEN city = 'san francisco' THEN 'us-west1'
       WHEN city = 'seattle' THEN 'us-west1'
       WHEN city = 'los angeles' THEN 'us-west1'
       ELSE 'us-east1'       
  END
) STORED;
ALTER TABLE users ALTER COLUMN region SET NOT NULL;
ALTER TABLE users SET LOCALITY REGIONAL BY ROW AS "region";

I expect that the query below would get locality-optimized lookup join (since it's effectively doing the same thing as an example from here) but it doesn't:

EXPLAIN (OPT) SELECT * FROM user_promo_codes c INNER LOOKUP JOIN users u ON c.city = u.city AND c.user_id = u.id LIMIT 3;
  inner-join (lookup users [as=u])
   ├── flags: force lookup join (into right side)
   ├── lookup columns are key
   ├── project
   │    ├── locality-optimized-search
   │    │    ├── scan user_promo_codes [as=c]
   │    │    │    ├── constraint: /23/18/19/20: [/'us-east1' - /'us-east1']
   │    │    │    └── limit: 3
   │    │    └── scan user_promo_codes [as=c]
   │    │         ├── constraint: /31/26/27/28
   │    │         │    ├── [/'europe-west1' - /'europe-west1']
   │    │         │    └── [/'us-west1' - /'us-west1']
   │    │         └── limit: 3
   │    └── projections
   │         └── CASE WHEN c.city = 'amsterdam' THEN 'europe-west1' WHEN c.city = 'paris' THEN 'europe-west1' WHEN c.city = 'rome' THEN 'europe-west1' WHEN c.city = 'new york' THEN 'us-east1' WHEN c.city = 'boston' THEN 'us-east1' WHEN c.city = 'washington dc' THEN 'us-east1' WHEN c.city = 'san francisco' THEN 'us-west1' WHEN c.city = 'seattle' THEN 'us-west1' WHEN c.city = 'los angeles' THEN 'us-west1' ELSE 'us-east1' END
   └── filters (true)

Perhaps it's the projection that is preventing that, or perhaps I'm misunderstanding something.

Note that initially I tried CASE statements for the new manually added columns without the ELSE part, in which case the region column could be NULL, but adding the ELSE didn't change anything (in theory, this shouldn't matter since we have NOT NULL constraint on the column).

cc @msirek

Jira issue: CRDB-26678

msirek commented 1 year ago

stmt-bundle-854694549252505601.zip