cockroachdb / cockroach


opt: computed columns not derived when pushed through lookup join #63735

Closed nvanbenschoten closed 3 years ago

nvanbenschoten commented 3 years ago

note: the name on this might be very off, I was just guessing

Statement diagnostics of relevant query: stmt-bundle-650217270527393794.zip

In multi-region TPC-C, we have a schema that looks like the following:

CREATE TABLE public.order_line (
    ol_o_id INT8 NOT NULL,
    ol_d_id INT8 NOT NULL,
    ol_w_id INT8 NOT NULL,
    ol_number INT8 NOT NULL,
    ol_i_id INT8 NOT NULL,
    ol_supply_w_id INT8 NULL,
    ol_delivery_d TIMESTAMP NULL,
    ol_quantity INT8 NULL,
    ol_amount DECIMAL(6,2) NULL,
    ol_dist_info CHAR(24) NULL,
    crdb_region public.crdb_internal_region NOT VISIBLE NOT NULL AS (CASE WHEN ol_w_id BETWEEN 0:::INT8 AND 1665:::INT8 THEN 'us-east1':::public.crdb_internal_region WHEN ol_w_id BETWEEN 1666:::INT8 AND 3332:::INT8 THEN 'us-west1':::public.crdb_internal_region WHEN ol_w_id BETWEEN 3333:::INT8 AND 4999:::INT8 THEN 'europe-west2':::public.crdb_internal_region END) STORED,
    CONSTRAINT "primary" PRIMARY KEY (ol_w_id ASC, ol_d_id ASC, ol_o_id DESC, ol_number ASC),
    CONSTRAINT fk_ol_w_id_ref_order FOREIGN KEY (ol_w_id, ol_d_id, ol_o_id) REFERENCES public."order"(o_w_id, o_d_id, o_id) NOT VALID,
    CONSTRAINT fk_ol_supply_w_id_ref_stock FOREIGN KEY (ol_supply_w_id, ol_i_id) REFERENCES public.stock(s_w_id, s_i_id) NOT VALID,
    FAMILY "primary" (ol_o_id, ol_d_id, ol_w_id, ol_number, ol_i_id, ol_supply_w_id, ol_delivery_d, ol_quantity, ol_amount, ol_dist_info, crdb_region)
) LOCALITY REGIONAL BY ROW;

CREATE TABLE public.stock (
    s_i_id INT8 NOT NULL,
    s_w_id INT8 NOT NULL,
    s_quantity INT8 NULL,
    s_dist_01 CHAR(24) NULL,
    s_dist_02 CHAR(24) NULL,
    s_dist_03 CHAR(24) NULL,
    s_dist_04 CHAR(24) NULL,
    s_dist_05 CHAR(24) NULL,
    s_dist_06 CHAR(24) NULL,
    s_dist_07 CHAR(24) NULL,
    s_dist_08 CHAR(24) NULL,
    s_dist_09 CHAR(24) NULL,
    s_dist_10 CHAR(24) NULL,
    s_ytd INT8 NULL,
    s_order_cnt INT8 NULL,
    s_remote_cnt INT8 NULL,
    s_data VARCHAR(50) NULL,
    crdb_region public.crdb_internal_region NOT VISIBLE NOT NULL AS (CASE WHEN s_w_id BETWEEN 0:::INT8 AND 1665:::INT8 THEN 'us-east1':::public.crdb_internal_region WHEN s_w_id BETWEEN 1666:::INT8 AND 3332:::INT8 THEN 'us-west1':::public.crdb_internal_region WHEN s_w_id BETWEEN 3333:::INT8 AND 4999:::INT8 THEN 'europe-west2':::public.crdb_internal_region END) STORED,
    CONSTRAINT "primary" PRIMARY KEY (s_w_id ASC, s_i_id ASC),
    CONSTRAINT fk_s_w_id_ref_warehouse FOREIGN KEY (s_w_id) REFERENCES public.warehouse(w_id) NOT VALID,
    CONSTRAINT fk_s_i_id_ref_item FOREIGN KEY (s_i_id) REFERENCES public.item(i_id) NOT VALID,
    FAMILY "primary" (s_i_id, s_w_id, s_quantity, s_dist_01, s_dist_02, s_dist_03, s_dist_04, s_dist_05, s_dist_06, s_dist_07, s_dist_08, s_dist_09, s_dist_10, s_ytd, s_order_cnt, s_remote_cnt, s_data, crdb_region)
) LOCALITY REGIONAL BY ROW;

Notice that each table has a crdb_region column which is computed from its warehouse column (ol_w_id for order_line, s_w_id for stock).
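As an illustrative sketch (not optimizer or TPC-C code — regionForWarehouse is a hypothetical name), the computed-column expression amounts to this mapping from warehouse ID to region:

```go
package main

import "fmt"

// regionForWarehouse mirrors the CASE expression in the crdb_region
// computed columns above: warehouse IDs are partitioned into three
// contiguous ranges, each mapped to one region.
func regionForWarehouse(wID int) string {
	switch {
	case wID >= 0 && wID <= 1665:
		return "us-east1"
	case wID >= 1666 && wID <= 3332:
		return "us-west1"
	case wID >= 3333 && wID <= 4999:
		return "europe-west2"
	}
	return "" // the SQL CASE yields NULL here, but the column is NOT NULL
}

func main() {
	fmt.Println(regionForWarehouse(1))    // us-east1
	fmt.Println(regionForWarehouse(2000)) // us-west1
	fmt.Println(regionForWarehouse(4000)) // europe-west2
}
```

The point is that fixing ol_w_id or s_w_id to a constant fully determines crdb_region, so the optimizer should be able to derive it.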

One of the "Stock Level" transaction statements runs a join against these two tables. The statement constrains the warehouse sufficiently that we should be able to determine the lookup can stay local, but we don't, so the query is executed with a remote scan that bumps it from about 10ms to about 100ms:

root@localhost:26257/tpcc> EXPLAIN SELECT
  count(*)
FROM
  (
    SELECT
      DISTINCT s_i_id
    FROM
      order_line JOIN stock ON s_i_id = ol_i_id AND s_w_id = ol_w_id
    WHERE
      ol_w_id = 1 AND ol_d_id = 6 AND ol_o_id BETWEEN (3023 - 20) AND (3023 - 1) AND s_quantity < 14
  );
                                               info
---------------------------------------------------------------------------------------------------
  distribution: full
  vectorized: true

  • group (scalar)
  │
  └── • distinct
      │ distinct on: s_i_id
      │
      └── • lookup join
          │ table: stock@primary
          │ equality: (lookup_join_const_col_@31, ol_w_id, ol_i_id) = (crdb_region,s_w_id,s_i_id)
          │ equality cols are key
          │ pred: (s_w_id = 1) AND (s_quantity < 14)
          │
          └── • cross join
              │
              ├── • values
              │     size: 1 column, 3 rows
              │
              └── • scan
                    missing stats
                    table: order_line@primary
                    spans: [/'us-east1'/1/6/3022 - /'us-east1'/1/6/3003]
(23 rows)

Time: 2ms total (execution 2ms / network 0ms)

root@localhost:26257/tpcc> SELECT
  count(*)
FROM
  (
    SELECT
      DISTINCT s_i_id
    FROM
      order_line JOIN stock ON s_i_id = ol_i_id AND s_w_id = ol_w_id
    WHERE
      ol_w_id = 1 AND ol_d_id = 6 AND ol_o_id BETWEEN (3023 - 20) AND (3023 - 1) AND s_quantity < 14
  );
  count
---------
      9
(1 row)

Time: 94ms total (execution 93ms / network 0ms)

If we manually push the s_w_id = 1 filter through the join, things still don't improve:

root@localhost:26257/tpcc> EXPLAIN SELECT
  count(*)
FROM
  (
    SELECT
      DISTINCT s_i_id
    FROM
      order_line JOIN stock ON s_i_id = ol_i_id
    WHERE
      ol_w_id = 1 AND s_w_id = 1 AND ol_d_id = 6 AND ol_o_id BETWEEN (3023 - 20) AND (3023 - 1) AND s_quantity < 14
  );
                                                        info
---------------------------------------------------------------------------------------------------------------------
  distribution: full
  vectorized: true

  • group (scalar)
  │
  └── • distinct
      │ distinct on: s_i_id
      │
      └── • lookup join
          │ table: stock@primary
          │ equality: (lookup_join_const_col_@31, lookup_join_const_col_@15, ol_i_id) = (crdb_region,s_w_id,s_i_id)
          │ equality cols are key
          │ pred: s_quantity < 14
          │
          └── • render
              │
              └── • cross join
                  │
                  ├── • values
                  │     size: 1 column, 3 rows
                  │
                  └── • scan
                        missing stats
                        table: order_line@primary
                        spans: [/'us-east1'/1/6/3022 - /'us-east1'/1/6/3003]
(25 rows)

Time: 3ms total (execution 2ms / network 0ms)

root@localhost:26257/tpcc> SELECT
  count(*)
FROM
  (
    SELECT
      DISTINCT s_i_id
    FROM
      order_line JOIN stock ON s_i_id = ol_i_id
    WHERE
      ol_w_id = 1 AND s_w_id = 1 AND ol_d_id = 6 AND ol_o_id BETWEEN (3023 - 20) AND (3023 - 1) AND s_quantity < 14
  );
  count
---------
      9
(1 row)

Time: 97ms total (execution 97ms / network 0ms)

However, if we also manually derive the stock table's crdb_region column, the query stays local to the gateway region. We can see this both in the changed query plan and in a large speedup:

root@localhost:26257/tpcc> EXPLAIN SELECT
  count(*)
FROM
  (
    SELECT
      DISTINCT s_i_id
    FROM
      order_line JOIN stock ON s_i_id = ol_i_id
    WHERE
      ol_w_id = 1 AND s_w_id = 1 AND stock.crdb_region = 'us-east1' AND ol_d_id = 6 AND ol_o_id BETWEEN (3023 - 20) AND (3023 - 1) AND s_quantity < 14
  );
                                                        info
---------------------------------------------------------------------------------------------------------------------
  distribution: full
  vectorized: true

  • group (scalar)
  │
  └── • distinct
      │ distinct on: s_i_id
      │
      └── • lookup join
          │ table: stock@primary
          │ equality: (lookup_join_const_col_@31, lookup_join_const_col_@15, ol_i_id) = (crdb_region,s_w_id,s_i_id)
          │ equality cols are key
          │ pred: s_quantity < 14
          │
          └── • render
              │
              └── • scan
                    missing stats
                    table: order_line@primary
                    spans: [/'us-east1'/1/6/3022 - /'us-east1'/1/6/3003]
(20 rows)

Time: 3ms total (execution 3ms / network 0ms)

root@localhost:26257/tpcc> SELECT
  count(*)
FROM
  (
    SELECT
      DISTINCT s_i_id
    FROM
      order_line JOIN stock ON s_i_id = ol_i_id
    WHERE
      ol_w_id = 1 AND s_w_id = 1 AND stock.crdb_region = 'us-east1' AND ol_d_id = 6 AND ol_o_id BETWEEN (3023 - 20) AND (3023 - 1) AND s_quantity < 14
  );
  count
---------
      9
(1 row)

Time: 9ms total (execution 9ms / network 0ms)

cc. @rytaft for triage.

nvanbenschoten commented 3 years ago

Here's a possibly related INSERT that is slowed down by its foreign key check: stmt-bundle-650214558116610049.zip.

nvanbenschoten commented 3 years ago

I took a look at what was going wrong here and what it would take to push the constant join predicate through to the other side of the join so that its computed column could be derived and constrained. In doing so, I started playing around with other similar queries.

It turned out that when using an INT column instead of the ENUM column, things actually worked, which I wasn't expecting:

demo@127.0.0.1:26257/movr> CREATE TABLE stock (
    s_i_id INT8 NOT NULL,
    s_w_id INT8 NOT NULL,
    s_quantity INT8 NULL,
    crdb_region INT NOT VISIBLE NOT NULL AS (CASE WHEN s_w_id BETWEEN 0 AND 1665 THEN 100 WHEN s_w_id BETWEEN 1666 AND 3332 THEN 200 WHEN s_w_id BETWEEN 3333 AND 4999 THEN 300 END) STORED,
    CONSTRAINT "primary" PRIMARY KEY (crdb_region ASC, s_w_id ASC, s_i_id ASC)
);
CREATE TABLE order_line (
    ol_o_id INT8 NOT NULL,
    ol_d_id INT8 NOT NULL,
    ol_w_id INT8 NOT NULL,
    ol_number INT8 NOT NULL,
    ol_i_id INT8 NOT NULL,
    crdb_region INT NOT VISIBLE NOT NULL AS (CASE WHEN ol_w_id BETWEEN 0 AND 1665 THEN 400 WHEN ol_w_id BETWEEN 1666 AND 3332 THEN 500 WHEN ol_w_id BETWEEN 3333 AND 4999 THEN 600 END) STORED,
    CONSTRAINT "primary" PRIMARY KEY (crdb_region ASC, ol_w_id ASC, ol_d_id ASC, ol_o_id DESC, ol_number ASC)
);
ALTER TABLE stock INJECT STATISTICS '[
    {
        "columns": [
            "s_i_id"
        ],
        "created_at": "2021-04-13 19:20:01.481197",
        "distinct_count": 100000,
        "histo_col_type": "",
        "name": "__auto__",
        "null_count": 0,
        "row_count": 500000000
    },
    {
        "columns": [
            "s_w_id"
        ],
        "created_at": "2021-04-13 19:20:01.481198",
        "distinct_count": 5000,
        "histo_col_type": "",
        "name": "__auto__",
        "null_count": 0,
        "row_count": 500000000
    },
    {
        "columns": [
            "s_quantity"
        ],
        "created_at": "2021-04-13 19:20:01.481199",
        "distinct_count": 91,
        "histo_col_type": "",
        "name": "__auto__",
        "null_count": 0,
        "row_count": 500000000
    }
]';
EXPLAIN (opt) SELECT
  count(*)
FROM
  (
    SELECT
      DISTINCT s_i_id
    FROM
      order_line JOIN stock ON s_i_id = ol_i_id AND s_w_id = ol_w_id
    WHERE
      ol_w_id = 1 AND ol_d_id = 6 AND ol_o_id BETWEEN (3023 - 20) AND (3023 - 1) AND s_quantity < 14
  );

                                                                                                                       info
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  scalar-group-by
   ├── columns: count:15
   ├── cardinality: [1 - 1]
   ├── stats: [rows=1]
   ├── cost: 8.4219795
   ├── key: ()
   ├── fd: ()-->(15)
   ├── prune: (15)
   ├── distinct-on
   │    ├── columns: s_i_id:9
   │    ├── grouping columns: s_i_id:9
   │    ├── stats: [rows=0.817973447, distinct(9)=0.817973447, null(9)=0]
   │    ├── cost: 8.38379977
   │    ├── key: (9)
   │    └── inner-join (lookup stock)
   │         ├── columns: ol_o_id:1 ol_d_id:2 ol_w_id:3 ol_i_id:5 s_i_id:9 s_w_id:10 s_quantity:11
   │         ├── key columns: [17 3 5] = [12 10 9]
   │         ├── lookup columns are key
   │         ├── stats: [rows=1, distinct(3)=0.821, null(3)=0, distinct(5)=0.817973447, null(5)=0, distinct(9)=0.817973447, null(9)=0, distinct(10)=0.821, null(10)=0]
   │         ├── cost: 8.33562004
   │         ├── fd: ()-->(2,3,10), (9)-->(11), (5)==(9), (9)==(5), (3)==(10), (10)==(3)
   │         ├── project
   │         │    ├── columns: "lookup_join_const_col_@12":17 ol_o_id:1 ol_d_id:2 ol_w_id:3 ol_i_id:5
   │         │    ├── stats: [rows=0.821, distinct(1)=0.821, null(1)=0, distinct(2)=0.821, null(2)=0, distinct(3)=0.821, null(3)=0, distinct(5)=0.817973447, null(5)=0, distinct(17)=0.821, null(17)=0]
   │         │    ├── cost: 5.02162
   │         │    ├── fd: ()-->(2,3,17)
   │         │    ├── scan order_line
   │         │    │    ├── columns: ol_o_id:1 ol_d_id:2 ol_w_id:3 ol_i_id:5
   │         │    │    ├── constraint: /6/3/2/-1/4: [/400/1/6/3022 - /400/1/6/3003]
   │         │    │    ├── stats: [rows=0.821, distinct(1)=0.821, null(1)=0, distinct(2)=0.821, null(2)=0, distinct(3)=0.821, null(3)=0, distinct(5)=0.817973447, null(5)=0, distinct(2,3)=0.821, null(2,3)=0, distinct(1-3)=0.821, null(1-3)=0]
   │         │    │    ├── cost: 4.9952
   │         │    │    ├── fd: ()-->(2,3)
   │         │    │    └── prune: (5)
   │         │    └── projections
   │         │         └── 100 [as="lookup_join_const_col_@12":17]
   │         └── filters
   │              ├── s_w_id:10 = 1 [outer=(10), constraints=(/10: [/1 - /1]; tight), fd=()-->(10)]
   │              └── s_quantity:11 < 14 [outer=(11), constraints=(/11: (/NULL - /13]; tight)]
   └── aggregations
        └── count-rows [as=count_rows:15]

Notice the project of 100 as the 17th column with an alias of lookup_join_const_col_@12.

However, when I switched back to the ENUM, things were back to what I was seeing above:

demo@127.0.0.1:26257/movr> CREATE TYPE region AS ENUM ('europe-west2','us-east1','us-west1');
CREATE TABLE stock (
    s_i_id INT8 NOT NULL,
    s_w_id INT8 NOT NULL,
    s_quantity INT8 NULL,
    crdb_region region NOT VISIBLE NOT NULL AS (CASE WHEN s_w_id BETWEEN 0 AND 1665 THEN 'us-east1' WHEN s_w_id BETWEEN 1666 AND 3332 THEN 'us-west1' WHEN s_w_id BETWEEN 3333 AND 4999 THEN 'europe-west2' END) STORED,
    CONSTRAINT "primary" PRIMARY KEY (crdb_region ASC, s_w_id ASC, s_i_id ASC)
);
CREATE TABLE order_line (
    ol_o_id INT8 NOT NULL,
    ol_d_id INT8 NOT NULL,
    ol_w_id INT8 NOT NULL,
    ol_number INT8 NOT NULL,
    ol_i_id INT8 NOT NULL,
    crdb_region region NOT VISIBLE NOT NULL AS (CASE WHEN ol_w_id BETWEEN 0 AND 1665 THEN 'us-west1' WHEN ol_w_id BETWEEN 1666 AND 3332 THEN 'europe-west2' WHEN ol_w_id BETWEEN 3333 AND 4999 THEN 'us-east1' END) STORED,
    CONSTRAINT "primary" PRIMARY KEY (crdb_region ASC, ol_w_id ASC, ol_d_id ASC, ol_o_id DESC, ol_number ASC)
);
ALTER TABLE stock INJECT STATISTICS '[
    {
        "columns": [
            "s_i_id"
        ],
        "created_at": "2021-04-13 19:20:01.481197",
        "distinct_count": 100000,
        "histo_col_type": "",
        "name": "__auto__",
        "null_count": 0,
        "row_count": 500000000
    },
    {
        "columns": [
            "s_w_id"
        ],
        "created_at": "2021-04-13 19:20:01.481198",
        "distinct_count": 5000,
        "histo_col_type": "",
        "name": "__auto__",
        "null_count": 0,
        "row_count": 500000000
    },
    {
        "columns": [
            "s_quantity"
        ],
        "created_at": "2021-04-13 19:20:01.481199",
        "distinct_count": 91,
        "histo_col_type": "",
        "name": "__auto__",
        "null_count": 0,
        "row_count": 500000000
    }
]';
EXPLAIN (opt,verbose) SELECT
  count(*)
FROM
  (
    SELECT
      DISTINCT s_i_id
    FROM
      order_line JOIN stock ON s_i_id = ol_i_id AND s_w_id = ol_w_id
    WHERE
      ol_w_id = 1 AND ol_d_id = 6 AND ol_o_id BETWEEN (3023 - 20) AND (3023 - 1) AND s_quantity < 14
  );

                                                                                                                       info
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  scalar-group-by
   ├── columns: count:15
   ├── cardinality: [1 - 1]
   ├── stats: [rows=1]
   ├── cost: 15.0900571
   ├── key: ()
   ├── fd: ()-->(15)
   ├── prune: (15)
   ├── distinct-on
   │    ├── columns: s_i_id:9
   │    ├── grouping columns: s_i_id:9
   │    ├── stats: [rows=0.817973447, distinct(9)=0.817973447, null(9)=0]
   │    ├── cost: 15.0518773
   │    ├── key: (9)
   │    └── inner-join (lookup stock)
   │         ├── columns: ol_o_id:1 ol_d_id:2 ol_w_id:3 ol_i_id:5 s_i_id:9 s_w_id:10 s_quantity:11
   │         ├── key columns: [17 3 5] = [12 10 9]
   │         ├── lookup columns are key
   │         ├── stats: [rows=1, distinct(3)=0.821, null(3)=0, distinct(5)=0.817973447, null(5)=0, distinct(9)=0.817973447, null(9)=0, distinct(10)=0.821, null(10)=0]
   │         ├── cost: 15.0036976
   │         ├── fd: ()-->(2,3,10), (9)-->(11), (5)==(9), (9)==(5), (3)==(10), (10)==(3)
   │         ├── inner-join (cross)
   │         │    ├── columns: ol_o_id:1 ol_d_id:2 ol_w_id:3 ol_i_id:5 "lookup_join_const_col_@12":17
   │         │    ├── multiplicity: left-rows(zero-or-more), right-rows(one-or-more)
   │         │    ├── stats: [rows=2.463, distinct(1)=0.821, null(1)=0, distinct(2)=0.821, null(2)=0, distinct(3)=0.821, null(3)=0, distinct(5)=0.817973447, null(5)=0, distinct(17)=2.463, null(17)=0]
   │         │    ├── cost: 5.1216975
   │         │    ├── fd: ()-->(2,3)
   │         │    ├── values
   │         │    │    ├── columns: "lookup_join_const_col_@12":17
   │         │    │    ├── cardinality: [3 - 3]
   │         │    │    ├── stats: [rows=3, distinct(17)=3, null(17)=0]
   │         │    │    ├── cost: 0.04
   │         │    │    ├── ('europe-west2',)
   │         │    │    ├── ('us-east1',)
   │         │    │    └── ('us-west1',)
   │         │    ├── scan order_line
   │         │    │    ├── columns: ol_o_id:1 ol_d_id:2 ol_w_id:3 ol_i_id:5
   │         │    │    ├── constraint: /6/3/2/-1/4: [/'us-west1'/1/6/3022 - /'us-west1'/1/6/3003]
   │         │    │    ├── stats: [rows=0.821, distinct(1)=0.821, null(1)=0, distinct(2)=0.821, null(2)=0, distinct(3)=0.821, null(3)=0, distinct(5)=0.817973447, null(5)=0, distinct(2,3)=0.821, null(2,3)=0, distinct(1-3)=0.821, null(1-3)=0]
   │         │    │    ├── cost: 4.9952
   │         │    │    ├── fd: ()-->(2,3)
   │         │    │    └── prune: (5)
   │         │    └── filters (true)
   │         └── filters
   │              ├── s_w_id:10 = 1 [outer=(10), constraints=(/10: [/1 - /1]; tight), fd=()-->(10)]
   │              └── s_quantity:11 < 14 [outer=(11), constraints=(/11: (/NULL - /13]; tight)]
   └── aggregations
        └── count-rows [as=count_rows:15]

Notice the values expression with 'europe-west2', 'us-east1', and 'us-west1' as the 17th column with an alias of lookup_join_const_col_@12.

With a bit of code instrumentation, I found that inside of CustomFuncs.GenerateLookupJoins, we actually had two filters on this crdb_region column in the optionalFilters slice: crdb_region IN ('europe-west2', 'us-east1', 'us-west1') came from the call to CustomFuncs.checkConstraintFilters, and crdb_region = 'us-east1' came from the call to CustomFuncs.computedColFilters.

The problem was that CustomFuncs.findJoinFilterConstants was returning the first of these because it matched, without realizing that a better match existed later in the slice of filters.
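A toy sketch of the difference (illustrative types and names only, not the actual memo.FiltersExpr machinery): firstMatch models the current first-match behavior, while bestMatch prefers the filter constraining the column to the fewest constant values, which means fewer lookup spans.

```go
package main

import "fmt"

// filter is a stand-in for a FiltersItem that constrains one column to
// a set of constant values.
type filter struct {
	desc string
	vals []string
}

// The two optional filters observed on stock.crdb_region in this issue.
var observedFilters = []filter{
	{"check constraint", []string{"europe-west2", "us-east1", "us-west1"}},
	{"computed column", []string{"us-east1"}},
}

// firstMatch models findJoinFilterConstants today: take the first filter
// that constrains the column, even if a tighter one appears later.
func firstMatch(filters []filter) filter { return filters[0] }

// bestMatch models the proposed fix: keep scanning and prefer the filter
// with the fewest constant values.
func bestMatch(filters []filter) filter {
	best := filters[0]
	for _, f := range filters[1:] {
		if len(f.vals) < len(best.vals) {
			best = f
		}
	}
	return best
}

func main() {
	fmt.Println(firstMatch(observedFilters).desc) // check constraint (3 values)
	fmt.Println(bestMatch(observedFilters).desc)  // computed column (1 value)
}
```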

To demonstrate this, I made the following patch, and the problem disappeared. With it, we correctly constrained the crdb_region column.

--- a/pkg/sql/opt/xform/join_funcs.go
+++ b/pkg/sql/opt/xform/join_funcs.go
@@ -223,9 +223,9 @@ func (c *CustomFuncs) GenerateLookupJoins(

        // Generate implicit filters from CHECK constraints and computed columns as
        // optional filters to help generate lookup join keys.
-       optionalFilters := c.checkConstraintFilters(scanPrivate.Table)
-       computedColFilters := c.computedColFilters(scanPrivate, on, optionalFilters)
-       optionalFilters = append(optionalFilters, computedColFilters...)
+       optionalFiltersBefore := c.checkConstraintFilters(scanPrivate.Table)
+       computedColFilters := c.computedColFilters(scanPrivate, on, optionalFiltersBefore)
+       optionalFilters := append(computedColFilters, optionalFiltersBefore...)

        var pkCols opt.ColList
        var iter scanIndexIter
demo@127.0.0.1:26257/movr> EXPLAIN (opt,verbose) SELECT
  count(*)
FROM
  (
    SELECT
      DISTINCT s_i_id
    FROM
      order_line JOIN stock ON s_i_id = ol_i_id AND s_w_id = ol_w_id
    WHERE
      ol_w_id = 1 AND ol_d_id = 6 AND ol_o_id BETWEEN (3023 - 20) AND (3023 - 1) AND s_quantity < 14
  );

                                                                                                                       info
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  scalar-group-by
   ├── columns: count:15
   ├── cardinality: [1 - 1]
   ├── stats: [rows=1]
   ├── cost: 8.4219795
   ├── key: ()
   ├── fd: ()-->(15)
   ├── prune: (15)
   ├── distinct-on
   │    ├── columns: s_i_id:9
   │    ├── grouping columns: s_i_id:9
   │    ├── stats: [rows=0.817973447, distinct(9)=0.817973447, null(9)=0]
   │    ├── cost: 8.38379977
   │    ├── key: (9)
   │    └── inner-join (lookup stock)
   │         ├── columns: ol_o_id:1 ol_d_id:2 ol_w_id:3 ol_i_id:5 s_i_id:9 s_w_id:10 s_quantity:11
   │         ├── key columns: [17 3 5] = [12 10 9]
   │         ├── lookup columns are key
   │         ├── stats: [rows=1, distinct(3)=0.821, null(3)=0, distinct(5)=0.817973447, null(5)=0, distinct(9)=0.817973447, null(9)=0, distinct(10)=0.821, null(10)=0]
   │         ├── cost: 8.33562004
   │         ├── fd: ()-->(2,3,10), (9)-->(11), (5)==(9), (9)==(5), (3)==(10), (10)==(3)
   │         ├── project
   │         │    ├── columns: "lookup_join_const_col_@12":17 ol_o_id:1 ol_d_id:2 ol_w_id:3 ol_i_id:5
   │         │    ├── stats: [rows=0.821, distinct(1)=0.821, null(1)=0, distinct(2)=0.821, null(2)=0, distinct(3)=0.821, null(3)=0, distinct(5)=0.817973447, null(5)=0, distinct(17)=0.821, null(17)=0]
   │         │    ├── cost: 5.02162
   │         │    ├── fd: ()-->(2,3,17)
   │         │    ├── scan order_line
   │         │    │    ├── columns: ol_o_id:1 ol_d_id:2 ol_w_id:3 ol_i_id:5
   │         │    │    ├── constraint: /6/3/2/-1/4: [/'us-west1'/1/6/3022 - /'us-west1'/1/6/3003]
   │         │    │    ├── stats: [rows=0.821, distinct(1)=0.821, null(1)=0, distinct(2)=0.821, null(2)=0, distinct(3)=0.821, null(3)=0, distinct(5)=0.817973447, null(5)=0, distinct(2,3)=0.821, null(2,3)=0, distinct(1-3)=0.821, null(1-3)=0]
   │         │    │    ├── cost: 4.9952
   │         │    │    ├── fd: ()-->(2,3)
   │         │    │    └── prune: (5)
   │         │    └── projections
   │         │         └── 'us-east1' [as="lookup_join_const_col_@12":17]
   │         └── filters
   │              ├── s_w_id:10 = 1 [outer=(10), constraints=(/10: [/1 - /1]; tight), fd=()-->(10)]
   │              └── s_quantity:11 < 14 [outer=(11), constraints=(/11: (/NULL - /13]; tight)]
   └── aggregations
        └── count-rows [as=count_rows:15]

So the issue seems to be that CustomFuncs.findJoinFilterConstants returns any matching filter instead of the best matching filter. I don't know the best way to fix this, as I don't think the filters returned from computedColFilters are always preferable to those returned from checkConstraintFilters.

Should we be choosing the matching filter that results in the smallest number of constVals? A patch like the following does seem to fix things:

diff --git a/pkg/sql/opt/xform/join_funcs.go b/pkg/sql/opt/xform/join_funcs.go
index cba1be97fa..8b43041514 100644
--- a/pkg/sql/opt/xform/join_funcs.go
+++ b/pkg/sql/opt/xform/join_funcs.go
@@ -906,13 +906,15 @@ func (c *CustomFuncs) mapInvertedJoin(
 }

 // findJoinFilterConstants tries to find a filter that is exactly equivalent to
-// constraining the given column to a constant value or a set of constant
-// values. If successful, the constant values and the index of the constraining
-// FiltersItem are returned. Note that the returned constant values do not
-// contain NULL.
+// constraining the given column to a constant value or a set of constant values
+// and minimizes the number of constant values. If successful, the constant
+// values and the index of the constraining FiltersItem are returned. Note that
+// the returned constant values do not contain NULL.
 func (c *CustomFuncs) findJoinFilterConstants(
        filters memo.FiltersExpr, col opt.ColumnID,
 ) (values tree.Datums, filterIdx int, ok bool) {
+       var bestValues tree.Datums
+       var bestFilterIdx int
        for filterIdx := range filters {
                props := filters[filterIdx].ScalarProps()
                if props.TightConstraints {
@@ -927,12 +929,16 @@ func (c *CustomFuncs) findJoinFilterConstants(
                                        break
                                }
                        }
-                       if !hasNull {
-                               return constVals, filterIdx, true
+                       if !hasNull && (bestValues == nil || len(bestValues) > len(constVals)) {
+                               bestValues = constVals
+                               bestFilterIdx = filterIdx
                        }
                }
        }
-       return nil, -1, false
+       if bestValues == nil {
+               return nil, -1, false
+       }
+       return bestValues, bestFilterIdx, true
 }

 // constructJoinWithConstants constructs a cross join that joins every row in

cc. @mgartner as you appear to be the expert here.

nvanbenschoten commented 3 years ago

Here's a possibly related INSERT that is slowed down by its foreign key check

Unfortunately, this did not seem to help the foreign key checks. From my reading of the EXPLAIN output, we aren't even pushing the constant non-computed columns through to the other side of that lookup join (see "lookup expression"):

root@localhost:26257/tpcc> EXPLAIN (opt, verbose) INSERT INTO history (h_c_id, h_c_d_id, h_c_w_id, h_d_id, h_w_id, h_amount, h_date, h_data) VALUES (2057, 4, 3, 4, 3, 2100.9, '2021-04-15 15:22:14', 'test data');
                                                                                                                  info
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  insert history
   ├── columns: <none>
   ├── insert-mapping:
   │    ├── column21:21 => rowid:1
   │    ├── column1:13 => history.h_c_id:2
   │    ├── column2:14 => history.h_c_d_id:3
   │    ├── column3:15 => history.h_c_w_id:4
   │    ├── column4:16 => history.h_d_id:5
   │    ├── column5:17 => history.h_w_id:6
   │    ├── column7:19 => h_date:7
   │    ├── h_amount:22 => history.h_amount:8
   │    ├── column8:20 => h_data:9
   │    └── column23:23 => history.crdb_region:10
   ├── check columns: check1:24
   ├── input binding: &1
   ├── cardinality: [0 - 0]
   ├── volatile, mutations
   ├── stats: [rows=0]
   ├── cost: 8.0713654
   ├── values
   │    ├── columns: column1:13 column2:14 column3:15 column4:16 column5:17 column7:19 column8:20 column21:21 h_amount:22 column23:23 check1:24
   │    ├── cardinality: [1 - 1]
   │    ├── volatile
   │    ├── stats: [rows=1, distinct(13)=1, null(13)=0, distinct(14)=1, null(14)=0, distinct(15)=1, null(15)=0, distinct(16)=1, null(16)=0, distinct(17)=1, null(17)=0]
   │    ├── cost: 0.02
   │    ├── key: ()
   │    ├── fd: ()-->(13-17,19-24)
   │    ├── prune: (13-17,19-24)
   │    └── (2057, 4, 3, 4, 3, '2021-04-15 15:22:14', 'test data', gen_random_uuid(), 2100.9, 'us-east1', true)
   └── f-k-checks
        ├── f-k-checks-item: history(h_c_w_id,h_c_d_id,h_c_id) -> customer(c_w_id,c_d_id,c_id)
        │    └── anti-join (lookup customer)
        │         ├── columns: h_c_w_id:25 h_c_d_id:26 h_c_id:27
        │         ├── lookup expression
        │         │    └── filters
        │         │         ├── h_c_w_id:25 = c_w_id:30 [outer=(25,30), constraints=(/25: (/NULL - ]; /30: (/NULL - ]), fd=(25)==(30), (30)==(25)]
        │         │         ├── h_c_d_id:26 = c_d_id:29 [outer=(26,29), constraints=(/26: (/NULL - ]; /29: (/NULL - ]), fd=(26)==(29), (29)==(26)]
        │         │         ├── h_c_id:27 = c_id:28 [outer=(27,28), constraints=(/27: (/NULL - ]; /28: (/NULL - ]), fd=(27)==(28), (28)==(27)]
        │         │         └── customer.crdb_region:49 IN ('europe-west2', 'us-east1', 'us-west1') [outer=(49), constraints=(/49: [/'europe-west2' - /'europe-west2'] [/'us-east1' - /'us-east1'] [/'us-west1' - /'us-west1']; tight)]
        │         ├── lookup columns are key
        │         ├── cardinality: [0 - 1]
        │         ├── stats: [rows=1e-10]
        │         ├── cost: 4.02000049
        │         ├── key: ()
        │         ├── fd: ()-->(25-27)
        │         ├── cte-uses
        │         │    └── &1: count=1 used-columns=(13-15)
        │         ├── with-scan &1
        │         │    ├── columns: h_c_w_id:25 h_c_d_id:26 h_c_id:27
        │         │    ├── mapping:
        │         │    │    ├──  column3:15 => h_c_w_id:25
        │         │    │    ├──  column2:14 => h_c_d_id:26
        │         │    │    └──  column1:13 => h_c_id:27
        │         │    ├── cardinality: [1 - 1]
        │         │    ├── stats: [rows=1, distinct(25)=1, null(25)=0, distinct(26)=1, null(26)=0, distinct(27)=1, null(27)=0]
        │         │    ├── cost: 0.01
        │         │    ├── key: ()
        │         │    ├── fd: ()-->(25-27)
        │         │    └── cte-uses
        │         │         └── &1: count=1 used-columns=(13-15)
        │         └── filters (true)
        └── f-k-checks-item: history(h_w_id,h_d_id) -> district(d_w_id,d_id)
             └── anti-join (lookup district)
                  ├── columns: h_w_id:52 h_d_id:53
                  ├── lookup expression
                  │    └── filters
                  │         ├── h_w_id:52 = d_w_id:55 [outer=(52,55), constraints=(/52: (/NULL - ]; /55: (/NULL - ]), fd=(52)==(55), (55)==(52)]
                  │         ├── h_d_id:53 = d_id:54 [outer=(53,54), constraints=(/53: (/NULL - ]; /54: (/NULL - ]), fd=(53)==(54), (54)==(53)]
                  │         └── district.crdb_region:65 IN ('europe-west2', 'us-east1', 'us-west1') [outer=(65), constraints=(/65: [/'europe-west2' - /'europe-west2'] [/'us-east1' - /'us-east1'] [/'us-west1' - /'us-west1']; tight)]
                  ├── lookup columns are key
                  ├── cardinality: [0 - 1]
                  ├── stats: [rows=1e-10]
                  ├── cost: 4.02136491
                  ├── key: ()
                  ├── fd: ()-->(52,53)
                  ├── cte-uses
                  │    └── &1: count=1 used-columns=(16,17)
                  ├── with-scan &1
                  │    ├── columns: h_w_id:52 h_d_id:53
                  │    ├── mapping:
                  │    │    ├──  column5:17 => h_w_id:52
                  │    │    └──  column4:16 => h_d_id:53
                  │    ├── cardinality: [1 - 1]
                  │    ├── stats: [rows=1, distinct(52)=1, null(52)=0, distinct(53)=1, null(53)=0]
                  │    ├── cost: 0.01
                  │    ├── key: ()
                  │    ├── fd: ()-->(52,53)
                  │    └── cte-uses
                  │         └── &1: count=1 used-columns=(16,17)
                  └── filters (true)
nvanbenschoten commented 3 years ago

Here's an easy way to reproduce this foreign-key behavior locally:

SET experimental_enable_unique_without_index_constraints = true;
CREATE TYPE region AS ENUM ('europe-west2','us-east1','us-west1');
CREATE TABLE district (
    d_id INT8 NOT NULL,
    d_w_id INT8 NOT NULL,
    crdb_region region NOT VISIBLE NOT NULL AS (CASE WHEN d_w_id BETWEEN 0 AND 1665 THEN 'us-east1' WHEN d_w_id BETWEEN 1666 AND 3332 THEN 'us-west1' WHEN d_w_id BETWEEN 3333 AND 4999 THEN 'europe-west2' END) STORED,
    CONSTRAINT "primary" PRIMARY KEY (crdb_region, d_w_id, d_id),
    UNIQUE WITHOUT INDEX (d_w_id, d_id)
);
CREATE TABLE history (
    rowid UUID NOT NULL DEFAULT gen_random_uuid(),
    h_d_id INT8 NOT NULL,
    h_w_id INT8 NOT NULL,
    crdb_region region NOT VISIBLE NOT NULL AS (CASE WHEN h_w_id BETWEEN 0 AND 1665 THEN 'us-west1' WHEN h_w_id BETWEEN 1666 AND 3332 THEN 'europe-west2' WHEN h_w_id BETWEEN 3333 AND 4999 THEN 'us-east1' END) STORED,
    CONSTRAINT "primary" PRIMARY KEY (crdb_region, h_w_id, rowid),
    CONSTRAINT fk_h_w_id_ref_district FOREIGN KEY (h_w_id, h_d_id) REFERENCES district(d_w_id, d_id) NOT VALID
);
EXPLAIN (opt,verbose) INSERT INTO history (h_d_id, h_w_id) VALUES (4, 3);

                                                                                                                  info
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  insert history
   ├── columns: <none>
   ├── insert-mapping:
   │    ├── column9:9 => rowid:1
   │    ├── column1:7 => history.h_d_id:2
   │    ├── column2:8 => history.h_w_id:3
   │    └── column10:10 => history.crdb_region:4
   ├── check columns: check1:11
   ├── input binding: &1
   ├── cardinality: [0 - 0]
   ├── volatile, mutations
   ├── stats: [rows=0]
   ├── cost: 4.05639
   ├── values
   │    ├── columns: column1:7 column2:8 column9:9 column10:10 check1:11
   │    ├── cardinality: [1 - 1]
   │    ├── volatile
   │    ├── stats: [rows=1, distinct(7)=1, null(7)=0, distinct(8)=1, null(8)=0]
   │    ├── cost: 0.02
   │    ├── key: ()
   │    ├── fd: ()-->(7-11)
   │    ├── prune: (7-11)
   │    └── (4, 3, gen_random_uuid(), 'us-west1', true)
   └── f-k-checks
        └── f-k-checks-item: history(h_w_id,h_d_id) -> district(d_w_id,d_id)
             └── anti-join (lookup district)
                  ├── columns: h_w_id:12 h_d_id:13
                  ├── lookup expression
                  │    └── filters
                  │         ├── h_w_id:12 = d_w_id:15 [outer=(12,15), constraints=(/12: (/NULL - ]; /15: (/NULL - ]), fd=(12)==(15), (15)==(12)]
                  │         ├── h_d_id:13 = d_id:14 [outer=(13,14), constraints=(/13: (/NULL - ]; /14: (/NULL - ]), fd=(13)==(14), (14)==(13)]
                  │         └── district.crdb_region:16 IN ('europe-west2', 'us-east1', 'us-west1') [outer=(16), constraints=(/16: [/'europe-west2' - /'europe-west2'] [/'us-east1' - /'us-east1'] [/'us-west1' - /'us-west1']; tight)]
                  ├── lookup columns are key
                  ├── cardinality: [0 - 1]
                  ├── stats: [rows=1e-10]
                  ├── cost: 4.02639
                  ├── key: ()
                  ├── fd: ()-->(12,13)
                  ├── cte-uses
                  │    └── &1: count=1 used-columns=(7,8)
                  ├── with-scan &1
                  │    ├── columns: h_w_id:12 h_d_id:13
                  │    ├── mapping:
                  │    │    ├──  column2:8 => h_w_id:12
                  │    │    └──  column1:7 => h_d_id:13
                  │    ├── cardinality: [1 - 1]
                  │    ├── stats: [rows=1, distinct(12)=1, null(12)=0, distinct(13)=1, null(13)=0]
                  │    ├── cost: 0.01
                  │    ├── key: ()
                  │    ├── fd: ()-->(12,13)
                  │    └── cte-uses
                  │         └── &1: count=1 used-columns=(7,8)
                  └── filters (true)
nvanbenschoten commented 3 years ago

> Unfortunately, this did not seem to help the foreign key checks.

I take this back. After I recaptured table statistics for all tables, the foreign key lookups all seemed to properly derive tighter spans through their computed columns. Here's what the previous plan now looks like:

root@localhost:26257/tpcc> EXPLAIN (opt, verbose) INSERT INTO history (h_c_id, h_c_d_id, h_c_w_id, h_d_id, h_w_id, h_amount, h_date, h_data) VALUES (2057, 4, 3, 4, 3, 2100.9, '2021-04-15 15:22:14', 'test data');
                                                                                              info
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  insert history
   ├── columns: <none>
   ├── insert-mapping:
   │    ├── column21:21 => rowid:1
   │    ├── column1:13 => history.h_c_id:2
   │    ├── column2:14 => history.h_c_d_id:3
   │    ├── column3:15 => history.h_c_w_id:4
   │    ├── column4:16 => history.h_d_id:5
   │    ├── column5:17 => history.h_w_id:6
   │    ├── column7:19 => h_date:7
   │    ├── h_amount:22 => history.h_amount:8
   │    ├── column8:20 => h_data:9
   │    └── column23:23 => history.crdb_region:10
   ├── check columns: check1:24
   ├── input binding: &1
   ├── cardinality: [0 - 0]
   ├── volatile, mutations
   ├── stats: [rows=0]
   ├── cost: 8.60451651
   ├── values
   │    ├── columns: column1:13 column2:14 column3:15 column4:16 column5:17 column7:19 column8:20 column21:21 h_amount:22 column23:23 check1:24
   │    ├── cardinality: [1 - 1]
   │    ├── volatile
   │    ├── stats: [rows=1, distinct(13)=1, null(13)=0, distinct(14)=1, null(14)=0, distinct(15)=1, null(15)=0, distinct(16)=1, null(16)=0, distinct(17)=1, null(17)=0]
   │    ├── cost: 0.02
   │    ├── key: ()
   │    ├── fd: ()-->(13-17,19-24)
   │    ├── prune: (13-17,19-24)
   │    └── (2057, 4, 3, 4, 3, '2021-04-15 15:22:14', 'test data', gen_random_uuid(), 2100.9, 'us-east1', true)
   └── f-k-checks
        ├── f-k-checks-item: history(h_c_w_id,h_c_d_id,h_c_id) -> customer(c_w_id,c_d_id,c_id)
        │    └── anti-join (lookup customer)
        │         ├── columns: h_c_w_id:25 h_c_d_id:26 h_c_id:27
        │         ├── lookup expression
        │         │    └── filters
        │         │         ├── h_c_w_id:25 = c_w_id:30 [outer=(25,30), constraints=(/25: (/NULL - ]; /30: (/NULL - ]), fd=(25)==(30), (30)==(25)]
        │         │         ├── h_c_d_id:26 = c_d_id:29 [outer=(26,29), constraints=(/26: (/NULL - ]; /29: (/NULL - ]), fd=(26)==(29), (29)==(26)]
        │         │         ├── h_c_id:27 = c_id:28 [outer=(27,28), constraints=(/27: (/NULL - ]; /28: (/NULL - ]), fd=(27)==(28), (28)==(27)]
        │         │         └── customer.crdb_region:49 IN ('europe-west2', 'us-west1') [outer=(49), constraints=(/49: [/'europe-west2' - /'europe-west2'] [/'us-west1' - /'us-west1']; tight)]
        │         ├── lookup columns are key
        │         ├── cardinality: [0 - 1]
        │         ├── stats: [rows=1e-10]
        │         ├── cost: 4.32545852
        │         ├── key: ()
        │         ├── fd: ()-->(25-27)
        │         ├── cte-uses
        │         │    └── &1: count=1 used-columns=(13-15)
        │         ├── anti-join (lookup customer)
        │         │    ├── columns: h_c_w_id:25 h_c_d_id:26 h_c_id:27
        │         │    ├── lookup expression
        │         │    │    └── filters
        │         │    │         ├── h_c_w_id:25 = c_w_id:30 [outer=(25,30), constraints=(/25: (/NULL - ]; /30: (/NULL - ]), fd=(25)==(30), (30)==(25)]
        │         │    │         ├── h_c_d_id:26 = c_d_id:29 [outer=(26,29), constraints=(/26: (/NULL - ]; /29: (/NULL - ]), fd=(26)==(29), (29)==(26)]
        │         │    │         ├── h_c_id:27 = c_id:28 [outer=(27,28), constraints=(/27: (/NULL - ]; /28: (/NULL - ]), fd=(27)==(28), (28)==(27)]
        │         │    │         └── customer.crdb_region:49 = 'us-east1' [outer=(49), immutable, constraints=(/49: [/'us-east1' - /'us-east1']; tight), fd=()-->(49)]
        │         │    ├── lookup columns are key
        │         │    ├── cardinality: [0 - 1]
        │         │    ├── stats: [rows=0.666666667, distinct(25)=0.666666667, null(25)=0, distinct(26)=0.666666667, null(26)=0, distinct(27)=0.666666667, null(27)=0]
        │         │    ├── cost: 2.43233936
        │         │    ├── key: ()
        │         │    ├── fd: ()-->(25-27)
        │         │    ├── with-scan &1
        │         │    │    ├── columns: h_c_w_id:25 h_c_d_id:26 h_c_id:27
        │         │    │    ├── mapping:
        │         │    │    │    ├──  column3:15 => h_c_w_id:25
        │         │    │    │    ├──  column2:14 => h_c_d_id:26
        │         │    │    │    └──  column1:13 => h_c_id:27
        │         │    │    ├── cardinality: [1 - 1]
        │         │    │    ├── stats: [rows=1, distinct(25)=1, null(25)=0, distinct(26)=1, null(26)=0, distinct(27)=1, null(27)=0]
        │         │    │    ├── cost: 0.01
        │         │    │    ├── key: ()
        │         │    │    ├── fd: ()-->(25-27)
        │         │    │    └── cte-uses
        │         │    │         └── &1: count=1 used-columns=(13-15)
        │         │    └── filters (true)
        │         └── filters (true)
        └── f-k-checks-item: history(h_w_id,h_d_id) -> district(d_w_id,d_id)
             └── anti-join (lookup district)
                  ├── columns: h_w_id:52 h_d_id:53
                  ├── lookup expression
                  │    └── filters
                  │         ├── h_w_id:52 = d_w_id:55 [outer=(52,55), constraints=(/52: (/NULL - ]; /55: (/NULL - ]), fd=(52)==(55), (55)==(52)]
                  │         ├── h_d_id:53 = d_id:54 [outer=(53,54), constraints=(/53: (/NULL - ]; /54: (/NULL - ]), fd=(53)==(54), (54)==(53)]
                  │         └── district.crdb_region:65 IN ('europe-west2', 'us-west1') [outer=(65), constraints=(/65: [/'europe-west2' - /'europe-west2'] [/'us-west1' - /'us-west1']; tight)]
                  ├── lookup columns are key
                  ├── cardinality: [0 - 1]
                  ├── stats: [rows=1e-10]
                  ├── cost: 4.24905799
                  ├── key: ()
                  ├── fd: ()-->(52,53)
                  ├── cte-uses
                  │    └── &1: count=1 used-columns=(16,17)
                  ├── anti-join (lookup district)
                  │    ├── columns: h_w_id:52 h_d_id:53
                  │    ├── lookup expression
                  │    │    └── filters
                  │    │         ├── h_w_id:52 = d_w_id:55 [outer=(52,55), constraints=(/52: (/NULL - ]; /55: (/NULL - ]), fd=(52)==(55), (55)==(52)]
                  │    │         ├── h_d_id:53 = d_id:54 [outer=(53,54), constraints=(/53: (/NULL - ]; /54: (/NULL - ]), fd=(53)==(54), (54)==(53)]
                  │    │         └── district.crdb_region:65 = 'us-east1' [outer=(65), immutable, constraints=(/65: [/'us-east1' - /'us-east1']; tight), fd=()-->(65)]
                  │    ├── lookup columns are key
                  │    ├── cardinality: [0 - 1]
                  │    ├── stats: [rows=0.666666667, distinct(52)=0.666666667, null(52)=0, distinct(53)=0.666666667, null(53)=0]
                  │    ├── cost: 2.39959628
                  │    ├── key: ()
                  │    ├── fd: ()-->(52,53)
                  │    ├── with-scan &1
                  │    │    ├── columns: h_w_id:52 h_d_id:53
                  │    │    ├── mapping:
                  │    │    │    ├──  column5:17 => h_w_id:52
                  │    │    │    └──  column4:16 => h_d_id:53
                  │    │    ├── cardinality: [1 - 1]
                  │    │    ├── stats: [rows=1, distinct(52)=1, null(52)=0, distinct(53)=1, null(53)=0]
                  │    │    ├── cost: 0.01
                  │    │    ├── key: ()
                  │    │    ├── fd: ()-->(52,53)
                  │    │    └── cte-uses
                  │    │         └── &1: count=1 used-columns=(16,17)
                  │    └── filters (true)
                  └── filters (true)

And with that plus the patch from earlier, we get entirely region-local TPC-C transactions!

*(screenshot: Screen Shot 2021-04-17 at 2 04 00 AM)*

(the three workload runs below, one per partition, were executed concurrently)

➜ rp ssh nathan-1618341380-01-n12cpu4-geo:4 -- './cockroach workload run tpcc --warehouses=5000 --active-warehouses=300 --partitions=3 --partition-affinity=0 --duration=10m {pgurl:1-3}'
...
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  600.0s        0           1274            2.1    106.0     79.7    226.5    704.6    973.1  delivery

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  600.0s        0          12489           20.8    102.6     39.8    151.0   1946.2   2550.1  newOrder

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  600.0s        0           1251            2.1     17.7      8.4     35.7    318.8    385.9  orderStatus

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  600.0s        0          12786           21.3     49.6     21.0     92.3    973.1   1946.2  payment

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  600.0s        0           1277            2.1     21.1     13.6     35.7    302.0    352.3  stockLevel

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
  600.0s        0          29077           48.5     72.2     30.4    125.8   1610.6   2550.1

_elapsed_______tpmC____efc__avg(ms)__p50(ms)__p90(ms)__p95(ms)__p99(ms)_pMax(ms)
  600.0s     1248.9  32.4%    102.6     39.8     67.1    151.0   1946.2   2550.1
➜ rp ssh nathan-1618341380-01-n12cpu4-geo:8 -- './cockroach workload run tpcc --warehouses=5000 --active-warehouses=300 --partitions=3 --partition-affinity=1 --duration=10m {pgurl:5-7}'
...
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  600.0s        0           1279            2.1    131.7     83.9    201.3   1610.6   2818.6  delivery

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  600.0s        0          12556           20.9    117.2     37.7    109.1   2684.4   3489.7  newOrder

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  600.0s        0           1267            2.1     23.3      9.4     31.5    436.2    604.0  orderStatus

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  600.0s        0          12642           21.1     74.1     19.9     65.0   1811.9   2080.4  payment

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  600.0s        0           1282            2.1     48.2     16.3     71.3    973.1   1140.9  stockLevel

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
  600.0s        0          29026           48.4     91.9     31.5    104.9   2281.7   3489.7

_elapsed_______tpmC____efc__avg(ms)__p50(ms)__p90(ms)__p95(ms)__p99(ms)_pMax(ms)
  600.0s     1255.6  32.5%    117.2     37.7     56.6    109.1   2684.4   3489.7
➜ rp ssh nathan-1618341380-01-n12cpu4-geo:12 -- './cockroach workload run tpcc --warehouses=5000 --active-warehouses=300 --partitions=3 --partition-affinity=2 --duration=10m {pgurl:9-11}'
...
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  600.0s        0           1276            2.1    132.2     71.3    201.3   1744.8   3087.0  delivery

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  600.0s        0          12484           20.8    127.4     33.6    113.2   2818.6   4026.5  newOrder

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  600.0s        0           1290            2.1     26.9      7.3     33.6    570.4    738.2  orderStatus

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  600.0s        0          12713           21.2     61.1     16.8     65.0   1610.6   2818.6  payment

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  600.0s        0           1275            2.1     48.4     21.0     71.3    872.4   1140.9  stockLevel

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
  600.0s        0          29038           48.4     90.6     29.4    104.9   2684.4   4026.5

_elapsed_______tpmC____efc__avg(ms)__p50(ms)__p90(ms)__p95(ms)__p99(ms)_pMax(ms)
  600.0s     1248.4  32.4%    127.4     33.6     60.8    113.2   2818.6   4026.5

For a combined efficiency of 32.4%+32.5%+32.4%=97.3%.

And for reference, here is what one of those delivery transactions looked like:

*(screenshot: Screen Shot 2021-04-17 at 2 16 21 AM)*
rytaft commented 3 years ago

Thanks for all your work on this, @nvanbenschoten! I'll be working on this tomorrow so I'll be able to respond to all of these ideas then.

rytaft commented 3 years ago

Awesome sleuthing, @nvanbenschoten!! I think your diff above to fix `findJoinFilterConstants` makes perfect sense. Do you want to go ahead and submit a PR for that fix (I can also do it if you prefer)?

The second issue with the foreign key checks seems to be because we haven't costed locality optimized anti joins aggressively enough. I'm glad that recollecting stats seemed to fix the issue, but I think we want to ensure that we always choose locality optimized anti joins when possible (at least until the optimizer is more distribution-aware). I will submit a PR to fix that issue.

One question (unrelated to this issue): I noticed that in the bundle for the first query, there were no stats for the order_line table. I know we've had issues in the past with stats not showing up for TPC-C, but ideally we should always either inject stats for TPC-C using the pre-calculated stats in workload, or they need to be included in the backups that get loaded. Do you know what might have happened in this case?

mgartner commented 3 years ago

> Awesome sleuthing, @nvanbenschoten!!

+1 Thanks for getting to the bottom of this! I agree that your proposed fix looks great.

> It turned out that when using an INT column instead of the ENUM column, things actually worked, which I wasn't expecting

It may not have been immediately obvious, but this was a brilliant debugging step which highlighted the issue. It was perhaps confusing because the issue was less about the type of the column and more about the fact that ENUMs come with an implicit CHECK constraint.
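To make the point about the implicit CHECK constraint concrete, here is a toy Python model (hypothetical, not CockroachDB code; the names `ENUM_DOMAIN` and `region_constraint` are invented for illustration). It sketches why the plans above can bound `district.crdb_region` with `IN ('europe-west2', 'us-east1', 'us-west1')` even when no single region can be derived: the enum's declared domain acts as a filter over the lookup spans.

```python
# Toy model of the implicit CHECK constraint carried by an ENUM column.
# An INT column has no such constraint, so no bounded IN-list exists.
ENUM_DOMAIN = ('europe-west2', 'us-east1', 'us-west1')

def region_constraint(derived_values=None):
    """Return the tightest set of region values available to the lookup."""
    if derived_values is None:
        # Nothing derived from the query: fall back to the enum's
        # whole domain, i.e. the three-value IN filter in the plans.
        return list(ENUM_DOMAIN)
    # A derived value tightens the constraint to a subset of the domain.
    return [v for v in ENUM_DOMAIN if v in derived_values]

assert region_constraint() == ['europe-west2', 'us-east1', 'us-west1']
assert region_constraint({'us-east1'}) == ['us-east1']
```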

nvanbenschoten commented 3 years ago

Thanks for taking a look @rytaft and @mgartner!

> I think your diff above to fix `findJoinFilterConstants` makes perfect sense. Do you want to go ahead and submit a PR for that fix (I can also do it if you prefer)?

Sure, I'll send out the patch, along with some testing that demonstrates the impact.

> The second issue with the foreign key checks seems to be because we haven't costed locality optimized anti joins aggressively enough. I'm glad that recollecting stats seemed to fix the issue, but I think we want to ensure that we always choose locality optimized anti joins when possible (at least until the optimizer is more distribution-aware). I will submit a PR to fix that issue.

Sounds great. https://github.com/cockroachdb/cockroach/pull/63830 looks like a good change.

What's interesting about this situation, and what I hadn't noticed before, is that locality-optimized search is saving us here. I had been assuming that with fresh stats we were suddenly able to infer the single crdb_region value in the foreign key lookup join because it's a computed expression of the warehouse column (c_w_id for the customer table, d_w_id for the district table). But that isn't what's happening. Instead, we're still fanning out to all three crdb_region values, but locality-optimized search short-circuits before this actually results in a cross-region RPC. So it hides the latency of a fanout that we don't strictly need to do.
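The short-circuit can be sketched outside the optimizer. This is a toy Python model under stated assumptions (the function `fk_reference_exists` and the probe callables are invented for illustration, not CockroachDB code): the FK existence check probes the local region first and only issues remote lookups when the local probe misses, mirroring the two nested anti-join branches in the plans above.

```python
def fk_reference_exists(key, local_probe, remote_probes):
    """Toy model of a locality-optimized FK existence check.

    Each probe is a callable returning whether the referenced row was
    found in that region; in the real plan these correspond to the
    nested lookup anti-join branches shown above.
    """
    if local_probe(key):
        # Common case: the parent row is in the local region, so the
        # remote branches never run and no cross-region RPC is issued.
        return True
    # Miss locally: fall back to fanning out across the other regions.
    return any(probe(key) for probe in remote_probes)

# A local hit short-circuits before any remote probe executes.
probed = []
local = lambda key: (probed.append('us-east1'), True)[1]
remotes = [lambda key: (probed.append('us-west1'), False)[1],
           lambda key: (probed.append('europe-west2'), False)[1]]
assert fk_reference_exists((3, 4), local, remotes)
assert probed == ['us-east1']
```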

I'm not quite sure how to think about this. On one hand, it would be nice to derive the crdb_region value in this case, like we are able to do in other cases. On the other hand, it's pretty neat that locality-optimized search is a general-purpose tool that can dramatically reduce the need for these kinds of specific optimizations.

Maybe we just open another issue to track the longer-term item of deriving the value of the computed column in these foreign key lookup queries? What do you think?
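For concreteness, the derivation in question amounts to constant-folding the computed-column expression over the FK columns. A minimal Python sketch of the `district` table's CASE mapping from the repro schema above (a hypothetical model of the folding, not optimizer code):

```python
def district_region(d_w_id):
    # Mirrors district.crdb_region's computed CASE expression from the
    # repro schema: fixed warehouse-id ranges map to regions.
    if 0 <= d_w_id <= 1665:
        return 'us-east1'
    if 1666 <= d_w_id <= 3332:
        return 'us-west1'
    if 3333 <= d_w_id <= 4999:
        return 'europe-west2'
    return None

# With h_w_id = 3 fixed by the INSERT, folding the expression would pin
# the FK lookup to a single region instead of a three-region fanout.
assert district_region(3) == 'us-east1'
```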

> Do you know what might have happened in this case?

I'm not totally sure. This dataset was generated using IMPORT, so it should have used the pre-calculated stats in workload. Perhaps those have rotted? Or they were discarded during one of the multi-region schema changes I performed. What happens to stats when a new column is added to a table?


Also, as a follow-up to all of this, I switched the database over to REGION survivability and things still look quite good. Here's the result from one region:

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  600.0s        0           1283            2.1    234.9    209.7    453.0    805.3   1543.5  delivery

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  600.0s        0          12375           20.6    282.7    226.5    385.9   1946.2   2952.8  newOrder

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  600.0s        0           1276            2.1     13.6      8.4     30.4    130.0    469.8  orderStatus

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  600.0s        0          12630           21.0    188.1    151.0    335.5   1409.3   2684.4  payment

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  600.0s        0           1260            2.1     32.4     14.7    113.2    352.3    671.1  stockLevel

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
  600.0s        0          28824           48.0    216.3    192.9    335.5   1744.8   2952.8

_elapsed_______tpmC____efc__avg(ms)__p50(ms)__p90(ms)__p95(ms)__p99(ms)_pMax(ms)
  600.0s     1237.5  32.1%    282.7    226.5    260.0    385.9   1946.2   2952.8

We do see p50 latency increase by more than the replication latency (~65ms). I believe this is explained by a few replication pipeline stalls during foreign key checks on recently written rows in some of these transactions. The use of column families, which were added in https://github.com/cockroachdb/cockroach/pull/30624, is known to avoid most of these. Unfortunately, it's not easy to add column families after the fact to confirm that.

rytaft commented 3 years ago

> On one hand, it would be nice to derive the crdb_region value in this case, like we are able to do in other cases.... Maybe we just open another issue to track the longer-term item of deriving the value of the computed column in these foreign key lookup queries? What do you think?

Yea, let's open an issue. It's a bit tricky since the foreign key check scans the buffer from the mutation output rather than using the constant values directly, so we'd need to somehow detect cases where we can "inline" the constant values rather than scanning from the buffer. I opened #63882 -- feel free to add any additional color there.

> This dataset was generated using IMPORT, so it should have used the pre-calculated stats in workload. Perhaps those have rotted? Or they were discarded during one of the multi-region schema changes I performed. What happens to stats when a new column is added to a table?

They shouldn't ever be discarded. I did a bit of poking around, and it seems like we only inject stats with `workload fixtures import tpcc`, while `workload init tpcc --data-loader IMPORT` doesn't do it. Have we mostly abandoned `workload fixtures` at this point? I think we can get stats injected for `workload init`, but we just need to set a bool to true somewhere in workload... I can open an issue and make this change, just let me know if you think that makes sense.

rytaft commented 3 years ago

Hey @nvanbenschoten, I'm assigning this issue back to you to open the PR with your diff from above. Also let me know your thoughts about my TPC-C stats question above whenever you get a chance.

nvanbenschoten commented 3 years ago

> it seems like we only inject stats with `workload fixtures import tpcc`, while `workload init tpcc --data-loader IMPORT` doesn't do it. Have we mostly abandoned `workload fixtures` at this point? I think we can get stats injected for `workload init`, but we just need to set a bool to true somewhere in workload... I can open an issue and make this change, just let me know if you think that makes sense.

Good question. I think we still run `workload fixtures import tpcc` in our nightly roachtests. I don't actually see any uses of `workload init --data-loader`. But I also think the intention was for these two approaches to dataset initialization to behave identically, so we probably should be injecting stats in `workload init`. Opening an issue SGTM!

rytaft commented 3 years ago

Done: #64580. Thanks for the info!