cockroachdb / cockroach

CockroachDB - the open source, cloud-native distributed SQL database.
https://www.cockroachlabs.com
Other
29.51k stars 3.7k forks source link

unoptimized-query-oracle: missing intermediate cast in computed column #124732

Open cockroach-teamcity opened 1 month ago

cockroach-teamcity commented 1 month ago

roachtest.unoptimized-query-oracle/disable-rules=all/rand-tables failed with artifacts on release-24.1 @ d194358048c92a697d52fe77f52e01e9e7c670d9:

(query_comparison_util.go:389).runOneRoundQueryComparison: . 1646 statements run: expected unoptimized and optimized results to be equal
  []string(
-   {"-"},
+   nil,
  )
sql: SELECT
    tab_851."col\\U000EF8951_6" AS col_2807
FROM
    defaultdb.public.table_3 AS tab_850
    JOIN defaultdb.public."tab?le_1" AS tab_851 ON (tab_850."coL3_0") = (tab_851."col\\U000EF8951_6")
GROUP BY
    tab_851."col\\U000EF8951_6"
ORDER BY
    tab_851."col\\U000EF8951_6" ASC NULLS LAST,
    tab_851."col\\U000EF8951_6" NULLS LAST,
    tab_851."col\\U000EF8951_6" NULLS FIRST,
    tab_851."col\\U000EF8951_6" ASC NULLS FIRST,
    tab_851."col\\U000EF8951_6" NULLS LAST,
    tab_851."col\\U000EF8951_6" ASC NULLS LAST,
    tab_851."col\\U000EF8951_6" NULLS LAST
test artifacts and logs in: /artifacts/unoptimized-query-oracle/disable-rules=all/rand-tables/run_1

Parameters:

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

- #124699 roachtest: unoptimized-query-oracle/disable-rules=all/rand-tables failed [C-test-failure O-roachtest O-robot O-rsg T-sql-queries branch-master release-blocker]

This test on roachdash | Improve this report!

Jira issue: CRDB-39045

yuzefovich commented 3 weeks ago

Reduced repro:

SET CLUSTER SETTING sql.stats.automatic_collection.enabled = false;

CREATE TABLE table_1 (col1_6 REGCLASS);

CREATE TABLE table_3 (
  col3_0  OID,
  col3_7  STRING AS (lower(CAST(col3_0 AS STRING))) VIRTUAL,
  PRIMARY KEY (col3_7),
  INDEX (col3_0 ASC)
);

ALTER TABLE table_3 INJECT STATISTICS e'[{"avg_size": 5, "columns": ["col3_7"], "created_at": "2000-01-01 00:00:00+00:00", "distinct_count": 201038037, "histo_buckets": [{"distinct_range": 0, "num_eq": 700, "num_range": 0, "upper_bound": "\\u0000"}, {"distinct_range": 0, "num_eq": 3066039884274213925, "num_range": 50, "upper_bound": "I6\\u0017="}, {"distinct_range": 40000000000, "num_eq": 2000, "num_range": 40000000000, "upper_bound": "k_Y~y;"}, {"distinct_range": 0, "num_eq": 9000000000, "num_range": 4958652185256911015, "upper_bound": "\\u0008<\\r2]J\\u000e"}, {"distinct_range": 0, "num_eq": 600, "num_range": 100000, "upper_bound": "8R`tYH^"}, {"distinct_range": 1640102296691195600, "num_eq": 80000000000, "num_range": 1640102296691195695, "upper_bound": "\\u000f\\\\[Mm|c\\r"}, {"distinct_range": 242047641220621220, "num_eq": 8503850823594830229, "num_range": 3633976948439898226, "upper_bound": "\\u0018s5iH\'4\\""}, {"distinct_range": 0, "num_eq": 5000000, "num_range": 5380871138879966725, "upper_bound": "Ld\\u0011\\u0006\\u001b\\u000br3"}, {"distinct_range": 6675431471957301000, "num_eq": 300000, "num_range": 6675431471957300813, "upper_bound": "\\u000e4 \\u000b2Yz\\u001ay"}], "histo_col_type": "STRING", "histo_version": 3, "name": "__auto__", "null_count": 0, "row_count": 800000000}]';

INSERT INTO table_1 (col1_6) VALUES (0);
INSERT INTO table_3 (col3_0) VALUES (0);

SET testing_optimizer_random_seed = 2965474011313781851;
SET testing_optimizer_disable_rule_probability = 1.000000;

SELECT col1_6 FROM table_3 JOIN table_1 ON col3_0 = col1_6;

RESET testing_optimizer_random_seed;
RESET testing_optimizer_disable_rule_probability;

SELECT col1_6 FROM table_3 JOIN table_1 ON col3_0 = col1_6;

This should return a single row, yet the latter returns none. The bug is not present on 23.2.5. Using value 1 instead of 0 in the rows removes the bug, so this seems like a regression due to zero oid handling #123290.

michae2 commented 3 weeks ago

[triage] We think col3_0 = col1_6 is somehow using an implicit cast to a string on one column, and that's why the predicate is false.

yuzefovich commented 3 weeks ago

Indeed, I think our hypothesis from the triage was correct. Here is the relevant snippet from EXPLAIN (TYPES, VERBOSE):

  └── • project
      │ columns: (col3_0 oid, col1_6 regclass)
      │
      └── • lookup join (inner)
          │ columns: (col3_7_eq string, col1_6 regclass, col3_0 oid)
          │ estimated row count: 990 (missing stats)
          │ table: table_3@table_3_col3_0_idx
          │ equality: (col1_6, col3_7_eq) = (col3_0, col3_7)
          │ equality cols are key
          │
          └── • render
              │ columns: (col3_7_eq string, col1_6 regclass)
              │ render col3_7_eq: (lower(((col1_6)[regclass]::STRING)[string]))[string]
              │ render col1_6: (col1_6)[regclass]

What happens is that we have an equality on col3_0 = col1_6. Then when we're performing the lookup into table_3_col3_0_idx, we synthesize value of col3_7 virtual column since it's the PK. In the constraint builder we find that we can use value of col1_6 to evaluate the expression for col3_7. Crucially, in the computed column expression we used OID type as the input, but here we're using REGCLASS, which behaves differently when casting to STRING. In other words, it seems to me that we need to add an intermediate cast so that we have col3_7_eq be lower(col1_6::OID::STRING) instead of lower(col1_6::STRING). I'll leave this to optimizer experts to figure out further.

DrewKimball commented 2 weeks ago

[triage] We may need to be adding casts when types are identical, rather than just equivalent.

mgartner commented 4 days ago

Further reduced this:

CREATE TABLE table_1 (col1_6 REGCLASS);

CREATE TABLE table_3 (
  col3_0 OID,
  col3_7 STRING AS (col3_0::STRING) VIRTUAL PRIMARY KEY
);

INSERT INTO table_1 (col1_6) VALUES (0);
INSERT INTO table_3 (col3_0) VALUES (0);

SELECT col1_6 FROM table_1 INNER HASH JOIN table_3 ON col3_0 = col1_6;

SELECT col1_6 FROM table_1 INNER LOOKUP JOIN table_3 ON col3_0 = col1_6;
DrewKimball commented 4 days ago

We ended up looking at this during the collab session last week - seems like we're just missing some logic to check if column types are identical before remapping column references. See also https://github.com/cockroachdb/cockroach/pull/126345