cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
30.1k stars 3.81k forks source link

opt: internal error: top-level relational expression cannot have outer columns #130400

Open ycybfhb opened 1 month ago

ycybfhb commented 1 month ago

Describe the problem

internal error: top-level relational expression cannot have outer columns in makeQueryIndexRecommendation()->Optimize()

To Reproduce

SQL to init database [init.sql.txt](https://github.com/user-attachments/files/16941629/init.sql.txt)
SQL that causes error ```sql select ref_9.c__sm6t8__m as c0, case when (EXISTS ( select (select c11 from t_rca order by c11 limit 1 offset 5) as c0, ref_13.c_ibhkt4ifda as c1, ref_0.c7 as c2, ref_3.c10 as c3, ref_9.c__sm6t8__m as c4, ref_14.c5 as c5 from (t_bnqfnwa as ref_13 inner join t_rca as ref_14 on (((ref_13.c_uuw5d832 not like 's%xqbz4c')) or ((cast((cast(ref_14.c4 as int8) > cast(ref_14.c4 as int8)) as bool))))) where (cast((cast(null as "timestamptz") < cast(null as "timestamp")) as bool)) limit 132)) then ref_6.c_sc9oe0qu else ref_6.c_jt8ydrx7 end as c1, ref_4.c5 as c2, (select c_i9hbg1 from t_ow0g_ order by c_i9hbg1 limit 1 offset 6) as c3, case when true then (select c_rqlsk_aj from t_bnqfnwa order by c_rqlsk_aj limit 1 offset 2) else ref_4.c0 end as c4, ref_0.c6 as c5, ref_1.c_d38l4v64 as c6, 820951976 as c7, cast((cast(cast(null as record) as record) != cast(case when (ref_6.c_l1 not like 'c_zz%a9z') then cast(cast(null as record) as record) else cast(cast(null as record) as record) end as record)) as bool) as c8, ref_6.c_ywed as c9 from ((((t_rca as ref_0 cross join t_bnqfnwa as ref_1 ) left outer join (t_rca as ref_2 inner join t_rca as ref_3 on (true)) on ((ref_1.c_ts7n between ref_1.c_veh and ref_2.c0))) cross join (t_rca as ref_4 right outer join (t_bnqfnwa as ref_5 cross join t_y as ref_6 ) on ((ref_5.c_d38l4v64 >= ( select ref_7.c_ca as c0 from t_y as ref_7 where (cast((cast(cast(null as "interval") as "interval") IS DISTINCT FROM cast(cast(null as "interval") as "interval")) as bool)) union ( select ref_5.c_uuw5d832 as c0 from t_rca as ref_8 where (cast((cast(null as date) IS DISTINCT FROM cast(null as "timestamp")) as bool)) ) limit 1)))) ) right outer join (t_ow0g_ as ref_9 full outer join t_bnqfnwa as ref_10 on ((cast((cast(ref_10.c_veh as int8) <= cast(cast(null as oid) as oid)) as bool)))) on ((ref_2.c3 is not NULL))) where (case when ((ref_6.c_m50a like 'k%')) or ((ref_0.c8 in ( select ref_12.c_veh as c0 from (t_bnqfnwa as ref_11 right outer join t_bnqfnwa as ref_12 on ((ref_11.c_rqlsk_aj between ref_11.c_rqlsk_aj and ref_11.c_rqlsk_aj))) where (cast((cast(ref_12.c_veh as int8) >= cast(cast(null as "numeric") as "numeric")) as bool))))) then ref_0.c11 else ref_6.c_hdnu9xzja end is NULL) union ( select (EXISTS ( select cast((cast(cast(null as oid) as oid) != cast(ref_16.c_rqlsk_aj as int8)) as bool) as c0, ref_16.c_d38l4v64 as c1, ref_16.c_ibhkt4ifda as c2, ref_16.c_veh as c3, -888349915 as c4, subq_0.c10 as c5, ref_16.c_uuw5d832 as c6, ref_16.c_d38l4v64 as c7, subq_0.c0 as c8, pg_catalog.postgis_scripts_released() as c9 from t_bnqfnwa as ref_16 where (case when (ref_16.c_ts7n = ( select ref_17.c_ywed as c0 from t_y as ref_17 where (cast((cast(subq_0.c6 as int8) IS DISTINCT FROM cast(cast(null as oid) as oid)) as bool)) union all ( select ref_16.c_ts7n as c0 from t_ow0g_ as ref_18 where false ) limit 1)) then ref_16.c_ts7n else subq_0.c1 end between subq_0.c1 and subq_0.c1) limit 100)) as c0, subq_0.c6 as c1, subq_0.c6 as c2, subq_0.c1 as c3, 1368683964 as c4, cast(null as text) as c5, case when (EXISTS ( select ref_19.c_n5 as c0 from t_ow0g_ as ref_19 where (ref_19.c_deesne_t >= ( select ref_19.c_deesne_t as c0 from t_ow0g_ as ref_20 where (EXISTS ( select ref_20.c_vj_t3epb as c0, (select c_n5 from t_ow0g_ order by c_n5 limit 1 offset 1) as c1, ref_21.c_rqlsk_aj as c2, ref_21.c_hl3s8ojsmc as c3, ref_21.c_ts7n as c4, ref_21.c_uuw5d832 as c5, ref_21.c_veh as c6 from t_bnqfnwa as ref_21 where (EXISTS ( select ref_22.c__sm6t8__m as c0, 16 as c1, subq_0.c9 as c2, ref_22.c_deesne_t as c3, (select c_g9f9abs2 from t_y order by c_g9f9abs2 limit 1 offset 4) as c4, ref_22.c__sm6t8__m as c5, ref_21.c_rqlsk_aj as c6, ref_22.c_i9hbg1 as c7, subq_0.c5 as c8, ref_22.c_n5 as c9, ref_22.c_gdyk as c10, ref_22.c_i9hbg1 as c11, ref_22.c_gdyk as c12, ref_22.c_i9hbg1 as c13, cast(null as int4) as c14, subq_0.c2 as c15, ref_22.c_vj_t3epb as c16, ref_22.c_n5 as c17, subq_0.c1 as c18, ref_22.c_i9hbg1 as c19, ref_22.c_gdyk as c20, -1201652844 as c21, (select c_l1 from t_y order by c_l1 limit 1 offset 2) as c22, (select c_n5 from t_ow0g_ order by c_n5 limit 1 offset 2) as c23, ref_22.c_n5 as c24, ref_21.c_veh as c25, ref_20.c_n5 as c26, ref_22.c_gdyk as c27, ref_21.c_ts7n as c28, ref_22.c_gdyk as c29, ref_22.c_i9hbg1 as c30, ref_22.c_gdyk as c31, ref_22.c_ie as c32, subq_0.c10 as c33, ref_22.c_deesne_t as c34, ref_22.c_deesne_t as c35, ref_22.c_n5 as c36, ref_22.c_n5 as c37, ref_22.c_n5 as c38, ref_22.c__sm6t8__m as c39, ref_22.c_ie as c40, 1139846702 as c41, ref_22.c_vj_t3epb as c42, ref_21.c_veh as c43, ref_22.c_gdyk as c44, (select c7 from t_rca order by c7 limit 1 offset 6) as c45, ref_19.c_deesne_t as c46, ref_20.c_i9hbg1 as c47, subq_0.c2 as c48, ref_22.c_deesne_t as c49, ref_22.c_ie as c50, ref_22.c_ie as c51, subq_0.c1 as c52, ref_20.c_deesne_t as c53, ref_22.c__sm6t8__m as c54, ref_22.c_vj_t3epb as c55, subq_0.c10 as c56, ref_22.c_i9hbg1 as c57, ref_21.c_veh as c58, ref_19.c_i9hbg1 as c59, subq_0.c0 as c60, ref_22.c_gdyk as c61, ref_22.c_ie as c62, subq_0.c6 as c63, ref_22.c_n5 as c64, ref_22.c_i9hbg1 as c65, ref_22.c_deesne_t as c66, ref_22.c_ie as c67, ref_22.c__sm6t8__m as c68, ref_22.c_i9hbg1 as c69, ref_20.c_deesne_t as c70, ref_22.c_n5 as c71, ref_22.c_vj_t3epb as c72, ref_22.c_ie as c73, ref_20.c_vj_t3epb as c74, (select c_ca from t_y order by c_ca limit 1 offset 1) as c75, ref_22.c_deesne_t as c76, subq_0.c6 as c77, ref_19.c_i9hbg1 as c78, ref_22.c_deesne_t as c79, ref_22.c__sm6t8__m as c80, ref_22.c_deesne_t as c81, ref_21.c_ts7n as c82, cast(null as int4) as c83, ref_22.c_ie as c84, (select c_ibhkt4ifda from t_bnqfnwa order by c_ibhkt4ifda limit 1 offset 5) as c85, ref_21.c_uuw5d832 as c86, ref_22.c_i9hbg1 as c87, ref_22.c_n5 as c88, ref_22.c_ie as c89, ref_22.c_vj_t3epb as c90, ref_22.c_n5 as c91, subq_0.c7 as c92, ref_22.c_n5 as c93, ref_22.c__sm6t8__m as c94, subq_0.c6 as c95, ref_20.c__sm6t8__m as c96, subq_0.c3 as c97, subq_0.c2 as c98, ref_22.c_vj_t3epb as c99 from t_ow0g_ as ref_22 where (cast((cast(null as "timestamptz") > cast(null as date)) as bool)))))) limit 1)))) then cast(null as text) else cast((cast(cast(null as "geometry") as "geometry") || cast('e3cnh' as text)) as text) end as c6, subq_0.c1 as c7, ((subq_0.c8 is not NULL)) and (true) as c8, subq_0.c1 as c9 from (select ref_15.c_hdnu9xzja as c0, ref_15.c_b3l3rlrs as c1, ref_15.c_t9r1kl9y6m as c2, ref_15.c_hdnu9xzja as c3, ref_15.c_f53cae_5zm as c4, ref_15.c_f53cae_5zm as c5, ref_15.c_jqnxje42v as c6, ref_15.c_et6rlu as c7, ref_15.c_bxpz2g5 as c8, ref_15.c_g9f9abs2 as c9, cast((cast(cast(null as "box2d") as "box2d") > cast(cast(null as "box2d") as "box2d")) as bool) as c10 from t_y as ref_15 where (ref_15.c_ca like 'oz_') limit 120) as subq_0 where (pg_catalog.unique_rowid() in ( cast(subq_0.c2 as int8))) ) limit 90; ```

Expected behavior

Expect no crashes

Additional data / screenshots

Here is the crash stack trace:

Hook early check failed: Database execute error: XX000, ERROR: internal error: top-level relational expression cannot have outer columns: (311)
DETAIL: stack trace:
github.com/cockroachdb/cockroach/pkg/sql/opt/xform/optimizer.go:280: Optimize()
github.com/cockroachdb/cockroach/pkg/sql/plan_opt.go:786: makeQueryIndexRecommendation()
github.com/cockroachdb/cockroach/pkg/sql/plan_opt.go:612: buildExecMemo()
github.com/cockroachdb/cockroach/pkg/sql/plan_opt.go:250: makeOptimizerPlan()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:2363: makeExecPlan()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:1833: dispatchToExecutionEngine()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:1166: execStmtInOpenState()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:146: func1()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:3491: execWithProfiling()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:145: execStmt()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:2290: func1()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:2295: execCmd()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:2212: run()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:966: ServeConn()
github.com/cockroachdb/cockroach/pkg/sql/pgwire/conn.go:247: processCommands()
github.com/cockroachdb/cockroach/pkg/sql/pgwire/server.go:992: func3()
src/runtime/asm_amd64.s:1695: goexit()
;
Error while executing the query
extra info: cost: 553ms

Environment:

docker compose file:

version: '3.5'

services:
  crdb1:
    image: cockroachdb/cockroach:v24.1.0
    command: "start --advertise-addr=crdb1:26357 --http-addr=crdb1:8080 --listen-addr=crdb1:26357 --sql-addr=crdb1:26257 --insecure --join=crdb1:26357,crdb2:26357,crdb3:26357"
    volumes:
      - "crdb1_data:/cockroach/cockroach-data"
    networks:
      crnet:
        ipv4_address: 10.0.5.11

  crdb2:
    image: cockroachdb/cockroach:v24.1.0
    command: "start --advertise-addr=crdb2:26357 --http-addr=crdb2:8080 --listen-addr=crdb2:26357 --sql-addr=crdb2:26257 --insecure --join=crdb1:26357,crdb2:26357,crdb3:26357"
    volumes:
      - "crdb2_data:/cockroach/cockroach-data"
    networks:
      crnet:
        ipv4_address: 10.0.5.12

  crdb3:
    image: cockroachdb/cockroach:v24.1.0
    command: "start --advertise-addr=crdb3:26357 --http-addr=crdb3:8080 --listen-addr=crdb3:26357 --sql-addr=crdb3:26257 --insecure --join=crdb1:26357,crdb2:26357,crdb3:26357"
    volumes:
      - "crdb3_data:/cockroach/cockroach-data"
    networks:
      crnet:
        ipv4_address: 10.0.5.13

  crdb_single:
    image: cockroachdb/cockroach:v24.1.0

    command: start-single-node --insecure
    volumes:
      - "crdb_single_data:/cockroach/cockroach-data"
    networks:
      crnet:
        ipv4_address: 10.0.5.21

volumes:
  crdb1_data:
  crdb2_data:
  crdb3_data:
  crdb_single_data:

networks:
  crnet:
    driver: bridge
    ipam:
      config:
        - subnet: 10.0.5.0/24
          gateway: 10.0.5.254

about us

We are the BASS team from the School of Cyber Science and Technology at Beihang University. Our main focus is on system software security, operating systems, and program analysis research, as well as the development of automated program testing frameworks for detecting software defects. Using our self-developed database vulnerability testing tool, we have identified the above-mentioned possible vulnerabilities that may lead to database crashes.

Jira issue: CRDB-42034

blathers-crl[bot] commented 1 month ago

Hi @ycybfhb, please add branch-* labels to identify which branch(es) this C-bug affects.

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

blathers-crl[bot] commented 1 month ago

Hello, I am Blathers. I am here to help you get the issue triaged.

Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here.

I have CC'd a few people who may be able to assist you:

If we have not gotten back to your issue within a few business days, you can try the following:

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

michae2 commented 1 month ago

Initial reduction:

CREATE TABLE t_y (c_b3l3rlrs INT4, c_bxpz2g5 BOOL, c_jqnxje42v INT4);

CREATE TABLE t_bnqfnwa (c_ts7n INT4);

SELECT
  EXISTS(
    SELECT 1
    FROM
      t_bnqfnwa AS ref_16
    WHERE
      CASE WHEN ref_16.c_ts7n = (SELECT NULL WHERE subq_0.c6 IS DISTINCT FROM NULL UNION ALL SELECT ref_16.c_ts7n WHERE false) THEN ref_16.c_ts7n ELSE subq_0.c1 END BETWEEN subq_0.c1 AND subq_0.c1
  )
FROM
  (SELECT ref_15.c_b3l3rlrs AS c1, ref_15.c_jqnxje42v AS c6, ref_15.c_bxpz2g5 AS c8 FROM t_y AS ref_15) AS subq_0;

Looks like this is somewhat recent: it seems to have started with v23.2.

michae2 commented 1 month ago

Here's the stack on master:

ERROR: internal error: top-level relational expression cannot have outer columns: (14)
SQLSTATE: XX000
DETAIL: stack trace:
github.com/cockroachdb/cockroach/pkg/sql/opt/xform/optimizer.go:281: Optimize()
github.com/cockroachdb/cockroach/pkg/sql/plan_opt.go:883: buildExecMemo()
github.com/cockroachdb/cockroach/pkg/sql/plan_opt.go:251: makeOptimizerPlan()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:2298: makeExecPlan()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:1848: dispatchToExecutionEngine()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:1181: execStmtInOpenState()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:146: func1()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:3436: execWithProfiling()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:145: execStmt()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:2320: func1()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:2325: execCmd()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:2242: run()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:961: ServeConn()
github.com/cockroachdb/cockroach/pkg/sql/pgwire/conn.go:256: processCommands()
github.com/cockroachdb/cockroach/pkg/sql/pgwire/server.go:1136: func4()
src/runtime/asm_arm64.s:1222: goexit()
michae2 commented 1 month ago

P-2 because this is recent and we've also seen this error message in sentry reports and unoptimized-query-oracle.

mgartner commented 1 month ago

Reduced opt test repro:

exec-ddl
CREATE TABLE ab (
  a INT,
  b INT
)
----

exec-ddl
CREATE TABLE d (
  d INT
)
----

opt
SELECT EXISTS(
  SELECT 1
  FROM d
  WHERE (
    CASE
      WHEN d = (
          SELECT NULL WHERE b IS DISTINCT FROM NULL UNION ALL SELECT d WHERE false
      ) THEN d
      ELSE a
    END
  ) BETWEEN a AND a
)
FROM ab
----
mgartner commented 1 month ago

The problem is that the SELECT NULL WHERE b IS DISTINCT FROM NULL UNION ALL SELECT d WHERE false subquery is duplicated when we build the .. BETWEEN a AND a expression into two expressions, a less-than and a greater-than expression. Later on, the subquery is hoisted twice resulting in a join(s) with the same column IDs on both sides of the join.

mgartner commented 1 month ago

This is similar to other problems with have with duplicate subqueries causing overlapping columns, like #116022. I haven't come up with an easy, comprehensive fix yet.

One potential fix for this issue is to keep track of hoisted subqueries and prevent the optimizer from hoisting a subquery more than once.

mgartner commented 1 month ago

This reproduces back to v23.2. I'll investigate why it doesn't reproduce on v23.1.

mgartner commented 1 month ago

The query causes optimizer test-only assertions to fail on v23.1 (and probably older versions as well), but it successfully executes in a release build, though its not clear if the query results are correct.

I'll move to "Bugs to Fix" since this has been an issue for a while.