cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
30.12k stars 3.81k forks source link

sql: v23.1.2: nil pointer in opt.(*Metadata).TableAnnotation #104009

Open cockroach-teamcity opened 1 year ago

cockroach-teamcity commented 1 year ago

This issue was autofiled by Sentry. It represents a crash or reported error on a live cluster with telemetry enabled.

Sentry link: https://cockroach-labs.sentry.io/issues/4212208041/?referrer=webhooks_plugin

Panic message:

catch.go:29: runtime error: invalid memory address or nil pointer dereference (1) assertion failure Wraps: (2) attached stack trace -- stack trace: | github.com/cockroachdb/cockroach/pkg/util/errorutil.ShouldCatch | github.com/cockroachdb/cockroach/pkg/util/errorutil/catch.go:29 | github.com/cockroachdb/cockroach/pkg/sql/opt/xform.(Optimizer).Optimize.func1 | github.com/cockroachdb/cockroach/pkg/sql/opt/xform/optimizer.go:248 | runtime.gopanic | GOROOT/src/runtime/panic.go:884 | runtime.panicmem | GOROOT/src/runtime/panic.go:260 | runtime.sigpanic | GOROOT/src/runtime/signal_unix.go:835 | github.com/cockroachdb/cockroach/pkg/sql/opt.(Metadata).TableAnnotation | github.com/cockroachdb/cockroach/pkg/sql/opt/table_meta.go:552 | github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(statisticsBuilder).makeTableStatistics | github.com/cockroachdb/cockroach/pkg/sql/opt/memo/statistics_builder.go:589 | github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(statisticsBuilder).availabilityFromInput | github.com/cockroachdb/cockroach/pkg/sql/opt/memo/statistics_builder.go:283 | github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(statisticsBuilder).buildScan | github.com/cockroachdb/cockroach/pkg/sql/opt/memo/statistics_builder.go:793 | github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(logicalPropsBuilder).buildScanProps | github.com/cockroachdb/cockroach/pkg/sql/opt/memo/logical_props_builder.go:186 | github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(Memo).MemoizeScan | github.com/cockroachdb/cockroach/bazel-out/k8-opt/bin/pkg/sql/opt/memo/expr.og.go:19992 | github.com/cockroachdb/cockroach/pkg/sql/opt/xform.(CustomFuncs).generateLookupJoinsImpl | github.com/cockroachdb/cockroach/pkg/sql/opt/xform/join_funcs.go:400 | github.com/cockroachdb/cockroach/pkg/sql/opt/xform.(CustomFuncs).GenerateLookupJoins | github.com/cockroachdb/cockroach/pkg/sql/opt/xform/join_funcs.go:261 | github.com/cockroachdb/cockroach/pkg/sql/opt/xform.(explorer).exploreInnerJoin | github.com/cockroachdb/cockroach/bazel-out/k8-opt/bin/pkg/sql/opt/xform/explorer.og.go:830 | github.com/cockroachdb/cockroach/pkg/sql/opt/xform.(explorer).exploreGroupMember | github.com/cockroachdb/cockroach/bazel-out/k8-opt/bin/pkg/sql/opt/xform/explorer.og.go:28 | github.com/cockroachdb/cockroach/pkg/sql/opt/xform.(explorer).exploreGroup | github.com/cockroachdb/cockroach/pkg/sql/opt/xform/explorer.go:185 | github.com/cockroachdb/cockroach/pkg/sql/opt/xform.(Optimizer).optimizeGroup | github.com/cockroachdb/cockroach/pkg/sql/opt/xform/optimizer.go:536 | github.com/cockroachdb/cockroach/pkg/sql/opt/xform.(Optimizer).optimizeExpr | github.com/cockroachdb/cockroach/pkg/sql/opt/xform/optimizer.go:299 | github.com/cockroachdb/cockroach/pkg/sql/opt/xform.(Optimizer).optimizeGroupMember | github.com/cockroachdb/cockroach/pkg/sql/opt/xform/optimizer.go:578 | github.com/cockroachdb/cockroach/pkg/sql/opt/xform.(Optimizer).optimizeGroup | github.com/cockroachdb/cockroach/pkg/sql/opt/xform/optimizer.go:523 | github.com/cockroachdb/cockroach/pkg/sql/opt/xform.(Optimizer).optimizeExpr | github.com/cockroachdb/cockroach/pkg/sql/opt/xform/optimizer.go:299 | github.com/cockroachdb/cockroach/pkg/sql/opt/xform.(Optimizer).optimizeGroupMember | github.com/cockroachdb/cockroach/pkg/sql/opt/xform/optimizer.go:578 | github.com/cockroachdb/cockroach/pkg/sql/opt/xform.(Optimizer).optimizeGroup | github.com/cockroachdb/cockroach/pkg/sql/opt/xform/optimizer.go:523 | github.com/cockroachdb/cockroach/pkg/sql/opt/xform.(Optimizer).optimizeExpr | github.com/cockroachdb/cockroach/pkg/sql/opt/xform/optimizer.go:299 | github.com/cockroachdb/cockroach/pkg/sql/opt/xform.(Optimizer).optimizeGroupMember | github.com/cockroachdb/cockroach/pkg/sql/opt/xform/optimizer.go:578 | github.com/cockroachdb/cockroach/pkg/sql/opt/xform.(Optimizer).optimizeGroup | github.com/cockroachdb/cockroach/pkg/sql/opt/xform/optimizer.go:523 | github.com/cockroachdb/cockroach/pkg/sql/opt/xform.(Optimizer).optimizeExpr | github.com/cockroachdb/cockroach/pkg/sql/opt/xform/optimizer.go:299 | github.com/cockroachdb/cockroach/pkg/sql/opt/xform.(Optimizer).optimizeScalarExpr | github.com/cockroachdb/cockroach/pkg/sql/opt/xform/optimizer.go:621 | github.com/cockroachdb/cockroach/pkg/sql/opt/xform.(Optimizer).optimizeExpr | github.com/cockroachdb/cockroach/pkg/sql/opt/xform/optimizer.go:311 | github.com/cockroachdb/cockroach/pkg/sql/opt/xform.(Optimizer).optimizeScalarExpr | github.com/cockroachdb/cockroach/pkg/sql/opt/xform/optimizer.go:621 | github.com/cockroachdb/cockroach/pkg/sql/opt/xform.(Optimizer).optimizeExpr | github.com/cockroachdb/cockroach/pkg/sql/opt/xform/optimizer.go:311 | github.com/cockroachdb/cockroach/pkg/sql/opt/xform.(Optimizer).optimizeScalarExpr | github.com/cockroachdb/cockroach/pkg/sql/opt/xform/optimizer.go:621 Wraps: (3) runtime error: invalid memory address or nil pointer dereference | -- cause hidden behind barrier | runtime error: invalid memory address or nil pointer dereference | (1) runtime error: invalid memory address or nil pointer dereference | Error types: (1) runtime.errorString Error types: (1) assert.withAssertionFailure (2) withstack.withStack (3) barriers.barrierErr -- report composition: barriers.barrierErr: masked error: runtime error: invalid memory address or nil pointer dereference catch.go:29: withstack.withStack (top exception) assert.withAssertionFailure

Stacktrace (expand for inline code snippets): https://github.com/cockroachdb/cockroach/blob/810d4f27a7f02b9cc2750cab654ed1c62ac3e75a/pkg/util/errorutil/catch.go#L28-L30 in pkg/util/errorutil.ShouldCatch https://github.com/cockroachdb/cockroach/blob/810d4f27a7f02b9cc2750cab654ed1c62ac3e75a/pkg/sql/opt/xform/optimizer.go#L247-L249 in pkg/sql/opt/xform.(*Optimizer).Optimize.func1 GOROOT/src/runtime/panic.go#L883-L885 in runtime.gopanic GOROOT/src/runtime/panic.go#L259-L261 in runtime.panicmem GOROOT/src/runtime/signal_unix.go#L834-L836 in runtime.sigpanic https://github.com/cockroachdb/cockroach/blob/810d4f27a7f02b9cc2750cab654ed1c62ac3e75a/pkg/sql/opt/table_meta.go#L551-L553 in pkg/sql/opt.(*Metadata).TableAnnotation https://github.com/cockroachdb/cockroach/blob/810d4f27a7f02b9cc2750cab654ed1c62ac3e75a/pkg/sql/opt/memo/statistics_builder.go#L588-L590 in pkg/sql/opt/memo.(*statisticsBuilder).makeTableStatistics https://github.com/cockroachdb/cockroach/blob/810d4f27a7f02b9cc2750cab654ed1c62ac3e75a/pkg/sql/opt/memo/statistics_builder.go#L282-L284 in pkg/sql/opt/memo.(*statisticsBuilder).availabilityFromInput https://github.com/cockroachdb/cockroach/blob/810d4f27a7f02b9cc2750cab654ed1c62ac3e75a/pkg/sql/opt/memo/statistics_builder.go#L792-L794 in pkg/sql/opt/memo.(*statisticsBuilder).buildScan https://github.com/cockroachdb/cockroach/blob/810d4f27a7f02b9cc2750cab654ed1c62ac3e75a/pkg/sql/opt/memo/logical_props_builder.go#L185-L187 in pkg/sql/opt/memo.(*logicalPropsBuilder).buildScanProps https://github.com/cockroachdb/cockroach/blob/810d4f27a7f02b9cc2750cab654ed1c62ac3e75a/bazel-out/k8-opt/bin/pkg/sql/opt/memo/expr.og.go#L19991-L19993 in pkg/sql/opt/memo.(*Memo).MemoizeScan https://github.com/cockroachdb/cockroach/blob/810d4f27a7f02b9cc2750cab654ed1c62ac3e75a/pkg/sql/opt/xform/join_funcs.go#L399-L401 in pkg/sql/opt/xform.(*CustomFuncs).generateLookupJoinsImpl https://github.com/cockroachdb/cockroach/blob/810d4f27a7f02b9cc2750cab654ed1c62ac3e75a/pkg/sql/opt/xform/join_funcs.go#L260-L262 in pkg/sql/opt/xform.(*CustomFuncs).GenerateLookupJoins https://github.com/cockroachdb/cockroach/blob/810d4f27a7f02b9cc2750cab654ed1c62ac3e75a/bazel-out/k8-opt/bin/pkg/sql/opt/xform/explorer.og.go#L829-L831 in pkg/sql/opt/xform.(*explorer).exploreInnerJoin https://github.com/cockroachdb/cockroach/blob/810d4f27a7f02b9cc2750cab654ed1c62ac3e75a/bazel-out/k8-opt/bin/pkg/sql/opt/xform/explorer.og.go#L27-L29 in pkg/sql/opt/xform.(*explorer).exploreGroupMember https://github.com/cockroachdb/cockroach/blob/810d4f27a7f02b9cc2750cab654ed1c62ac3e75a/pkg/sql/opt/xform/explorer.go#L184-L186 in pkg/sql/opt/xform.(*explorer).exploreGroup https://github.com/cockroachdb/cockroach/blob/810d4f27a7f02b9cc2750cab654ed1c62ac3e75a/pkg/sql/opt/xform/optimizer.go#L535-L537 in pkg/sql/opt/xform.(*Optimizer).optimizeGroup https://github.com/cockroachdb/cockroach/blob/810d4f27a7f02b9cc2750cab654ed1c62ac3e75a/pkg/sql/opt/xform/optimizer.go#L298-L300 in pkg/sql/opt/xform.(*Optimizer).optimizeExpr https://github.com/cockroachdb/cockroach/blob/810d4f27a7f02b9cc2750cab654ed1c62ac3e75a/pkg/sql/opt/xform/optimizer.go#L577-L579 in pkg/sql/opt/xform.(*Optimizer).optimizeGroupMember https://github.com/cockroachdb/cockroach/blob/810d4f27a7f02b9cc2750cab654ed1c62ac3e75a/pkg/sql/opt/xform/optimizer.go#L522-L524 in pkg/sql/opt/xform.(*Optimizer).optimizeGroup https://github.com/cockroachdb/cockroach/blob/810d4f27a7f02b9cc2750cab654ed1c62ac3e75a/pkg/sql/opt/xform/optimizer.go#L298-L300 in pkg/sql/opt/xform.(*Optimizer).optimizeExpr https://github.com/cockroachdb/cockroach/blob/810d4f27a7f02b9cc2750cab654ed1c62ac3e75a/pkg/sql/opt/xform/optimizer.go#L577-L579 in pkg/sql/opt/xform.(*Optimizer).optimizeGroupMember https://github.com/cockroachdb/cockroach/blob/810d4f27a7f02b9cc2750cab654ed1c62ac3e75a/pkg/sql/opt/xform/optimizer.go#L522-L524 in pkg/sql/opt/xform.(*Optimizer).optimizeGroup https://github.com/cockroachdb/cockroach/blob/810d4f27a7f02b9cc2750cab654ed1c62ac3e75a/pkg/sql/opt/xform/optimizer.go#L298-L300 in pkg/sql/opt/xform.(*Optimizer).optimizeExpr https://github.com/cockroachdb/cockroach/blob/810d4f27a7f02b9cc2750cab654ed1c62ac3e75a/pkg/sql/opt/xform/optimizer.go#L577-L579 in pkg/sql/opt/xform.(*Optimizer).optimizeGroupMember https://github.com/cockroachdb/cockroach/blob/810d4f27a7f02b9cc2750cab654ed1c62ac3e75a/pkg/sql/opt/xform/optimizer.go#L522-L524 in pkg/sql/opt/xform.(*Optimizer).optimizeGroup https://github.com/cockroachdb/cockroach/blob/810d4f27a7f02b9cc2750cab654ed1c62ac3e75a/pkg/sql/opt/xform/optimizer.go#L298-L300 in pkg/sql/opt/xform.(*Optimizer).optimizeExpr https://github.com/cockroachdb/cockroach/blob/810d4f27a7f02b9cc2750cab654ed1c62ac3e75a/pkg/sql/opt/xform/optimizer.go#L620-L622 in pkg/sql/opt/xform.(*Optimizer).optimizeScalarExpr https://github.com/cockroachdb/cockroach/blob/810d4f27a7f02b9cc2750cab654ed1c62ac3e75a/pkg/sql/opt/xform/optimizer.go#L310-L312 in pkg/sql/opt/xform.(*Optimizer).optimizeExpr https://github.com/cockroachdb/cockroach/blob/810d4f27a7f02b9cc2750cab654ed1c62ac3e75a/pkg/sql/opt/xform/optimizer.go#L620-L622 in pkg/sql/opt/xform.(*Optimizer).optimizeScalarExpr https://github.com/cockroachdb/cockroach/blob/810d4f27a7f02b9cc2750cab654ed1c62ac3e75a/pkg/sql/opt/xform/optimizer.go#L310-L312 in pkg/sql/opt/xform.(*Optimizer).optimizeExpr https://github.com/cockroachdb/cockroach/blob/810d4f27a7f02b9cc2750cab654ed1c62ac3e75a/pkg/sql/opt/xform/optimizer.go#L620-L622 in pkg/sql/opt/xform.(*Optimizer).optimizeScalarExpr
pkg/util/errorutil/catch.go in pkg/util/errorutil.ShouldCatch at line 29
pkg/sql/opt/xform/optimizer.go in pkg/sql/opt/xform.(*Optimizer).Optimize.func1 at line 248
GOROOT/src/runtime/panic.go in runtime.gopanic at line 884
GOROOT/src/runtime/panic.go in runtime.panicmem at line 260
GOROOT/src/runtime/signal_unix.go in runtime.sigpanic at line 835
pkg/sql/opt/table_meta.go in pkg/sql/opt.(*Metadata).TableAnnotation at line 552
pkg/sql/opt/memo/statistics_builder.go in pkg/sql/opt/memo.(*statisticsBuilder).makeTableStatistics at line 589
pkg/sql/opt/memo/statistics_builder.go in pkg/sql/opt/memo.(*statisticsBuilder).availabilityFromInput at line 283
pkg/sql/opt/memo/statistics_builder.go in pkg/sql/opt/memo.(*statisticsBuilder).buildScan at line 793
pkg/sql/opt/memo/logical_props_builder.go in pkg/sql/opt/memo.(*logicalPropsBuilder).buildScanProps at line 186
bazel-out/k8-opt/bin/pkg/sql/opt/memo/expr.og.go in pkg/sql/opt/memo.(*Memo).MemoizeScan at line 19992
pkg/sql/opt/xform/join_funcs.go in pkg/sql/opt/xform.(*CustomFuncs).generateLookupJoinsImpl at line 400
pkg/sql/opt/xform/join_funcs.go in pkg/sql/opt/xform.(*CustomFuncs).GenerateLookupJoins at line 261
bazel-out/k8-opt/bin/pkg/sql/opt/xform/explorer.og.go in pkg/sql/opt/xform.(*explorer).exploreInnerJoin at line 830
bazel-out/k8-opt/bin/pkg/sql/opt/xform/explorer.og.go in pkg/sql/opt/xform.(*explorer).exploreGroupMember at line 28
pkg/sql/opt/xform/explorer.go in pkg/sql/opt/xform.(*explorer).exploreGroup at line 185
pkg/sql/opt/xform/optimizer.go in pkg/sql/opt/xform.(*Optimizer).optimizeGroup at line 536
pkg/sql/opt/xform/optimizer.go in pkg/sql/opt/xform.(*Optimizer).optimizeExpr at line 299
pkg/sql/opt/xform/optimizer.go in pkg/sql/opt/xform.(*Optimizer).optimizeGroupMember at line 578
pkg/sql/opt/xform/optimizer.go in pkg/sql/opt/xform.(*Optimizer).optimizeGroup at line 523
pkg/sql/opt/xform/optimizer.go in pkg/sql/opt/xform.(*Optimizer).optimizeExpr at line 299
pkg/sql/opt/xform/optimizer.go in pkg/sql/opt/xform.(*Optimizer).optimizeGroupMember at line 578
pkg/sql/opt/xform/optimizer.go in pkg/sql/opt/xform.(*Optimizer).optimizeGroup at line 523
pkg/sql/opt/xform/optimizer.go in pkg/sql/opt/xform.(*Optimizer).optimizeExpr at line 299
pkg/sql/opt/xform/optimizer.go in pkg/sql/opt/xform.(*Optimizer).optimizeGroupMember at line 578
pkg/sql/opt/xform/optimizer.go in pkg/sql/opt/xform.(*Optimizer).optimizeGroup at line 523
pkg/sql/opt/xform/optimizer.go in pkg/sql/opt/xform.(*Optimizer).optimizeExpr at line 299
pkg/sql/opt/xform/optimizer.go in pkg/sql/opt/xform.(*Optimizer).optimizeScalarExpr at line 621
pkg/sql/opt/xform/optimizer.go in pkg/sql/opt/xform.(*Optimizer).optimizeExpr at line 311
pkg/sql/opt/xform/optimizer.go in pkg/sql/opt/xform.(*Optimizer).optimizeScalarExpr at line 621
pkg/sql/opt/xform/optimizer.go in pkg/sql/opt/xform.(*Optimizer).optimizeExpr at line 311
pkg/sql/opt/xform/optimizer.go in pkg/sql/opt/xform.(*Optimizer).optimizeScalarExpr at line 621
Tag Value
Cockroach Release v23.1.2
Cockroach SHA: 810d4f27a7f02b9cc2750cab654ed1c62ac3e75a
Platform linux amd64
Distribution CCL
Environment v23.1.2
Command start-single-node
Go Version ``
# of CPUs
# of Goroutines

Jira issue: CRDB-28306

mgartner commented 1 year ago

It looks like sb.md is nil here: https://github.com/cockroachdb/cockroach/blob/1698b4960e4576de14d8fe0d5d32eacf934f5bb4/pkg/sql/opt/memo/statistics_builder.go#L589

This is possible if *Memo.SetRoot is called before this point. I haven't figured out a reproduction yet. However, notice these comments, which may be relevant:

https://github.com/cockroachdb/cockroach/blob/e9a1c0be388f4d9908e43ce91f7bbe4b701da3c7/pkg/sql/opt/memo/memo.go#L456-L459

https://github.com/cockroachdb/cockroach/blob/e9a1c0be388f4d9908e43ce91f7bbe4b701da3c7/pkg/sql/opt/memo/memo.go#L479-L481

msirek commented 1 year ago

I noticed a problem where ReleaseMemo() was causing Factory.mem to be nil'ed out, causing a similar problem, and I resolved it like this in my local branch: https://github.com/msirek/cockroach/blob/b3427f27bfc5f2994eb984a5c58e13619ff5101a/pkg/sql/opt/norm/factory.go#L180-L181

It may not be the same root cause, but maybe this is also a released memo case, where calling Memo.Init will make sure that the logPropsBuilder remains valid.

mgartner commented 1 year ago

Let's guard against the nil and log this case and/or report to Sentry so we can try to diagnose if it happens again.

liningpan commented 1 year ago

I encountered this error when running the diesel print-schema command from the rust diesel framework on a freshly created single node database. I wonder if information schema related queries could trigger this.

mgartner commented 1 year ago

@liningpan thanks for letting us know! We'll try to reproduce this. If there are any other steps you performed to encounter this bug, please let us know.

liningpan commented 1 year ago

The issue appears to be transient, and I can only reproduce the issue immediately after deleting a database and creating a new one with two table. The problem will go away after a few minutes.

The query caused the problem is SELECT obj_description($1::REGCLASS, $2), but I cannot manually trigger this with PREPARE/EXECUTE. Only diesel_cli can reliably trigger the error, which I believe uses libpq.

rharding6373 commented 1 year ago

I was unable to reproduce this with the following steps:

Install rust/diesel:

brew install rust
brew install diesel

Start up a single node cockroach cluster:

cockroach-v23.1.2.darwin-10.9-amd64/cockroach start-single-node --insecure
cockroach-v23.1.2.darwin-10.9-amd64/cockroach sql --insecure

Create, delete, and recreate a database with tables in the SQL shell:

create database db;
create table db.t (a int); 
create table db.s (a int); 
drop database db cascade;
create database db;
create table db.t (a int); 
create table db.s (a int); 

Immediately diesel print-schema:

diesel print-schema --database-url=[cockroach sql url to database db]
// @generated automatically by Diesel CLI.

diesel::table! {
    s (rowid) {
        a -> Nullable<Int8>,
        rowid -> Int8,
    }
}

diesel::table! {
    t (rowid) {
        a -> Nullable<Int8>,
        rowid -> Int8,
    }
}

diesel::allow_tables_to_appear_in_same_query!(
    s,
    t,
);

@liningpan Do these steps mimic what you were doing to recreate the issue? Do you have any other tips that could help us reproduce this? Thanks in advance.

msirek commented 1 year ago

I tried the same steps as @rharding6373 on a gceworker (except for different rust and diesel installation steps), and also could not reproduce the problem.

@liningpan if you could provide the DDL of the tables you are creating or any other other details, that might help us reproduce the issue.

rharding6373 commented 1 year ago

Let's implement the safeguard suggested in https://github.com/cockroachdb/cockroach/issues/104009#issuecomment-1579288607 and then close this issue, since we don't have a reproduction.

mgartner commented 1 year ago

I'm not sure how I came to the conclusion that sb.md is nil: https://github.com/cockroachdb/cockroach/issues/104009#issuecomment-1572578619

Maybe it's md.tables or md.tables[...].anns?

liningpan commented 1 year ago

Hi! Sorry for the late response. I've been trying to create a minimal example, and here it is https://github.com/liningpan/cockroach-null-ptr.

This example is adapted from one of the unit tests in diesel, and a notable difference is that it is executed in a test transaction.

https://github.com/diesel-rs/diesel/blob/99c01cf4c80b82fe83d7118519e60829af71d895/diesel_cli/src/infer_schema_internals/pg.rs#L296

With the rust binary program I linked above, I can reliably trigger the nil pointer dereference error both on my M2 Mac and an x86 Linux machine on a fresh single node cockroach-v23.1.5.

rm -rf cockroach-data
cockroach start-single-node --insecure
DATABASE_URL='postgresql://root@localhost:26257/defaultdb?sslmode=disable' cargo run

Let me know if this helps you guys reproduce the issues! To me, it seems like some sort of race condition or things not being initialized properly.

rharding6373 commented 1 year ago

Thanks for going through all the effort to repro this for us, @liningpan! I've managed to repro this reliably using these steps.

rafiss commented 1 year ago

I used Wireshark to capture the client-server traffic. Here is a repro; we can add this to a file in pkg/sql/pgwire/testdata/pgtest/ and then run with ./dev test pkg/sql/pgwire -f=TestPGTest/file_name.

send
Query {"String": "DROP TABLE IF EXISTS test_schema.table_1 CASCADE"}
Query {"String": "DROP SCHEMA IF EXISTS test_schema CASCADE"}
Query {"String": "CREATE SCHEMA test_schema"}
Query {"String": "CREATE TABLE test_schema.table_1 (id SERIAL PRIMARY KEY, text_col VARCHAR, not_null TEXT NOT NULL)"}
Query {"String": "COMMENT ON TABLE test_schema.table_1 IS 'table comment'"}
----

until ignore=NoticeResponse
ReadyForQuery
ReadyForQuery
ReadyForQuery
ReadyForQuery
ReadyForQuery
----
{"Type":"CommandComplete","CommandTag":"DROP TABLE"}
{"Type":"ReadyForQuery","TxStatus":"I"}
{"Type":"CommandComplete","CommandTag":"DROP SCHEMA"}
{"Type":"ReadyForQuery","TxStatus":"I"}
{"Type":"CommandComplete","CommandTag":"CREATE SCHEMA"}
{"Type":"ReadyForQuery","TxStatus":"I"}
{"Type":"CommandComplete","CommandTag":"CREATE TABLE"}
{"Type":"ReadyForQuery","TxStatus":"I"}
{"Type":"CommandComplete","CommandTag":"COMMENT ON TABLE"}
{"Type":"ReadyForQuery","TxStatus":"I"}

send
Parse {"Query": "SELECT obj_description($1::regclass, $2)", "ParameterOIDs": [25, 25]}
Bind {"ParameterFormatCodes": [1,1], "Parameters": [{"binary":"746573745f736368656d612e7461626c655f31"}, {"binary":"70675f636c617373"}]}
Execute
Sync
----

until
ReadyForQuery
----
{"Type":"ParseComplete"}
{"Type":"BindComplete"}
{"Type":"DataRow","Values":[{"text":"table comment"}]}
{"Type":"CommandComplete","CommandTag":"SELECT 1"}
{"Type":"ReadyForQuery","TxStatus":"I"}

Looks like it only happens when using a prepared statement with binary-encoded parameters, and it might be specific to obj_description.

mgartner commented 1 year ago

Thanks @rafiss! I should be able to diagnose the problem from here.

mgartner commented 1 year ago

Here's a miminized reproduction that works without any specific pgwire encoding:

CREATE TABLE d (d1 STRING);
CREATE TABLE c (c1 STRING, INDEX (c1));
CREATE FUNCTION f(x STRING) RETURNS STRING STABLE LANGUAGE SQL AS $$
  SELECT 1 FROM d JOIN c ON d.d1 = c.c1 WHERE d.d1 = x
$$;

PREPARE p AS SELECT f($1::INT::STRING);

EXECUTE p(1);

The events/state that lead to the nil-pointer exception are:

  1. During PREPARE the memo is "detached" and some pointers are set to nil to allow the GC to clean up some objects. In this specific reproduction, resetting the memo's logicalPropsBuilder is critical because this contains a pointer back to the memo. This pointer is set to nil during detachment.
  2. The built expression groups (e.g., scanGroup) have a pointer to the memo that has been detached.
  3. During PREPARE the UDF is not inlined because it can only be inlined when it has constant, variable, or placeholder expressions. The double cast $1::INT::STRING in the reproduction above prevents inlining in this case.
  4. During EXECUTE, the UDF is inlined when the casts are folded to a constant value. The inlined subquery contains expressions copied from the cached, prepared memo. Some expressions are not rebuilt because they do not contain placeholder expressions. They retain references to the detached memo.
  5. While the expression is still being optimized during EXECUTE, the original cached memo is accessed via the Memo() method of some expressions that were not rebuilt. Fields of the cached memo, which were set to nil during detachment, are accessed and cause nil-pointer exceptions. In this example, the NPE occurs when accessing the scan's memo here: https://github.com/cockroachdb/cockroach/blob/c7ecbe29fe222b1e1b8b5106238e160e06b3ce9b/pkg/sql/opt/xform/join_funcs.go#L402 Specifically, the logicalPropsBuilder's statisticBuilder is accessed which has a nil *opt.Metadata.

This does not reproduce for UDFs that are never inlined because each statement's expression is copied and re-optimized during evaluation of the UDF. This rebuilds the expressions with references to a non-detached memo.

Statements without UDFs are not affected by this, because when they are rebuilt during EXECUTE in AssignPlaceholders, we traverse the entire expression tree via CopyAndReplaceDefault which rebuilds each expression with references to a non-detached memo. UDFs are the only expressions vulnerable to this problem because they contain relational expressions within their "private" field, instead of as a child of the expression, so those expressions are not traversed during CopyAndReplaceDefault.

I see a few potential action items:

  1. Remove unnecessary use of RelExpr.Memo() where it is easy to do so. This is easy in some places (like the line linked above) because a reference to the correct memo is readily available. This will reduce the chance of triggering these NPEs, but it will not solve the problem entirely as long as some usage of RelExpr.Memo() remains.
  2. Remove the RelExpr.Memo() method and all references to a memo within expressions and expression groups. I spent some time reworking code to see if this was possible, and I believe it is, with some elbow grease. A few things become a bit less natural (e.g., we'll have to remove the String() method of opt.Expr), but the majority of changes are plumbing a *memo.Memo or *opt.Metadata around.
  3. Include a UDF body's relational expression as children in the UDFCallExpr so that they can be rebuilt during AssignPlaceholders with the correct references to the non-detached memo. @DrewKimball Are there any reasons you can think of why this would be a bad idea—I remember trying this and running into problems, though I don't remember specifics—maybe you tried it as well?
  4. Find some other way to correct the memo reference for expressions within a UDF during AssignPlaceholders. We could add a special optgen case, for example, though I'm not keen on breaking the mold for a single type of expression unless there is a strong reason to.
DrewKimball commented 1 year ago

Include a UDF body's relational expression as children in the UDFCallExpr so that they can be rebuilt during AssignPlaceholders with the correct references to the non-detached memo. @DrewKimball Are there any reasons you can think of why this would be a bad idea—I remember trying this and running into problems, though I don't remember specifics—maybe you tried it as well?

Moving the body out of the UDF private would require some optgen changes, since it doesn't have any conception of a slice of RelExpr. I did play around with it before, and it wasn't trivial, though I don't remember details. We also need the ability to have a UDFCallExpr without its body being built yet for recursive routines.

I think we should probably be copying the UDF body entirely when it gets inlined. Maybe we could keep some reference counter to avoid the copy in the happy case, but copying should probably be the default. I wonder if there are other potential bugs here beyond the issue with Memo().

mgartner commented 1 year ago

We also need the ability to have a UDFCallExpr without its body being built yet for recursive routines.

Sounds like it'll be tricky to make sure we only copy the body expressions once even when there are multiple UDFCallExprs in the tree.

I wonder if there are other potential bugs here beyond the issue with Memo().

Good point. I wouldn't be surprised to find other dragons lurking here.

I'll see if I can come up with a reasonable way to copy the body expressions.

yuzefovich commented 6 months ago

A simple reproduction from the report in #124004:

CREATE TABLE foo();
PREPARE p AS SELECT obj_description($1::REGCLASS::OID, 'pg_class');
EXECUTE p('foo');