cockroachdb / cockroach

roachtest: multitenant/distsql/instances=20/bundle=on/timeout=1 failed #126313

Closed by cockroach-teamcity 2 months ago

cockroach-teamcity commented 3 months ago

roachtest.multitenant/distsql/instances=20/bundle=on/timeout=1 failed with artifacts on master @ d2a5af43be05ade40531e6e0ef23c5880faa4494:

(assertions.go:363).Fail: 
    Error Trace:    github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/multitenant_distsql.go:193
                                github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/multitenant_distsql.go:47
                                main/pkg/cmd/roachtest/test_runner.go:1248
                                src/runtime/asm_amd64.s:1695
    Error:          Error message not equal:
                    expected: "pq: query execution canceled due to statement timeout"
                    actual  : "dial tcp 35.231.108.147:29002: connect: connection refused"
    Test:           multitenant/distsql/instances=20/bundle=on/timeout=1
(require.go:177).EqualError: FailNow called
test artifacts and logs in: /artifacts/multitenant/distsql/instances=20/bundle=on/timeout=1/run_1

Parameters:

See: roachtest README

See: How To Investigate (internal)

See: Grafana

This test on roachdash | Improve this report!

Jira issue: CRDB-39846

cockroach-teamcity commented 3 months ago

roachtest.multitenant/distsql/instances=20/bundle=on/timeout=1 failed with artifacts on master @ 047a7ed79756eef53b8b9ab4c9dd9c5a463496c9:

(assertions.go:363).Fail: 
    Error Trace:    github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/multitenant_distsql.go:193
                                github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/multitenant_distsql.go:47
                                main/pkg/cmd/roachtest/test_runner.go:1247
                                src/runtime/asm_amd64.s:1695
    Error:          Error message not equal:
                    expected: "pq: query execution canceled due to statement timeout"
                    actual  : "dial tcp 35.185.15.229:29002: connect: connection refused"
    Test:           multitenant/distsql/instances=20/bundle=on/timeout=1
(require.go:177).EqualError: FailNow called
test artifacts and logs in: /artifacts/multitenant/distsql/instances=20/bundle=on/timeout=1/run_1

cockroach-teamcity commented 3 months ago

roachtest.multitenant/distsql/instances=20/bundle=on/timeout=1 failed with artifacts on master @ d13253527955eaa2da09394b8a2729627ab25c48:

(assertions.go:363).Fail: 
    Error Trace:    github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/multitenant_distsql.go:193
                                github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/multitenant_distsql.go:47
                                main/pkg/cmd/roachtest/test_runner.go:1247
                                src/runtime/asm_amd64.s:1695
    Error:          Error message not equal:
                    expected: "pq: query execution canceled due to statement timeout"
                    actual  : "dial tcp 35.237.37.137:29002: connect: connection refused"
    Test:           multitenant/distsql/instances=20/bundle=on/timeout=1
(require.go:177).EqualError: FailNow called
test artifacts and logs in: /artifacts/multitenant/distsql/instances=20/bundle=on/timeout=1/run_1

cockroach-teamcity commented 3 months ago

roachtest.multitenant/distsql/instances=20/bundle=on/timeout=1 failed with artifacts on master @ d13253527955eaa2da09394b8a2729627ab25c48:

(assertions.go:363).Fail: 
    Error Trace:    github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/multitenant_distsql.go:193
                                github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/multitenant_distsql.go:47
                                main/pkg/cmd/roachtest/test_runner.go:1247
                                src/runtime/asm_amd64.s:1695
    Error:          Error message not equal:
                    expected: "pq: query execution canceled due to statement timeout"
                    actual  : "dial tcp 34.138.149.104:29002: connect: connection refused"
    Test:           multitenant/distsql/instances=20/bundle=on/timeout=1
(require.go:177).EqualError: FailNow called
test artifacts and logs in: /artifacts/multitenant/distsql/instances=20/bundle=on/timeout=1/run_1

cockroach-teamcity commented 3 months ago

roachtest.multitenant/distsql/instances=20/bundle=on/timeout=1 failed with artifacts on master @ 9e1cd533828b7887a48db8635f705669287cf2d6:

(assertions.go:363).Fail: 
    Error Trace:    github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/multitenant_distsql.go:193
                                github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/multitenant_distsql.go:47
                                main/pkg/cmd/roachtest/test_runner.go:1248
                                src/runtime/asm_amd64.s:1695
    Error:          Error message not equal:
                    expected: "pq: query execution canceled due to statement timeout"
                    actual  : "dial tcp 34.74.235.10:29000: connect: connection refused"
    Test:           multitenant/distsql/instances=20/bundle=on/timeout=1
(require.go:177).EqualError: FailNow called
test artifacts and logs in: /artifacts/multitenant/distsql/instances=20/bundle=on/timeout=1/run_1

cockroach-teamcity commented 3 months ago

roachtest.multitenant/distsql/instances=20/bundle=on/timeout=1 failed with artifacts on master @ 8f72a01e81d4751c6615a9a864b8e32076565b09:

(assertions.go:363).Fail: 
    Error Trace:    github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/multitenant_distsql.go:193
                                github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/multitenant_distsql.go:47
                                main/pkg/cmd/roachtest/test_runner.go:1248
                                src/runtime/asm_amd64.s:1695
    Error:          Error message not equal:
                    expected: "pq: query execution canceled due to statement timeout"
                    actual  : "dial tcp 34.44.51.248:29000: connect: connection refused"
    Test:           multitenant/distsql/instances=20/bundle=on/timeout=1
(require.go:177).EqualError: FailNow called
test artifacts and logs in: /artifacts/multitenant/distsql/instances=20/bundle=on/timeout=1/cpu_arch=arm64/run_1

cockroach-teamcity commented 3 months ago

roachtest.multitenant/distsql/instances=20/bundle=on/timeout=1 failed with artifacts on master @ 295d09a88895a69e5cc9149fb8165acf78e39e61:

(assertions.go:363).Fail: 
    Error Trace:    github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/multitenant_distsql.go:193
                                github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/multitenant_distsql.go:47
                                main/pkg/cmd/roachtest/test_runner.go:1248
                                src/runtime/asm_amd64.s:1695
    Error:          Error message not equal:
                    expected: "pq: query execution canceled due to statement timeout"
                    actual  : "dial tcp 34.23.181.197:29000: connect: connection refused"
    Test:           multitenant/distsql/instances=20/bundle=on/timeout=1
(require.go:177).EqualError: FailNow called
test artifacts and logs in: /artifacts/multitenant/distsql/instances=20/bundle=on/timeout=1/run_1

cockroach-teamcity commented 2 months ago

roachtest.multitenant/distsql/instances=20/bundle=on/timeout=1 failed with artifacts on master @ c557fb59f6aec659d364e9002fc083c59c6392b6:

(assertions.go:363).Fail: 
    Error Trace:    github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/multitenant_distsql.go:193
                                github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/multitenant_distsql.go:47
                                main/pkg/cmd/roachtest/test_runner.go:1248
                                src/runtime/asm_amd64.s:1695
    Error:          Error message not equal:
                    expected: "pq: query execution canceled due to statement timeout"
                    actual  : "dial tcp 34.23.127.167:29000: connect: connection refused"
    Test:           multitenant/distsql/instances=20/bundle=on/timeout=1
(require.go:177).EqualError: FailNow called
test artifacts and logs in: /artifacts/multitenant/distsql/instances=20/bundle=on/timeout=1/run_1

cockroach-teamcity commented 2 months ago

roachtest.multitenant/distsql/instances=20/bundle=on/timeout=1 failed with artifacts on master @ 485975b3a824c68c07340e6a336c7864c00d3c6d:

(assertions.go:363).Fail: 
    Error Trace:    github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/multitenant_distsql.go:193
                                github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/multitenant_distsql.go:47
                                main/pkg/cmd/roachtest/test_runner.go:1248
                                src/runtime/asm_amd64.s:1695
    Error:          Error message not equal:
                    expected: "pq: query execution canceled due to statement timeout"
                    actual  : "dial tcp 35.185.75.152:29000: connect: connection refused"
    Test:           multitenant/distsql/instances=20/bundle=on/timeout=1
(require.go:177).EqualError: FailNow called
test artifacts and logs in: /artifacts/multitenant/distsql/instances=20/bundle=on/timeout=1/run_1

cockroach-teamcity commented 2 months ago

roachtest.multitenant/distsql/instances=20/bundle=on/timeout=1 failed with artifacts on master @ 485975b3a824c68c07340e6a336c7864c00d3c6d:

(assertions.go:363).Fail: 
    Error Trace:    github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/multitenant_distsql.go:193
                                github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/multitenant_distsql.go:47
                                main/pkg/cmd/roachtest/test_runner.go:1248
                                src/runtime/asm_amd64.s:1695
    Error:          Error message not equal:
                    expected: "pq: query execution canceled due to statement timeout"
                    actual  : "dial tcp 34.139.201.152:29000: connect: connection refused"
    Test:           multitenant/distsql/instances=20/bundle=on/timeout=1
(require.go:177).EqualError: FailNow called
test artifacts and logs in: /artifacts/multitenant/distsql/instances=20/bundle=on/timeout=1/run_1

cockroach-teamcity commented 2 months ago

roachtest.multitenant/distsql/instances=20/bundle=on/timeout=1 failed with artifacts on master @ 6b22014dec8c5e8e6006d710ed05bab5b01b4ae2:

(assertions.go:363).Fail: 
    Error Trace:    github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/multitenant_distsql.go:193
                                github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/multitenant_distsql.go:47
                                main/pkg/cmd/roachtest/test_runner.go:1248
                                src/runtime/asm_amd64.s:1695
    Error:          Error message not equal:
                    expected: "pq: query execution canceled due to statement timeout"
                    actual  : "dial tcp 35.237.91.104:29000: connect: connection refused"
    Test:           multitenant/distsql/instances=20/bundle=on/timeout=1
(require.go:177).EqualError: FailNow called
test artifacts and logs in: /artifacts/multitenant/distsql/instances=20/bundle=on/timeout=1/run_1

cockroach-teamcity commented 2 months ago

roachtest.multitenant/distsql/instances=20/bundle=on/timeout=1 failed with artifacts on master @ 0b918d1dc3a9ce1f04975202be3b04ec375a816e:

(assertions.go:363).Fail: 
    Error Trace:    github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/multitenant_distsql.go:193
                                github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/multitenant_distsql.go:47
                                main/pkg/cmd/roachtest/test_runner.go:1248
                                src/runtime/asm_amd64.s:1695
    Error:          Error message not equal:
                    expected: "pq: query execution canceled due to statement timeout"
                    actual  : "dial tcp 34.23.102.168:29000: connect: connection refused"
    Test:           multitenant/distsql/instances=20/bundle=on/timeout=1
(require.go:177).EqualError: FailNow called
test artifacts and logs in: /artifacts/multitenant/distsql/instances=20/bundle=on/timeout=1/run_1

cockroach-teamcity commented 2 months ago

roachtest.multitenant/distsql/instances=20/bundle=on/timeout=1 failed with artifacts on master @ b78574b4b75b6f9ab8841d96f350697030f5e4df:

(assertions.go:363).Fail: 
    Error Trace:    github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/multitenant_distsql.go:193
                                github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/multitenant_distsql.go:47
                                main/pkg/cmd/roachtest/test_runner.go:1248
                                src/runtime/asm_amd64.s:1695
    Error:          Error message not equal:
                    expected: "pq: query execution canceled due to statement timeout"
                    actual  : "dial tcp 34.74.151.98:29000: connect: connection refused"
    Test:           multitenant/distsql/instances=20/bundle=on/timeout=1
(require.go:177).EqualError: FailNow called
test artifacts and logs in: /artifacts/multitenant/distsql/instances=20/bundle=on/timeout=1/run_1

cockroach-teamcity commented 2 months ago

roachtest.multitenant/distsql/instances=20/bundle=on/timeout=1 failed with artifacts on master @ 3afb6935d0ef3de7c0d44cfb3cd54f312752c186:

(assertions.go:363).Fail: 
    Error Trace:    github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/multitenant_distsql.go:193
                                github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/multitenant_distsql.go:47
                                main/pkg/cmd/roachtest/test_runner.go:1248
                                src/runtime/asm_amd64.s:1695
    Error:          Error message not equal:
                    expected: "pq: query execution canceled due to statement timeout"
                    actual  : "dial tcp 104.196.149.188:29000: connect: connection refused"
    Test:           multitenant/distsql/instances=20/bundle=on/timeout=1
(require.go:177).EqualError: FailNow called
test artifacts and logs in: /artifacts/multitenant/distsql/instances=20/bundle=on/timeout=1/run_1

Same failure on other branches

- #126979 roachtest: multitenant/distsql/instances=20/bundle=on/timeout=1 failed [C-test-failure O-roachtest O-robot T-sql-queries branch-release-24.2 release-blocker]

cockroach-teamcity commented 2 months ago

roachtest.multitenant/distsql/instances=20/bundle=on/timeout=1 failed with artifacts on master @ dc3ebba01015d2c7275aa27af2a54fe88b87e901:

(assertions.go:363).Fail: 
    Error Trace:    github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/multitenant_distsql.go:193
                                github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/multitenant_distsql.go:47
                                main/pkg/cmd/roachtest/test_runner.go:1248
                                src/runtime/asm_amd64.s:1695
    Error:          Error message not equal:
                    expected: "pq: query execution canceled due to statement timeout"
                    actual  : "dial tcp 35.196.68.247:29000: connect: connection refused"
    Test:           multitenant/distsql/instances=20/bundle=on/timeout=1
(require.go:177).EqualError: FailNow called
test artifacts and logs in: /artifacts/multitenant/distsql/instances=20/bundle=on/timeout=1/run_1

cockroach-teamcity commented 2 months ago

roachtest.multitenant/distsql/instances=20/bundle=on/timeout=1 failed with artifacts on master @ 67aa2613446c3c6b53bad21f01286c98290cc0a3:

(test_runner.go:1301).runTest: test timed out (3h0m0s)
test artifacts and logs in: /artifacts/multitenant/distsql/instances=20/bundle=on/timeout=1/run_1

andreimatei commented 2 months ago

I looked briefly at this because it happened to be the first timed-out test that Side-Eye automatically captured a snapshot for.

The test seems to poll for the physical planner to get to a point where it uses all the nodes for some queries, and this condition seems to never be satisfied. The test keeps printing:

2024/07/15 09:08:32 multitenant_distsql.go:161: Only 7 nodes present: (1,4,11,15,16,18,20)

First of all, I think the test was meant to fail more gracefully in such cases, but the code responsible looks broken (or, at least, it doesn't make sense to me). The polling loop seems to want to implement its own 180s timeout, but notice that attempts is only decremented if err != nil. Shouldn't it always be decremented? There's a similar dubious pattern further down, where a different attempts counter is also only decremented on an error.
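To make the concern about the attempts counter concrete, here is a minimal Go sketch of the pattern. It is not the code from multitenant_distsql.go: the names pollBuggy, pollFixed, and neverReady are hypothetical, the budget is counted in iterations rather than seconds, and maxIters is a safety cap added only so the sketch terminates.

```go
package main

import (
	"errors"
	"fmt"
)

// pollBuggy mirrors the pattern described above: the attempt budget only
// shrinks when check returns an error. A check that keeps succeeding without
// ever satisfying the condition is therefore never bounded by attempts.
// maxIters is a safety cap added for this sketch (hypothetical).
func pollBuggy(check func() (bool, error), attempts, maxIters int) (int, bool) {
	iters := 0
	for iters < maxIters {
		iters++
		ok, err := check()
		if err != nil {
			attempts-- // only charged on error
			if attempts <= 0 {
				return iters, false
			}
			continue
		}
		if ok {
			return iters, true
		}
		// Condition not met and no error: the budget is untouched.
	}
	return iters, false
}

// pollFixed charges every iteration against the budget, so the loop gives up
// after at most `attempts` checks whether or not they returned errors.
func pollFixed(check func() (bool, error), attempts int) (int, bool, error) {
	iters := 0
	var lastErr error
	for ; attempts > 0; attempts-- {
		iters++
		ok, err := check()
		if err != nil {
			lastErr = err
			continue
		}
		if ok {
			return iters, true, nil
		}
	}
	if lastErr == nil {
		lastErr = errors.New("condition not met before attempts ran out")
	}
	return iters, false, lastErr
}

func main() {
	// A check that never errors and never sees the condition hold, similar in
	// spirit to the repeated "Only 7 nodes present" message.
	neverReady := func() (bool, error) { return false, nil }

	iters, done := pollBuggy(neverReady, 3, 1000)
	fmt.Println("buggy:", iters, done)

	iters, done, err := pollFixed(neverReady, 3)
	fmt.Println("fixed:", iters, done, err)
}
```

With a check that never errors, the first variant runs until the safety cap while the second stops after three checks and reports an error, which is the "fail in a better way" behavior the test presumably wanted.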

As for why the test really fails, I've got no idea. But Side-Eye shows that all the INSERT INTO t SELECT $1,generate_series(1,100)+$2*100,repeat('asdfasdf',1024) queries running at the time of the timeout were blocked on tenantcostclient.(*limiter).Wait(). Could admission control be blocking the writing of the data that the test depends on?

cc @mgartner @yuzefovich

cockroach-teamcity commented 2 months ago

roachtest.multitenant/distsql/instances=20/bundle=on/timeout=1 failed with artifacts on master @ 73ecc8b494c4e7f757bb35d68b3587a51af93b02:

(assertions.go:363).Fail: 
    Error Trace:    github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/multitenant_distsql.go:193
                                github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/multitenant_distsql.go:47
                                main/pkg/cmd/roachtest/test_runner.go:1272
                                src/runtime/asm_amd64.s:1695
    Error:          Error message not equal:
                    expected: "pq: query execution canceled due to statement timeout"
                    actual  : "dial tcp 104.155.166.235:29000: connect: connection refused"
    Test:           multitenant/distsql/instances=20/bundle=on/timeout=1
(require.go:177).EqualError: FailNow called
test artifacts and logs in: /artifacts/multitenant/distsql/instances=20/bundle=on/timeout=1/cpu_arch=arm64/run_1

mgartner commented 2 months ago

Let's assume #127219 is the fix and close this issue.

yuzefovich commented 2 months ago

We do have #121260 to track improving the test overall. Thanks @andreimatei for your insight.