cockroachdb / cockroach

CockroachDB - the open source, cloud-native distributed SQL database.
https://www.cockroachlabs.com
Other
29.56k stars 3.7k forks source link

pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test: TestLogic_mixed_version_sequence_per_node_cache failed [node unavailable in KV NodeLiveness even though it started up] #125820

Open github-actions[bot] opened 4 weeks ago

github-actions[bot] commented 4 weeks ago

pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test.TestLogic_mixed_version_sequence_per_node_cache failed on master @ 5e72cfccd672c1a66e56df12fe1191219c2df7d1:

=== RUN   TestLogic_mixed_version_sequence_per_node_cache
    test_log_scope.go:170: test logs captured to: outputs.zip/logTestLogic_mixed_version_sequence_per_node_cache3155382550
    test_log_scope.go:81: use -show-logs to present logs inline
[01:35:49] --- progress: /var/lib/engflow/worker/work/3/exec/bazel-out/k8-fastbuild/bin/pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test_/cockroach-go-testserver-23_2_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/mixed_version_sequence_per_node_cache: 6 statements
[01:36:36] --- done: /var/lib/engflow/worker/work/3/exec/bazel-out/k8-fastbuild/bin/pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test_/cockroach-go-testserver-23_2_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/mixed_version_sequence_per_node_cache with config cockroach-go-testserver-23.2: 9 tests, 0 failures
    logic.go:4135: 
        /var/lib/engflow/worker/work/3/exec/bazel-out/k8-fastbuild/bin/pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test_/cockroach-go-testserver-23_2_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/mixed_version_sequence_per_node_cache:50: error while processing
    logic.go:4135: 
        /var/lib/engflow/worker/work/3/exec/bazel-out/k8-fastbuild/bin/pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test_/cockroach-go-testserver-23_2_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/mixed_version_sequence_per_node_cache:50: 
        expected success, but found
        (XXUUU) nodes n{3} required, but unavailable
        cluster.go:94: in UntilClusterStable()
cockroach logs captured in: /var/lib/engflow/worker/work/3/exec/bazel-out/k8-fastbuild/testlogs/pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test/shard_5_of_7/test.outputs/cockroach-logs913842376
    panic.go:626: -- test log scope end --
test logs left over in: outputs.zip/logTestLogic_mixed_version_sequence_per_node_cache3155382550
--- FAIL: TestLogic_mixed_version_sequence_per_node_cache (67.20s)

Parameters:

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/sql-foundations

This test on roachdash | Improve this report!

Jira issue: CRDB-39620

rafiss commented 4 weeks ago

The issue from https://github.com/cockroachdb/cockroach/issues/120521 is still occurring. https://github.com/cockroachdb/cockroach/pull/124288 was an attempted fix.

In the logs from n1, we have:

1762:I240618 01:36:36.576016 1240 upgrade/upgradecluster/cluster.go:89 ⋮ [T1,Vsystem,n1,peer=‹127.0.0.1:38462›,client=127.0.0.1:38462,hostnossl,user=root,migration-mgr] 650  [], retrying, unavailable n{3}
1764:I240618 01:36:36.576705 1240 upgrade/upgradecluster/cluster.go:89 ⋮ [T1,Vsystem,n1,peer=‹127.0.0.1:38462›,client=127.0.0.1:38462,hostnossl,user=root,migration-mgr] 652  [], retrying, unavailable n{3}
1766:I240618 01:36:36.580026 1240 upgrade/upgradecluster/cluster.go:89 ⋮ [T1,Vsystem,n1,peer=‹127.0.0.1:38462›,client=127.0.0.1:38462,hostnossl,user=root,migration-mgr] 654  [], retrying, unavailable n{3}
1768:I240618 01:36:36.582334 1240 upgrade/upgradecluster/cluster.go:89 ⋮ [T1,Vsystem,n1,peer=‹127.0.0.1:38462›,client=127.0.0.1:38462,hostnossl,user=root,migration-mgr] 656  [], retrying, unavailable n{3}
1770:I240618 01:36:36.583279 1240 upgrade/upgradecluster/cluster.go:89 ⋮ [T1,Vsystem,n1,peer=‹127.0.0.1:38462›,client=127.0.0.1:38462,hostnossl,user=root,migration-mgr] 658  [], retrying, unavailable n{3}
1772:I240618 01:36:36.583920 1240 upgrade/upgradecluster/cluster.go:89 ⋮ [T1,Vsystem,n1,peer=‹127.0.0.1:38462›,client=127.0.0.1:38462,hostnossl,user=root,migration-mgr] 660  [], retrying, unavailable n{3}
1774:I240618 01:36:36.584520 1240 upgrade/upgradecluster/cluster.go:89 ⋮ [T1,Vsystem,n1,peer=‹127.0.0.1:38462›,client=127.0.0.1:38462,hostnossl,user=root,migration-mgr] 662  [], retrying, unavailable n{3}
1776:I240618 01:36:36.585441 1240 upgrade/upgradecluster/cluster.go:89 ⋮ [T1,Vsystem,n1,peer=‹127.0.0.1:38462›,client=127.0.0.1:38462,hostnossl,user=root,migration-mgr] 664  [], retrying, unavailable n{3}
1778:I240618 01:36:36.586555 1240 upgrade/upgradecluster/cluster.go:89 ⋮ [T1,Vsystem,n1,peer=‹127.0.0.1:38462›,client=127.0.0.1:38462,hostnossl,user=root,migration-mgr] 666  [], retrying, unavailable n{3}
1780:I240618 01:36:36.587239 1240 upgrade/upgradecluster/cluster.go:89 ⋮ [T1,Vsystem,n1,peer=‹127.0.0.1:38462›,client=127.0.0.1:38462,hostnossl,user=root,migration-mgr] 668  [], retrying, unavailable n{3}
1782:I240618 01:36:36.587954 1240 upgrade/upgradecluster/cluster.go:89 ⋮ [T1,Vsystem,n1,peer=‹127.0.0.1:38462›,client=127.0.0.1:38462,hostnossl,user=root,migration-mgr] 670  [], retrying, unavailable n{3}
1783:W240618 01:36:36.588079 1240 upgrade/upgrademanager/manager.go:308 ⋮ [T1,Vsystem,n1,peer=‹127.0.0.1:38462›,client=127.0.0.1:38462,hostnossl,user=root,migration-mgr] 671  error encountered during version upgrade: nodes n{3} required, but unavailable

In the n3 logs, the shutdown occurred at 01:36:28.

I240618 01:36:28.136423 1 1@cli/start.go:1068 ⋮ [T1,Vsystem,n3] 709  server drained and shutdown completed

And the restart completed around 01:36:36, which is when we see those logs from n1 retrying.

I240618 01:36:36.566993 35 1@cli/start.go:1280 ⋮ [T1,Vsystem,n3] 98 +CockroachDB node starting at 2024-06-18 01:36:36.566901224 +0000 UTC m=+8.413590569 (took 8.3s)

So it seems like there is still a race here. @kvoli do you think we need a sleep (with or without exponential backoff) in that retry loop?

rafiss commented 3 weeks ago

@kvoli this is happening once or twice a day - any thoughts on the above?

So it seems like there is still a race here. @kvoli do you think we need a sleep (with or without exponential backoff) in that retry loop?

kvoli commented 3 weeks ago

Apologies for the delay. Introducing a backoff in the retry makes sense. I can open a patch.

cockroach-teamcity commented 2 weeks ago

pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test.TestLogic_mixed_version_sequence_per_node_cache failed with artifacts on master @ b4402507fdebe5682f77e348331a308d378c91e1:

=== RUN   TestLogic_mixed_version_sequence_per_node_cache
    test_log_scope.go:170: test logs captured to: /artifacts/tmp/_tmp/2ff0896c414e8ce4b6bbffec5882885c/logTestLogic_mixed_version_sequence_per_node_cache1512689732
    test_log_scope.go:81: use -show-logs to present logs inline
[02:53:21] --- progress: /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/12495/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test_/cockroach-go-testserver-23_2_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/mixed_version_sequence_per_node_cache: 6 statements
[02:54:00] --- done: /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/12495/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test_/cockroach-go-testserver-23_2_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/mixed_version_sequence_per_node_cache with config cockroach-go-testserver-23.2: 9 tests, 0 failures
    logic.go:4135: 
        /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/12495/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test_/cockroach-go-testserver-23_2_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/mixed_version_sequence_per_node_cache:50: error while processing
    logic.go:4135: 
        /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/12495/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test_/cockroach-go-testserver-23_2_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/mixed_version_sequence_per_node_cache:50: 
        expected success, but found
        (XXUUU) nodes n{3} required, but unavailable
        cluster.go:94: in UntilClusterStable()
cockroach logs captured in: /artifacts/tmp/_tmp/2ff0896c414e8ce4b6bbffec5882885c/cockroach-logs3314451839
    panic.go:626: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/2ff0896c414e8ce4b6bbffec5882885c/logTestLogic_mixed_version_sequence_per_node_cache1512689732
--- FAIL: TestLogic_mixed_version_sequence_per_node_cache (49.79s)
Help

See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM)

This test on roachdash | Improve this report!

cockroach-teamcity commented 1 week ago

pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test.TestLogic_mixed_version_sequence_per_node_cache failed with artifacts on master @ 64a3769a9b6daa3d02bf21a6a848f3a4db04b97a:

=== RUN   TestLogic_mixed_version_sequence_per_node_cache
I240704 08:41:25.646661 1 (gostd) rand.go:250  [-] 1  random seed: 5220135287694048209
    test_log_scope.go:170: test logs captured to: /artifacts/tmp/_tmp/2ff0896c414e8ce4b6bbffec5882885c/logTestLogic_mixed_version_sequence_per_node_cache2834093134
    test_log_scope.go:81: use -show-logs to present logs inline
[08:41:41] --- progress: /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/12682/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test_/cockroach-go-testserver-23_2_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/mixed_version_sequence_per_node_cache: 6 statements
[08:42:31] --- done: /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/12682/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test_/cockroach-go-testserver-23_2_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/mixed_version_sequence_per_node_cache with config cockroach-go-testserver-23.2: 9 tests, 0 failures
    logic.go:4135: 
        /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/12682/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test_/cockroach-go-testserver-23_2_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/mixed_version_sequence_per_node_cache:50: error while processing
    logic.go:4135: 
        /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/12682/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test_/cockroach-go-testserver-23_2_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/mixed_version_sequence_per_node_cache:50: 
        expected success, but found
        (XXUUU) nodes n{1} required, but unavailable
        cluster.go:94: in UntilClusterStable()
    panic.go:626: -- test log scope end --
--- FAIL: TestLogic_mixed_version_sequence_per_node_cache (65.99s)
Help

See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM)

This test on roachdash | Improve this report!

cockroach-teamcity commented 1 week ago

pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test.TestLogic_mixed_version_sequence_per_node_cache failed with artifacts on master @ 46a5cb142d7f9c19c190d63dc7c8f91195e4ef3c:

=== RUN   TestLogic_mixed_version_sequence_per_node_cache
    test_log_scope.go:170: test logs captured to: /artifacts/tmp/_tmp/2ff0896c414e8ce4b6bbffec5882885c/logTestLogic_mixed_version_sequence_per_node_cache222312241
    test_log_scope.go:81: use -show-logs to present logs inline
[17:02:30] --- progress: /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/7313/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test_/cockroach-go-testserver-23_2_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/mixed_version_sequence_per_node_cache: 6 statements
[17:03:24] --- done: /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/7313/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test_/cockroach-go-testserver-23_2_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/mixed_version_sequence_per_node_cache with config cockroach-go-testserver-23.2: 9 tests, 0 failures
    logic.go:4135: 
        /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/7313/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test_/cockroach-go-testserver-23_2_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/mixed_version_sequence_per_node_cache:50: error while processing
    logic.go:4135: 
        /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/7313/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test_/cockroach-go-testserver-23_2_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/mixed_version_sequence_per_node_cache:50: 
        expected success, but found
        (XXUUU) nodes n{1} required, but unavailable
        cluster.go:94: in UntilClusterStable()
cockroach logs captured in: /artifacts/tmp/_tmp/2ff0896c414e8ce4b6bbffec5882885c/cockroach-logs1583414253
    panic.go:626: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/2ff0896c414e8ce4b6bbffec5882885c/logTestLogic_mixed_version_sequence_per_node_cache222312241
--- FAIL: TestLogic_mixed_version_sequence_per_node_cache (76.35s)
Help

See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM)

This test on roachdash | Improve this report!