Open github-actions[bot] opened 4 weeks ago
The issue from https://github.com/cockroachdb/cockroach/issues/120521 is still occurring. https://github.com/cockroachdb/cockroach/pull/124288 was an attempted fix.
In the logs from n1, we have:
1762:I240618 01:36:36.576016 1240 upgrade/upgradecluster/cluster.go:89 ⋮ [T1,Vsystem,n1,peer=‹127.0.0.1:38462›,client=127.0.0.1:38462,hostnossl,user=root,migration-mgr] 650 [], retrying, unavailable n{3}
1764:I240618 01:36:36.576705 1240 upgrade/upgradecluster/cluster.go:89 ⋮ [T1,Vsystem,n1,peer=‹127.0.0.1:38462›,client=127.0.0.1:38462,hostnossl,user=root,migration-mgr] 652 [], retrying, unavailable n{3}
1766:I240618 01:36:36.580026 1240 upgrade/upgradecluster/cluster.go:89 ⋮ [T1,Vsystem,n1,peer=‹127.0.0.1:38462›,client=127.0.0.1:38462,hostnossl,user=root,migration-mgr] 654 [], retrying, unavailable n{3}
1768:I240618 01:36:36.582334 1240 upgrade/upgradecluster/cluster.go:89 ⋮ [T1,Vsystem,n1,peer=‹127.0.0.1:38462›,client=127.0.0.1:38462,hostnossl,user=root,migration-mgr] 656 [], retrying, unavailable n{3}
1770:I240618 01:36:36.583279 1240 upgrade/upgradecluster/cluster.go:89 ⋮ [T1,Vsystem,n1,peer=‹127.0.0.1:38462›,client=127.0.0.1:38462,hostnossl,user=root,migration-mgr] 658 [], retrying, unavailable n{3}
1772:I240618 01:36:36.583920 1240 upgrade/upgradecluster/cluster.go:89 ⋮ [T1,Vsystem,n1,peer=‹127.0.0.1:38462›,client=127.0.0.1:38462,hostnossl,user=root,migration-mgr] 660 [], retrying, unavailable n{3}
1774:I240618 01:36:36.584520 1240 upgrade/upgradecluster/cluster.go:89 ⋮ [T1,Vsystem,n1,peer=‹127.0.0.1:38462›,client=127.0.0.1:38462,hostnossl,user=root,migration-mgr] 662 [], retrying, unavailable n{3}
1776:I240618 01:36:36.585441 1240 upgrade/upgradecluster/cluster.go:89 ⋮ [T1,Vsystem,n1,peer=‹127.0.0.1:38462›,client=127.0.0.1:38462,hostnossl,user=root,migration-mgr] 664 [], retrying, unavailable n{3}
1778:I240618 01:36:36.586555 1240 upgrade/upgradecluster/cluster.go:89 ⋮ [T1,Vsystem,n1,peer=‹127.0.0.1:38462›,client=127.0.0.1:38462,hostnossl,user=root,migration-mgr] 666 [], retrying, unavailable n{3}
1780:I240618 01:36:36.587239 1240 upgrade/upgradecluster/cluster.go:89 ⋮ [T1,Vsystem,n1,peer=‹127.0.0.1:38462›,client=127.0.0.1:38462,hostnossl,user=root,migration-mgr] 668 [], retrying, unavailable n{3}
1782:I240618 01:36:36.587954 1240 upgrade/upgradecluster/cluster.go:89 ⋮ [T1,Vsystem,n1,peer=‹127.0.0.1:38462›,client=127.0.0.1:38462,hostnossl,user=root,migration-mgr] 670 [], retrying, unavailable n{3}
1783:W240618 01:36:36.588079 1240 upgrade/upgrademanager/manager.go:308 ⋮ [T1,Vsystem,n1,peer=‹127.0.0.1:38462›,client=127.0.0.1:38462,hostnossl,user=root,migration-mgr] 671 error encountered during version upgrade: nodes n{3} required, but unavailable
In the n3 logs, the shutdown occurred at 01:36:28.
I240618 01:36:28.136423 1 1@cli/start.go:1068 ⋮ [T1,Vsystem,n3] 709 server drained and shutdown completed
And the restart completed around 01:36:36, which is when we see those logs from n1 retrying.
I240618 01:36:36.566993 35 1@cli/start.go:1280 ⋮ [T1,Vsystem,n3] 98 +CockroachDB node starting at 2024-06-18 01:36:36.566901224 +0000 UTC m=+8.413590569 (took 8.3s)
So it seems like there is still a race here. @kvoli do you think we need a sleep (with or without exponential backoff) in that retry loop?
@kvoli this is happening once or twice a day - any thoughts on the above?
So it seems like there is still a race here. @kvoli do you think we need a sleep (with or without exponential backoff) in that retry loop?
Apologies for the delay. Introducing a backoff in the retry makes sense. I can open a patch.
pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test.TestLogic_mixed_version_sequence_per_node_cache failed with artifacts on master @ b4402507fdebe5682f77e348331a308d378c91e1:
=== RUN TestLogic_mixed_version_sequence_per_node_cache
test_log_scope.go:170: test logs captured to: /artifacts/tmp/_tmp/2ff0896c414e8ce4b6bbffec5882885c/logTestLogic_mixed_version_sequence_per_node_cache1512689732
test_log_scope.go:81: use -show-logs to present logs inline
[02:53:21] --- progress: /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/12495/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test_/cockroach-go-testserver-23_2_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/mixed_version_sequence_per_node_cache: 6 statements
[02:54:00] --- done: /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/12495/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test_/cockroach-go-testserver-23_2_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/mixed_version_sequence_per_node_cache with config cockroach-go-testserver-23.2: 9 tests, 0 failures
logic.go:4135:
/home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/12495/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test_/cockroach-go-testserver-23_2_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/mixed_version_sequence_per_node_cache:50: error while processing
logic.go:4135:
/home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/12495/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test_/cockroach-go-testserver-23_2_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/mixed_version_sequence_per_node_cache:50:
expected success, but found
(XXUUU) nodes n{3} required, but unavailable
cluster.go:94: in UntilClusterStable()
cockroach logs captured in: /artifacts/tmp/_tmp/2ff0896c414e8ce4b6bbffec5882885c/cockroach-logs3314451839
panic.go:626: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/2ff0896c414e8ce4b6bbffec5882885c/logTestLogic_mixed_version_sequence_per_node_cache1512689732
--- FAIL: TestLogic_mixed_version_sequence_per_node_cache (49.79s)
See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM)
pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test.TestLogic_mixed_version_sequence_per_node_cache failed with artifacts on master @ 64a3769a9b6daa3d02bf21a6a848f3a4db04b97a:
=== RUN TestLogic_mixed_version_sequence_per_node_cache
I240704 08:41:25.646661 1 (gostd) rand.go:250 [-] 1 random seed: 5220135287694048209
test_log_scope.go:170: test logs captured to: /artifacts/tmp/_tmp/2ff0896c414e8ce4b6bbffec5882885c/logTestLogic_mixed_version_sequence_per_node_cache2834093134
test_log_scope.go:81: use -show-logs to present logs inline
[08:41:41] --- progress: /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/12682/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test_/cockroach-go-testserver-23_2_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/mixed_version_sequence_per_node_cache: 6 statements
[08:42:31] --- done: /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/12682/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test_/cockroach-go-testserver-23_2_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/mixed_version_sequence_per_node_cache with config cockroach-go-testserver-23.2: 9 tests, 0 failures
logic.go:4135:
/home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/12682/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test_/cockroach-go-testserver-23_2_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/mixed_version_sequence_per_node_cache:50: error while processing
logic.go:4135:
/home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/12682/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test_/cockroach-go-testserver-23_2_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/mixed_version_sequence_per_node_cache:50:
expected success, but found
(XXUUU) nodes n{1} required, but unavailable
cluster.go:94: in UntilClusterStable()
panic.go:626: -- test log scope end --
--- FAIL: TestLogic_mixed_version_sequence_per_node_cache (65.99s)
See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM)
pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test.TestLogic_mixed_version_sequence_per_node_cache failed with artifacts on master @ 46a5cb142d7f9c19c190d63dc7c8f91195e4ef3c:
=== RUN TestLogic_mixed_version_sequence_per_node_cache
test_log_scope.go:170: test logs captured to: /artifacts/tmp/_tmp/2ff0896c414e8ce4b6bbffec5882885c/logTestLogic_mixed_version_sequence_per_node_cache222312241
test_log_scope.go:81: use -show-logs to present logs inline
[17:02:30] --- progress: /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/7313/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test_/cockroach-go-testserver-23_2_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/mixed_version_sequence_per_node_cache: 6 statements
[17:03:24] --- done: /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/7313/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test_/cockroach-go-testserver-23_2_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/mixed_version_sequence_per_node_cache with config cockroach-go-testserver-23.2: 9 tests, 0 failures
logic.go:4135:
/home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/7313/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test_/cockroach-go-testserver-23_2_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/mixed_version_sequence_per_node_cache:50: error while processing
logic.go:4135:
/home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/7313/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test_/cockroach-go-testserver-23_2_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/mixed_version_sequence_per_node_cache:50:
expected success, but found
(XXUUU) nodes n{1} required, but unavailable
cluster.go:94: in UntilClusterStable()
cockroach logs captured in: /artifacts/tmp/_tmp/2ff0896c414e8ce4b6bbffec5882885c/cockroach-logs1583414253
panic.go:626: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/2ff0896c414e8ce4b6bbffec5882885c/logTestLogic_mixed_version_sequence_per_node_cache222312241
--- FAIL: TestLogic_mixed_version_sequence_per_node_cache (76.35s)
See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM)
pkg/sql/logictest/tests/cockroach-go-testserver-23.2/cockroach-go-testserver-23_2_test.TestLogic_mixed_version_sequence_per_node_cache failed on master @ 5e72cfccd672c1a66e56df12fe1191219c2df7d1:
Parameters:
attempt=1
run=1
shard=5
Help
See also: How To Investigate a Go Test Failure (internal)
/cc @cockroachdb/sql-foundationsThis test on roachdash | Improve this report!
Jira issue: CRDB-39620