cockroachdb / cockroach

roachtest: import/mixed-versions failed #132814

Open cockroach-teamcity opened 1 month ago

cockroach-teamcity commented 1 month ago

roachtest.import/mixed-versions failed with artifacts on master @ 42f40f59cae3c0fd8842e194d6991c951ab4382f:

(mixedversion.go:732).Run: mixed-version test failure while running step 23 (restart node 1 with binary version v24.1.5): waiting for shared-process tenant on n1: pq: internal error while retrieving user account memberships: operation "get-user-session" timed out after 10s (given timeout 10s): internal error while retrieving user account: get auth info error: interrupted during singleflight load-value:authinfo-roachprod-2-2: context deadline exceeded [owner=test-eng]
test artifacts and logs in: /artifacts/import/mixed-versions/run_1

Parameters:

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for azure clusters

Same failure on other branches

- #131069 roachtest: import/mixed-versions failed [C-test-failure O-roachtest O-robot T-sql-queries branch-release-24.2 release-blocker]

/cc @cockroachdb/test-eng

This test on roachdash | Improve this report!

Jira issue: CRDB-43285

DarrylWong commented 1 month ago

Seems similar to the issue seen in #128768 and #127724 where:

  1. Node 1 is restarted by the MVT framework.
  2. Node 1 hits operation "get-user-session" timed out after 10s while trying to serve a query for the cluster version.

Unlike those two issues, though, I don't see any liveness problems on Node 1.

There's also #130384, which seems to have been fixed by setting WaitFor3XReplication, but that issue was also running into liveness problems and occurred semi-frequently. I can try adding the wait, but I'm still trying to get this to repro, so it's hard to tell whether that's the fix.

DarrylWong commented 1 month ago

Can't get this to repro at all. I noticed that it fails before we even run the import user hook, so it seems unrelated to the import test itself. I tried writing a makeshift mixed-version test that just spams restarts and queries for the cluster version, but no luck.

Not sure if @cockroachdb/kv would care to take a look. The conclusion last time seemed to be an infra flake, but maybe the lack of liveness issues indicates something different.

If not, I'll remove release-blocker and add WaitFor3XReplication if we see it fail again, unless someone else on test-eng has other ideas.

cockroach-teamcity commented 1 week ago

Note: This build has runtime assertions enabled. If the same failure was hit in a run without assertions enabled, there should be a similar failure without this message. If there isn't one, then this failure is likely due to an assertion violation or (assertion) timeout.

roachtest.import/mixed-versions failed with artifacts on master @ 39e43b85ec3b02bc760df10fce1c19d09419d6f2:

(mixedversion.go:759).Run: mixed-version test failure while running step 12 (restart node 4 with binary version v23.2.14): waiting for shared-process tenant on n4: pq: internal error while retrieving user account memberships: operation "get-user-session" timed out after 10s (given timeout 10s): internal error while retrieving user account: get auth info error: interrupted during singleflight load-value:authinfo-roachprod-2-2: context deadline exceeded [owner=test-eng]
test artifacts and logs in: /artifacts/import/mixed-versions/run_1

Grafana is not yet available for azure clusters

cockroach-teamcity commented 1 week ago

roachtest.import/mixed-versions failed with artifacts on master @ e83bc46aa42f2476b4b11b9703b8038c660dc980:

(mixedversion.go:759).Run: unexpected node event: n1: cockroach process for system interface died (exit code 134)
test artifacts and logs in: /artifacts/import/mixed-versions/run_1

Grafana is not yet available for azure clusters

cockroach-teamcity commented 4 days ago

roachtest.import/mixed-versions failed with artifacts on master @ 9927a9a1f0827daa734d5eb718017cf260dfe676:

(mixedversion.go:759).Run: mixed-version test failure while running step 6 (run "import"): full command output in run_065330.313973074_n4_cockroach-workload-f.log: COMMAND_PROBLEM: exit status 1
test artifacts and logs in: /artifacts/import/mixed-versions/cpu_arch=arm64/run_1

Grafana is not yet available for azure clusters

cockroach-teamcity commented 4 days ago

roachtest.import/mixed-versions failed with artifacts on master @ 9927a9a1f0827daa734d5eb718017cf260dfe676:

(mixedversion.go:759).Run: mixed-version test failure while running step 8 (run "import"): full command output in run_065335.186022489_n3_cockroach-workload-f.log: COMMAND_PROBLEM: exit status 1
test artifacts and logs in: /artifacts/import/mixed-versions/run_1

See: Grafana

cockroach-teamcity commented 3 days ago

roachtest.import/mixed-versions failed with artifacts on master @ 8eeb7f2ae3b2cede564b46ca47e2353fd147c061:

(mixedversion.go:759).Run: mixed-version test failure while running step 11 (run "import"): full command output in run_064844.523056770_n3_cockroach-workload-f.log: COMMAND_PROBLEM: exit status 1
test artifacts and logs in: /artifacts/import/mixed-versions/cpu_arch=arm64/run_1

Grafana is not yet available for azure clusters

cockroach-teamcity commented 3 days ago

Note: This build has runtime assertions enabled. If the same failure was hit in a run without assertions enabled, there should be a similar failure without this message. If there isn't one, then this failure is likely due to an assertion violation or (assertion) timeout.

roachtest.import/mixed-versions failed with artifacts on master @ 8eeb7f2ae3b2cede564b46ca47e2353fd147c061:

(mixedversion.go:759).Run: mixed-version test failure while running step 13 (run "import"): full command output in run_065851.197285822_n4_cockroach-workload-f.log: COMMAND_PROBLEM: exit status 1
test artifacts and logs in: /artifacts/import/mixed-versions/run_1

See: Grafana

cockroach-teamcity commented 2 days ago

Note: This build has runtime assertions enabled. If the same failure was hit in a run without assertions enabled, there should be a similar failure without this message. If there isn't one, then this failure is likely due to an assertion violation or (assertion) timeout.

roachtest.import/mixed-versions failed with artifacts on master @ eb2d2e19eb29d2747d9e267bd0612a69d066adad:

(mixedversion.go:759).Run: mixed-version test failure while running step 6 (run "import"): full command output in run_065101.300038200_n2_cockroach-workload-f.log: COMMAND_PROBLEM: exit status 1
test artifacts and logs in: /artifacts/import/mixed-versions/run_1

See: Grafana

cockroach-teamcity commented 2 days ago

roachtest.import/mixed-versions failed with artifacts on master @ eb2d2e19eb29d2747d9e267bd0612a69d066adad:

(mixedversion.go:759).Run: mixed-version test failure while running step 7 (run "import"): full command output in run_065144.121480738_n3_cockroach-workload-f.log: COMMAND_PROBLEM: exit status 1
test artifacts and logs in: /artifacts/import/mixed-versions/run_1

Grafana is not yet available for azure clusters
