cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

roachtest: import/tpcc/warehouses=4000/geo failed #124011

Closed: cockroach-teamcity closed this issue 5 months ago.

cockroach-teamcity commented 6 months ago

roachtest.import/tpcc/warehouses=4000/geo failed with artifacts on release-23.2 @ e256c72890581641e888e486dfcaa0d9f661f49d:

(monitor.go:153).Wait: monitor failure: pq: internal error while retrieving user account memberships: operation "get-user-session" timed out after 10s (given timeout 10s): internal error while retrieving user account: get auth info error: interrupted during singleflight acquire-lease:4: context deadline exceeded
test artifacts and logs in: /artifacts/import/tpcc/warehouses=4000/geo/run_1

Parameters:

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

- #123640 roachtest: import/tpcc/warehouses=4000/geo failed [C-test-failure O-roachtest O-robot T-sql-queries branch-release-24.1.0-rc]


Jira issue: CRDB-38661

cockroach-teamcity commented 5 months ago

roachtest.import/tpcc/warehouses=4000/geo failed with artifacts on release-23.2 @ 63500cca942227225a77aa69a60f77f978100b82:

(cluster.go:2344).Run: full command output in run_181400.142461963_n1_cockroach-workload-f.log: COMMAND_PROBLEM: exit status 1
(monitor.go:153).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/import/tpcc/warehouses=4000/geo/run_1

yuzefovich commented 5 months ago

In the first failure, we see connectivity problems between the nodes. For example, on node 5:

W240512 17:59:57.893890 886 2@rpc/clock_offset.go:291 ⋮ [T1,Vsystem,n5,rnode=2,raddr=‹10.154.0.69:29000›,class=default,rpc] 311  latency jump (prev avg 704.00ms, current 5624.71ms)

Here the latency from n5 to n2 jumped from an average of 704 ms to about 5.6 s. The second failure shows similar symptoms. Closing as an infrastructure flake.