Open cockroach-teamcity opened 1 month ago
I'm removing the release-blocker label. This appears to be a case where the test injects two back-to-back failures in a combination that isn't supported with epoch leases. Specifically:
09:27:00 failover.go:293: chaos iteration 5
09:28:12 failover.go:343: failing n8 (blackhole-recv)
09:28:12 failover.go:343: failing n9 (deadlock)
The deadlock doesn't go through because the blackhole has left the cluster in a bad state with epoch leases. With epoch leases, a blackhole doesn't always let availability recover, so the subsequent attempt to induce the deadlock fails.
This is a test problem where we need to disallow this combination.
Assigning to myself and setting P3: it isn't a real issue, but it is also hard to fix without either crippling the test for epoch leases (to only have a single failure) or manually figuring out the combinations that can't be done together in a metamorphic-like test.
> manually figuring out the combinations that can't be done together
This sounds promising. It seems this issue has come up before; @andrewbaptist do you think it would help to at least list the incompatible combinations? Even if we don't address the issue by automatically selecting from only the compatible operations, having a list would make for quick triage.
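As a rough illustration of what such a list could look like, here is a minimal Go sketch. It is not taken from failover.go: the type names, the `incompatible` table, and the `incompatibleReason` helper are all hypothetical, and the only entry is the one combination described in this thread (blackhole-recv followed by deadlock under epoch leases). It also assumes that the order of the two failures matters, which the thread does not confirm.

```go
package main

import "fmt"

// failureMode and leaseType are hypothetical labels for this sketch;
// the string values mirror the names in the log lines above.
type failureMode string
type leaseType string

const (
	blackholeRecv failureMode = "blackhole-recv"
	deadlock      failureMode = "deadlock"

	epochLease leaseType = "epoch"
)

// modePair is an ordered pair: first is injected before second.
// Treating the pair as ordered is an assumption of this sketch.
type modePair struct{ first, second failureMode }

// incompatible lists, per lease type, the back-to-back combinations
// that are known not to work, with a short reason for triage.
var incompatible = map[leaseType]map[modePair]string{
	epochLease: {
		{blackholeRecv, deadlock}: "blackhole may not restore availability, so the deadlock cannot be induced",
	},
}

// incompatibleReason reports whether injecting first and then second is a
// known-bad combination for the given lease type, and why.
func incompatibleReason(lt leaseType, first, second failureMode) (string, bool) {
	reason, ok := incompatible[lt][modePair{first, second}]
	return reason, ok
}

func main() {
	if reason, bad := incompatibleReason(epochLease, blackholeRecv, deadlock); bad {
		fmt.Printf("skip %s then %s on %s leases: %s\n", blackholeRecv, deadlock, epochLease, reason)
	}
}
```

A table like this could serve triage on its own (look up the two failure modes from the log), and the same lookup could later be used to re-roll an incompatible pick when choosing chaos failures.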
We have marked this test failure issue as stale because it has been inactive for 1 month. If this failure is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 5 days to keep the test failure queue tidy.
roachtest.failover/chaos/read-only failed with artifacts on release-24.1.2-rc @ 7e81be6de75205c3d08b0d8dcc6ca188306abc27:
Parameters:
ROACHTEST_arch=amd64
ROACHTEST_cloud=gce
ROACHTEST_coverageBuild=false
ROACHTEST_cpu=2
ROACHTEST_encrypted=false
ROACHTEST_fs=ext4
ROACHTEST_localSSD=false
ROACHTEST_metamorphicBuild=false
ROACHTEST_ssd=0
Help
See: roachtest README
See: How To Investigate (internal)
See: Grafana
Same failure on other branches
- #126542 roachtest: failover/chaos/read-only failed [A-testing C-bug C-test-failure O-roachtest O-robot P-3 T-kv branch-release-23.1.24-rc]
/cc @cockroachdb/kv-triage
Jira issue: CRDB-40187