Open cockroach-teamcity opened 6 days ago
Given that we introduced this test in this release and that this specific variant has failed a few times in the last month (most recent example: https://github.com/cockroachdb/cockroach/issues/132096, which looks somewhat similar to this one to the untrained eye), I'll downgrade this to a GA-blocker for now.
I spent a little time looking at this failure, and it is similar to #131822. One of the nodes (n25) "stalled" for about 1.5 seconds, which resulted in slow traces. As an example, here is a slow request (2024-10-19T15_58_29Z-1013458893388054543.zip). While neither of these failures was in the raft tracing code, it's possible that the amount of logging from raft tracing is causing the system to stall.
However, we haven't observed this enough, with and without this failure, to say so conclusively. I'm planning to pair with @arulajmani tomorrow to look into this further.
roachtest.perturbation/metamorphic/decommission failed with artifacts on master @ 1e5b3c212b45419c960038718c48a5dd75a111a0:
(assertions.go:363).Fail:
Error Trace: github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/admission_control_latency.go:857
main/pkg/cmd/roachtest/test_runner.go:1281
src/runtime/asm_amd64.s:1695
Error: Should be true
Test: perturbation/metamorphic/decommission
Messages: FAILURE: follower-read : Increase 208970.1360 > 5.0000 BASE: 17.227342ms SCORE: 1h0m0s
FAILURE: read : Increase 229075.3799 > 5.0000 BASE: 15.715351ms SCORE: 1h0m0s
FAILURE: write : Increase 294035.8023 > 5.0000 BASE: 12.243407ms SCORE: 1h0m0s
(require.go:1950).True: FailNow called
test artifacts and logs in: /artifacts/perturbation/metamorphic/decommission/run_1
Parameters:
ROACHTEST_arch=amd64
ROACHTEST_cloud=gce
ROACHTEST_coverageBuild=false
ROACHTEST_cpu=8
ROACHTEST_encrypted=false
ROACHTEST_fs=ext4
ROACHTEST_localSSD=true
ROACHTEST_runtimeAssertionsBuild=false
ROACHTEST_ssd=1
See: roachtest README
See: How To Investigate (internal)
See: Grafana
roachtest.perturbation/metamorphic/decommission failed with artifacts on master @ 787f2e3fe5f73b33fcd65485908cbb71e0991222:
(assertions.go:363).Fail:
Error Trace: github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/admission_control_latency.go:857
main/pkg/cmd/roachtest/test_runner.go:1281
src/runtime/asm_amd64.s:1695
Error: Should be true
Test: perturbation/metamorphic/decommission
Messages: FAILURE: follower-read : Increase 533092.1388 > 5.0000 BASE: 6.753054ms SCORE: 1h0m0s
FAILURE: read : Increase 508863.7708 > 5.0000 BASE: 7.074585ms SCORE: 1h0m0s
FAILURE: write : Increase 589030.8708 > 5.0000 BASE: 6.111734ms SCORE: 1h0m0s
(require.go:1950).True: FailNow called
test artifacts and logs in: /artifacts/perturbation/metamorphic/decommission/run_1
Parameters:
ROACHTEST_arch=amd64
ROACHTEST_cloud=gce
ROACHTEST_coverageBuild=false
ROACHTEST_cpu=16
ROACHTEST_encrypted=false
ROACHTEST_fs=ext4
ROACHTEST_localSSD=true
ROACHTEST_runtimeAssertionsBuild=false
ROACHTEST_ssd=1
See: roachtest README
See: How To Investigate (internal)
See: Grafana
Note: This build has runtime assertions enabled. If the same failure was hit in a run without assertions enabled, there should be a similar failure without this message. If there isn't one, then this failure is likely due to an assertion violation or (assertion) timeout.
roachtest.perturbation/metamorphic/decommission failed with artifacts on master @ 9c1d89e3adb8c6532459cc3e616288db06f966d9:
(assertions.go:363).Fail:
Error Trace: github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/admission_control_latency.go:857
main/pkg/cmd/roachtest/test_runner.go:1281
src/runtime/asm_amd64.s:1695
Error: Should be true
Test: perturbation/metamorphic/decommission
Messages: FAILURE: follower-read : Increase 197969.9284 > 5.0000 BASE: 18.18458ms SCORE: 1h0m0s
FAILURE: read : Increase 183843.6843 > 5.0000 BASE: 19.581853ms SCORE: 1h0m0s
FAILURE: write : Increase 226032.0908 > 5.0000 BASE: 15.926942ms SCORE: 1h0m0s
(require.go:1950).True: FailNow called
test artifacts and logs in: /artifacts/perturbation/metamorphic/decommission/run_1
Parameters:
ROACHTEST_arch=amd64
ROACHTEST_cloud=gce
ROACHTEST_coverageBuild=false
ROACHTEST_cpu=4
ROACHTEST_encrypted=false
ROACHTEST_fs=ext4
ROACHTEST_localSSD=true
ROACHTEST_runtimeAssertionsBuild=true
ROACHTEST_ssd=1
See: roachtest README
See: How To Investigate (internal)
See: Grafana
roachtest.perturbation/metamorphic/decommission failed with artifacts on master @ 82b1fda15c4616713b278c447d24b0ab5416e511:
(assertions.go:363).Fail:
Error Trace: github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/admission_control_latency.go:858
main/pkg/cmd/roachtest/test_runner.go:1287
src/runtime/asm_amd64.s:1695
Error: Should be true
Test: perturbation/metamorphic/decommission
Messages: FAILURE: follower-read : Increase 547747.1161 > 5.0000 BASE: 6.572376ms SCORE: 1h0m0s
FAILURE: read : Increase 520145.7506 > 5.0000 BASE: 6.921137ms SCORE: 1h0m0s
FAILURE: write : Increase 376444.4487 > 5.0000 BASE: 9.563164ms SCORE: 1h0m0s
(require.go:1950).True: FailNow called
test artifacts and logs in: /artifacts/perturbation/metamorphic/decommission/run_1
Parameters:
ROACHTEST_arch=amd64
ROACHTEST_cloud=gce
ROACHTEST_coverageBuild=false
ROACHTEST_cpu=16
ROACHTEST_encrypted=false
ROACHTEST_fs=ext4
ROACHTEST_localSSD=true
ROACHTEST_runtimeAssertionsBuild=false
ROACHTEST_ssd=1
See: roachtest README
See: How To Investigate (internal)
See: Grafana
roachtest.perturbation/metamorphic/decommission failed with artifacts on master @ 472ea07a5232c98536293d13bb46cca59f9f2cd0:
Parameters:
ROACHTEST_arch=amd64
ROACHTEST_cloud=gce
ROACHTEST_coverageBuild=false
ROACHTEST_cpu=16
ROACHTEST_encrypted=false
ROACHTEST_fs=ext4
ROACHTEST_localSSD=true
ROACHTEST_runtimeAssertionsBuild=false
ROACHTEST_ssd=2
See: roachtest README
See: How To Investigate (internal)
See: Grafana
/cc @cockroachdb/kv-triage
Jira issue: CRDB-43399