cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

roachtest: perturbation/metamorphic/decommission failed #133001

Open cockroach-teamcity opened 6 days ago

cockroach-teamcity commented 6 days ago

roachtest.perturbation/metamorphic/decommission failed with artifacts on master @ 472ea07a5232c98536293d13bb46cca59f9f2cd0:

(assertions.go:363).Fail: 
    Error Trace:    github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/admission_control_latency.go:851
                                main/pkg/cmd/roachtest/test_runner.go:1281
                                src/runtime/asm_amd64.s:1695
    Error:          Should be true
    Test:           perturbation/metamorphic/decommission
    Messages:       FAILURE: follower-read  : Increase 12.0712 > 5.0000 BASE: 7.196104ms SCORE: 86.865654ms

                    FAILURE: read           : Increase 12.0245 > 5.0000 BASE: 7.287448ms SCORE: 87.62822ms

                    FAILURE: write          : Increase 22.6591 > 5.0000 BASE: 6.779942ms SCORE: 153.627458ms
(require.go:1950).True: FailNow called
test artifacts and logs in: /artifacts/perturbation/metamorphic/decommission/run_1

Parameters:

See: roachtest README

See: How To Investigate (internal)

See: Grafana

/cc @cockroachdb/kv-triage

This test on roachdash | Improve this report!

Jira issue: CRDB-43399

arulajmani commented 4 days ago

Given that we introduced this test in this release and this specific variant has had a few failures in the last month (last example: https://github.com/cockroachdb/cockroach/issues/132096, which looks somewhat similar to this one to the untrained eye), I'll downgrade this to a GA-blocker for now.

andrewbaptist commented 4 days ago

I spent a little time looking at this failure, and it is similar to #131822. One of the nodes (n25) "stalled" for about 1.5 seconds, which resulted in slow traces. As an example, here is a slow request (2024-10-19T15_58_29Z-1013458893388054543.zip). While neither of these failures was in the raft tracing code, it's possible that the amount of logging from raft tracing is causing the system to stall.

However, we haven't watched this enough with and without this failure to say conclusively. I'm planning to pair with @arulajmani tomorrow to look into this further.

cockroach-teamcity commented 3 days ago

roachtest.perturbation/metamorphic/decommission failed with artifacts on master @ 1e5b3c212b45419c960038718c48a5dd75a111a0:

(assertions.go:363).Fail: 
    Error Trace:    github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/admission_control_latency.go:857
                                main/pkg/cmd/roachtest/test_runner.go:1281
                                src/runtime/asm_amd64.s:1695
    Error:          Should be true
    Test:           perturbation/metamorphic/decommission
    Messages:       FAILURE: follower-read  : Increase 208970.1360 > 5.0000 BASE: 17.227342ms SCORE: 1h0m0s

                    FAILURE: read           : Increase 229075.3799 > 5.0000 BASE: 15.715351ms SCORE: 1h0m0s

                    FAILURE: write          : Increase 294035.8023 > 5.0000 BASE: 12.243407ms SCORE: 1h0m0s
(require.go:1950).True: FailNow called
test artifacts and logs in: /artifacts/perturbation/metamorphic/decommission/run_1


cockroach-teamcity commented 2 days ago

roachtest.perturbation/metamorphic/decommission failed with artifacts on master @ 787f2e3fe5f73b33fcd65485908cbb71e0991222:

(assertions.go:363).Fail: 
    Error Trace:    github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/admission_control_latency.go:857
                                main/pkg/cmd/roachtest/test_runner.go:1281
                                src/runtime/asm_amd64.s:1695
    Error:          Should be true
    Test:           perturbation/metamorphic/decommission
    Messages:       FAILURE: follower-read  : Increase 533092.1388 > 5.0000 BASE: 6.753054ms SCORE: 1h0m0s

                    FAILURE: read           : Increase 508863.7708 > 5.0000 BASE: 7.074585ms SCORE: 1h0m0s

                    FAILURE: write          : Increase 589030.8708 > 5.0000 BASE: 6.111734ms SCORE: 1h0m0s
(require.go:1950).True: FailNow called
test artifacts and logs in: /artifacts/perturbation/metamorphic/decommission/run_1


cockroach-teamcity commented 1 day ago

Note: This build has runtime assertions enabled. If the same failure was hit in a run without assertions enabled, there should be a similar failure without this message. If there isn't one, then this failure is likely due to an assertion violation or (assertion) timeout.

roachtest.perturbation/metamorphic/decommission failed with artifacts on master @ 9c1d89e3adb8c6532459cc3e616288db06f966d9:

(assertions.go:363).Fail: 
    Error Trace:    github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/admission_control_latency.go:857
                                main/pkg/cmd/roachtest/test_runner.go:1281
                                src/runtime/asm_amd64.s:1695
    Error:          Should be true
    Test:           perturbation/metamorphic/decommission
    Messages:       FAILURE: follower-read  : Increase 197969.9284 > 5.0000 BASE: 18.18458ms SCORE: 1h0m0s

                    FAILURE: read           : Increase 183843.6843 > 5.0000 BASE: 19.581853ms SCORE: 1h0m0s

                    FAILURE: write          : Increase 226032.0908 > 5.0000 BASE: 15.926942ms SCORE: 1h0m0s
(require.go:1950).True: FailNow called
test artifacts and logs in: /artifacts/perturbation/metamorphic/decommission/run_1


cockroach-teamcity commented 12 hours ago

roachtest.perturbation/metamorphic/decommission failed with artifacts on master @ 82b1fda15c4616713b278c447d24b0ab5416e511:

(assertions.go:363).Fail: 
    Error Trace:    github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/admission_control_latency.go:858
                                main/pkg/cmd/roachtest/test_runner.go:1287
                                src/runtime/asm_amd64.s:1695
    Error:          Should be true
    Test:           perturbation/metamorphic/decommission
    Messages:       FAILURE: follower-read  : Increase 547747.1161 > 5.0000 BASE: 6.572376ms SCORE: 1h0m0s

                    FAILURE: read           : Increase 520145.7506 > 5.0000 BASE: 6.921137ms SCORE: 1h0m0s

                    FAILURE: write          : Increase 376444.4487 > 5.0000 BASE: 9.563164ms SCORE: 1h0m0s
(require.go:1950).True: FailNow called
test artifacts and logs in: /artifacts/perturbation/metamorphic/decommission/run_1
