cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

roachtest: cdc/workload/kv100/nodes=5/cpu=16/ranges=100k/server=processor/protocol=mux/format=json/sink=null failed #125125

Closed: cockroach-teamcity closed this issue 3 weeks ago

cockroach-teamcity commented 2 months ago

roachtest.cdc/workload/kv100/nodes=5/cpu=16/ranges=100k/server=processor/protocol=mux/format=json/sink=null failed with artifacts on master @ 4c06ddd826bd32f84adbdf2f78f90c7d0c2f2d60:

(monitor.go:154).Wait: monitor failure: full command output in run_104521.292405531_n7_cockroach-workload-r.log: COMMAND_PROBLEM: exit status 1
test artifacts and logs in: /artifacts/cdc/workload/kv100/nodes=5/cpu=16/ranges=100k/server=processor/protocol=mux/format=json/sink=null/cpu_arch=arm64/run_1

Parameters:

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for Azure clusters

/cc @cockroachdb/cdc


Jira issue: CRDB-39261

rharding6373 commented 2 months ago

From the logs:

run_104521.292405531_n7_cockroach-workload-r: 10:45:21 cluster.go:2421: > ./cockroach workload run kv --seed -7890190094025155754 --histograms=perf/stats.json --concurrency 320 --duration 20m0s --write-seq R1000000 --read-percent 100  {pgurl:1-5}
I240605 10:45:22.548570 1 workload/cli/run.go:640  [-] 1  random seed: -7890190094025155754
I240605 10:45:22.548694 1 workload/cli/run.go:432  [-] 2  creating load generator...
I240605 10:45:22.789691 1 workload/cli/run.go:471  [-] 3  creating load generator... done (took 240.995001ms)
_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
...
Number of reads that didn't return any results: 39.
Write sequence could be resumed by passing --write-seq=R1000000 to the next run.
Error: unexpected EOF
run_104521.292405531_n7_cockroach-workload-r: 10:58:22 cluster.go:2431: > result: COMMAND_PROBLEM: exit status 1

I don't know where the "Error: unexpected EOF" comes from. The 39 empty reads in the kv100 workload don't cause an error on their own.

rharding6373 commented 2 months ago

The current theory is that this failure is due to cluster overload. Removing the release-blocker label.

rharding6373 commented 3 weeks ago

The artifacts have now expired, so we'll have to settle for the cluster-overload explanation for now.