cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

roachtest: rebalance/by-load/leases/mixed-version failed #123656

Closed cockroach-teamcity closed 5 months ago

cockroach-teamcity commented 5 months ago

roachtest.rebalance/by-load/leases/mixed-version failed with artifacts on master @ 379d332c9a4ce49cda4a3565a852e7ddb850ffb5:

(mixedversion.go:592).Run: mixed-version test failure while running step 1 (start cluster at version "v22.1.20"): COMMAND_PROBLEM: exit status 1
test artifacts and logs in: /artifacts/rebalance/by-load/leases/mixed-version/run_1

Parameters:

See: roachtest README

See: How To Investigate (internal)

See: Grafana

/cc @cockroachdb/kv-triage


Jira issue: CRDB-38453

blathers-crl[bot] commented 5 months ago

cc @cockroachdb/test-eng

andrewbaptist commented 5 months ago

Test Logs:

Wraps: (4) Node 1. Command with error:
  | ```
  | ./cockroach.sh
  | ```
  | Job for cockroach-system.service failed because the control process exited with error code.
  | See "systemctl status cockroach-system.service" and "journalctl -xeu cockroach-system.service" for details.

We don't collect logs either, so it makes me think this is an infrastructure problem. I'm tagging with X-infra, but it would be best if someone from test-eng can validate.

run_062524.368795483_n1-4_du-c-mntdata1-exclud: 06:25:24 cluster.go:2372: > du -c /mnt/data1 --exclude lost+found >> logs/diskusage.txt
teamcity-15127361-1714974475-03-n4cpu4:[1 2 3 4]: du -c /mnt/data1 --exclude ...
   2:   <err> COMMAND_PROBLEM: exit status 1
    bash: line 1: logs/diskusage.txt: No such file or directory

run_062524.368795483_n1-4_du-c-mntdata1-exclud: 06:25:24 cluster.go:2382: > result: COMMAND_PROBLEM: exit status 1

srosenberg commented 5 months ago

> We don't collect logs either, so it makes me think this is an infrastructure problem. I'm tagging with X-infra, but it would be best if someone from test-eng can validate.

There was a logged hint of the root cause in cockroach.stderr.log [1]:

ERROR: unknown channel name: "KV_DISTRIBUTION"
Failed running "start"

This was a recent regression introduced by [2] and now fixed by [3]. In [2], cockroachdb-logging.yaml is passed unconditionally to whichever cockroach binary is specified. The regression surfaces in mixed-version tests because the KV_DISTRIBUTION log channel doesn't exist in 22.1, so the v22.1.20 binary rejects the config at startup. In [3], the logging config is gated behind a CLI option (enable-fluent-sink), which is disabled by default.

[1] https://teamcity.cockroachdb.com/repository/download/Cockroach_Nightlies_RoachtestNightlyGceBazel/15127361:id/rebalance/by-load/leases/mixed-version/run_1/artifacts.zip!/logs/1.unredacted/cockroach.stderr.log
[2] https://github.com/cockroachdb/cockroach/pull/123227
[3] https://github.com/cockroachdb/cockroach/pull/123603

srosenberg commented 5 months ago

Should be resolved by [3], closing.