Open cockroach-teamcity opened 5 days ago
This is right at the threshold:
CPU not evenly balanced after timeout: outside bounds mean=77.8 tolerance=20.0% (±15.6) bounds=[62.3, 93.4]
| below = []
| within = [s1: 76 (-1.5%), s2: 75 (-2.6%), s3: 74 (-4.1%), s4: 75 (-3.2%), s5: 71 (-8.7%)]
| above = [s6: 93 (+20.0%)]
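For reference, the bounds in the message above are just the mean scaled by (1 ± tolerance). The following is a minimal sketch of that check, not the roachtest implementation: `classify` and its signature are hypothetical names used for illustration. The inputs are the rounded per-store values from the log, so the computed mean comes out slightly lower than the reported 77.8.

```go
package main

import "fmt"

// classify is a hypothetical helper (not the actual roachtest code). It splits
// per-store CPU readings into below/within/above relative to the band
// [mean*(1-tolerance), mean*(1+tolerance)]. Store IDs are 1-based.
// It assumes cpu is non-empty.
func classify(cpu []float64, tolerance float64) (mean, lo, hi float64, below, within, above []int) {
	for _, v := range cpu {
		mean += v
	}
	mean /= float64(len(cpu))
	lo, hi = mean*(1-tolerance), mean*(1+tolerance)
	for i, v := range cpu {
		switch {
		case v < lo:
			below = append(below, i+1)
		case v > hi:
			above = append(above, i+1)
		default:
			within = append(within, i+1)
		}
	}
	return
}

func main() {
	// Rounded values for s1..s6 as printed in the failure message.
	cpu := []float64{76, 75, 74, 75, 71, 93}
	mean, lo, hi, below, within, above := classify(cpu, 0.20)
	fmt.Printf("mean=%.1f bounds=[%.1f, %.1f]\n", mean, lo, hi)
	fmt.Println("below:", below, "within:", within, "above:", above)
}
```

Even with the rounded inputs, s6 lands just outside the upper bound while every other store is within it, which matches the "right at the threshold" reading.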
I'll take a look, we should have a CPU profile somewhere.
The replica CPU was being controlled as expected:
But the actual CPU was not:
Looking at a CPU profile from n6, it makes sense why the replica CPU could be within bounds while the process CPU is not quite:
Presumably, the SQL load here is not balanced. Unfortunately, the other nodes didn't have CPU profiles available to diff against.
I don't see any value in investigating it further, except to motivate using a different metric for the test.
Note: This build has runtime assertions enabled. If the same failure was hit in a run without assertions enabled, there should be a similar failure without this message. If there isn't one, then this failure is likely due to an assertion violation or (assertion) timeout.
roachtest.rebalance/by-load/replicas failed with artifacts on release-24.1 @ ed52acc6329e0dfa20e7e8a13dc47e959a65548c:
Parameters:
arch=amd64
cloud=gce
coverageBuild=false
cpu=4
encrypted=false
fs=ext4
localSSD=true
runtimeAssertionsBuild=true
ssd=0
Help
See: roachtest README
See: How To Investigate (internal)
See: Grafana
/cc @cockroachdb/kv-triage
This test on roachdash | Improve this report!
Jira issue: CRDB-44730