cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

release-24.2: roachtest: fix rebalance/by-load/*/mixed-version shared process tests #135947

Open blathers-crl[bot] opened 12 hours ago

blathers-crl[bot] commented 12 hours ago

Backport 2/2 commits from #131787 on behalf of @kvoli.

/cc @cockroachdb/release


In https://github.com/cockroachdb/cockroach/pull/129117, shared-process multi-tenancy was introduced to the rebalance/by-load/*/mixed-version roachtests, which occasionally caused these tests to fail spuriously.

All of the failures had the same cause: some nodes reported CPU utilization above 100%, which is impossible. For example:

CPU not evenly balanced after timeout: outside bounds mean=102.5 tolerance=20.0% (±20.5) bounds=[82.0, 123.0]
below  = [s3: 81 (-20.7%), s5: 65 (-36.5%)]
within = [s2: 116 (+14.0%), s4: 92 (-9.7%), s6: 88 (-13.2%)]
above  = [s1: 170 (+66.1%)]
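To make the failure message easier to read, here is a minimal Go sketch of the kind of check it reports. It assumes the bounds are mean ± tolerance × mean, which matches the printed numbers (102.5 × 0.20 = 20.5, giving [82.0, 123.0]). The helpers `bounds` and `classify` are hypothetical and are not the roachtest's actual implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// bounds returns the accepted [lo, hi] range, assumed to be mean ± tolerance*mean.
// Hypothetical helper for illustration; not the roachtest code.
func bounds(mean, tolerance float64) (lo, hi float64) {
	delta := mean * tolerance
	return mean - delta, mean + delta
}

// classify buckets each store's CPU value relative to [lo, hi].
// Stores are visited in sorted order so the output is deterministic.
func classify(cpu map[string]float64, lo, hi float64) (below, within, above []string) {
	stores := make([]string, 0, len(cpu))
	for s := range cpu {
		stores = append(stores, s)
	}
	sort.Strings(stores)
	for _, s := range stores {
		switch v := cpu[s]; {
		case v < lo:
			below = append(below, s)
		case v > hi:
			above = append(above, s)
		default:
			within = append(within, s)
		}
	}
	return below, within, above
}

func main() {
	// Rounded per-store values taken from the failure message.
	cpu := map[string]float64{"s1": 170, "s2": 116, "s3": 81, "s4": 92, "s5": 65, "s6": 88}
	lo, hi := bounds(102.5, 0.20)
	below, within, above := classify(cpu, lo, hi)
	fmt.Printf("bounds=[%.1f, %.1f]\n", lo, hi)
	fmt.Println("below  =", below)
	fmt.Println("within =", within)
	fmt.Println("above  =", above)
}
```

The mean is passed in as reported (102.5) rather than recomputed, because the per-store values in the message are rounded; their plain average is 102.0. Running the sketch reproduces the bucket assignment in the message: s3 and s5 below, s2, s4 and s6 within, s1 above.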

The reported utilization was inflated because the query aggregated the timeseries data of every tenant on a given node, instead of only the system tenant's.

Update the timeseries utility used to query CPU utilization to also take a TenantID parameter, which is used to restrict the query to the system tenant.
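The difference between aggregating every tenant's series and filtering by TenantID can be sketched in Go with hypothetical types. `cpuSample`, `nodeCPU`, and the sample values are illustrative assumptions, not the timeseries utility's real API; `SystemTenantID = 1` mirrors the fact that CockroachDB's system tenant has ID 1. Summing every tenant's series on a node can exceed 100%, while filtering keeps only the system tenant's series.

```go
package main

import (
	"fmt"
	"sort"
)

// TenantID is a simplified stand-in for CockroachDB's tenant ID type.
type TenantID uint64

// SystemTenantID is the system tenant's ID (1 in CockroachDB).
const SystemTenantID TenantID = 1

// cpuSample is a hypothetical per-tenant CPU datapoint for one node,
// expressed as a percentage of the node's CPU.
type cpuSample struct {
	Node   int
	Tenant TenantID
	CPU    float64
}

// nodeCPU returns per-node CPU utilization. A nil tenant sums every tenant's
// series on the node (the behavior described as the bug); a non-nil tenant
// keeps only that tenant's samples (the behavior after the fix).
func nodeCPU(samples []cpuSample, tenant *TenantID) map[int]float64 {
	out := make(map[int]float64)
	for _, s := range samples {
		if tenant != nil && s.Tenant != *tenant {
			continue
		}
		out[s.Node] += s.CPU
	}
	return out
}

func main() {
	// Illustrative values only; tenant 2 is a hypothetical application tenant.
	samples := []cpuSample{
		{Node: 1, Tenant: SystemTenantID, CPU: 60},
		{Node: 1, Tenant: 2, CPU: 55},
		{Node: 2, Tenant: SystemTenantID, CPU: 40},
		{Node: 2, Tenant: 2, CPU: 30},
	}
	sys := SystemTenantID
	all := nodeCPU(samples, nil)
	only := nodeCPU(samples, &sys)

	nodes := make([]int, 0, len(all))
	for n := range all {
		nodes = append(nodes, n)
	}
	sort.Ints(nodes)
	for _, n := range nodes {
		fmt.Printf("n%d: all tenants=%.0f%% system tenant=%.0f%%\n", n, all[n], only[n])
	}
}
```

This prints `n1: all tenants=115% system tenant=60%` and `n2: all tenants=70% system tenant=40%`. A nil pointer stands in for "no tenant filter" here; the real utility's parameter type may differ.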

Fixes: https://github.com/cockroachdb/cockroach/issues/129962
Fixes: https://github.com/cockroachdb/cockroach/issues/131274
Fixes: https://github.com/cockroachdb/cockroach/issues/129464

Release note: None


Release justification: Test only.

blathers-crl[bot] commented 12 hours ago

Thanks for opening a backport.

Please check the backport criteria before merging:

If your backport adds new functionality, please ensure that the following additional criteria are satisfied:
- [ ] There is a high-priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
- [ ] The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
- [ ] New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters. State changes must be further protected such that nodes running old binaries will not be negatively impacted by the new state (with a mixed version test added).
- [ ] The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.
- [ ] Your backport must be accompanied by a post to the appropriate Slack channel (#db-backports-point-releases or #db-backports-XX-X-release) for awareness and discussion.

Also, please add a brief release justification to the body of your PR to justify this backport.
