canonical / juju-lint

Run checks against a juju model
GNU General Public License v3.0

Check for nova-compute vcpu-pin-set sanity #101

Open zxhdaze opened 4 months ago

zxhdaze commented 4 months ago

If the nova-compute charm has vcpu-pin-set set to anything other than blank/default, check that there is an equivalent cpu-range and reservation configuration on the sysconfig charm, and that both charms are deployed to the same set of machines.

Also provide a sanity check on the number of isolated vCPUs versus an "assumed" max CPU count. Typical CPU counts are 10, 12, 16, 20, 24, 36, 40, 50, and 56. Determine the smallest CPU count that the configured vcpu-pin-set range could fit within, then report a warning for further manual inspection if the percentage of reserved CPUs is greater than 80%. For instance, with a vcpu-pin-set of 1-9, assume a CPU count of 10; only CPU 0 is unpinned, giving 90% of CPUs reserved, which should be flagged as a warning. The calculation might be: (number of CPUs in the configured range(s)) / (highest reserved CPU number, rounded up to one of the known CPU counts above, or n+1 if higher than the largest expected count), warning if the result is > 0.8.
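A minimal sketch of that calculation in Python (function names are illustrative, not part of juju-lint's actual API):

```python
# Typical per-host CPU counts, per the issue description.
KNOWN_CPU_COUNTS = [10, 12, 16, 20, 24, 36, 40, 50, 56]


def parse_cpu_ranges(pin_set):
    """Expand a vcpu-pin-set string such as "1-9,12" into a set of CPU numbers."""
    cpus = set()
    for part in pin_set.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pinned_fraction(pin_set):
    """Fraction of CPUs reserved, against the smallest plausible CPU count."""
    cpus = parse_cpu_ranges(pin_set)
    highest = max(cpus)
    # Round up to the smallest known count that fits the range, or n+1 beyond it.
    assumed = next((n for n in KNOWN_CPU_COUNTS if n > highest), highest + 1)
    return len(cpus) / assumed


def check_pin_set(pin_set, threshold=0.8):
    """Return (fraction, should_warn) for a given vcpu-pin-set value."""
    frac = pinned_fraction(pin_set)
    return frac, frac > threshold
```

For the example above, `check_pin_set("1-9")` assumes a 10-CPU host and returns a 0.9 reserved fraction, which exceeds the 0.8 threshold and warns.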

The intention of these checks is to trigger a level of scrutiny for sites employing cpu pinning.


Imported from Launchpad using lp2gh.

zxhdaze commented 4 months ago

(by ec0) OK, I think I grok what you're suggesting, and also have some additional ideas for things we could check to ensure consistent CPU pinning configuration.

Please see my proposal for handling configuration values which differ based on the configuration of other charms over on bug #1846136. That approach would also let us compare CPU pinning between various deployed services (for example, in SR-IOV deployments where Neutron and Nova configuration need to match up).

The general implementation would be similar in my opinion, except we would also have to add a filter implementing the CPU pinning calculation in addition to the conditional logic checks. Regardless of the source of the map, we need some logic to compare a given CPU pinning map (sysconfig, nova-compute) against another to ensure there are no gaps.
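The comparison step could be as simple as a symmetric set difference over the two expanded CPU maps; a hedged sketch (helper name is hypothetical):

```python
def compare_pin_maps(nova_cpus, sysconfig_cpus):
    """Compare two sets of pinned CPU numbers from different charms.

    Returns (only_in_nova, only_in_sysconfig); both empty means the
    maps agree and there are no gaps between the two configurations.
    """
    only_nova = nova_cpus - sysconfig_cpus
    only_sysconfig = sysconfig_cpus - nova_cpus
    return only_nova, only_sysconfig
```

Any CPU appearing in only one of the two maps would then be reported as a mismatch for that machine.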

The "assumed" max CPU count gives me pause, because I don't want to introduce code which will call out false positives when, for a reason we can't anticipate, the lower cores need to be reserved and there are actually plenty of free cores beyond the reserved range. There is per-cloud configuration in juju-lint now, however, so we could potentially add a configuration directive to specify the CPU core count, which would then activate these checks.

Thoughts?