canonical / juju-lint

Run checks against a juju model
GNU General Public License v3.0

Check for number of OSDs per rack in ceph-osd status message #207

Open zxhdaze opened 6 months ago


We are seeing some clouds where the compute/storage nodes are physically racked unevenly across racks. The compute service may tolerate an uneven number of hypervisors per availability zone, but Ceph performance can be severely impacted by an imbalance of spindles/nodes.

The ceph-osd status message reports how many OSDs each unit provides, e.g.:

      message: Unit is ready (14 OSD)
      message: Unit is ready (8 OSD)
      message: Unit is ready (14 OSD)
      message: Unit is ready (8 OSD)
      message: Unit is ready (14 OSD)
      message: Unit is ready (2 OSD)
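
A minimal sketch of pulling the OSD count out of these messages, assuming the text always follows the "Unit is ready (N OSD)" shape shown above. The regex and the helper name `osd_count_from_status` are hypothetical, not part of juju-lint; a message without a count (e.g. a blocked unit) yields `None` so the caller can flag it separately.

```python
import re
from typing import Optional

# Assumed pattern, based only on the example messages: "... (<N> OSD)".
# "OSDs?" also tolerates a plural form, in case the charm ever uses one.
OSD_COUNT_RE = re.compile(r"\((\d+) OSDs?\)")


def osd_count_from_status(message: str) -> Optional[int]:
    """Return the OSD count embedded in a ceph-osd status message, or None."""
    match = OSD_COUNT_RE.search(message)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    for msg in ["Unit is ready (14 OSD)", "Unit is ready (2 OSD)", "Unit is blocked"]:
        print(msg, "->", osd_count_from_status(msg))
```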

juju-lint should aggregate the number of OSDs per zone, report the per-zone totals, and WARN if they are not identical, or ERROR if they differ by 20% or more.
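
A rough sketch of the proposed check, assuming each unit has already been mapped to a zone (how that mapping is obtained from the model is left out here). The function names are hypothetical, and so is the meaning of "differ by 20%": this version measures `(largest - smallest) / largest` across zone totals. The zone assignment in the example is made up purely for illustration, since the issue does not say which unit sits in which zone.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical threshold interpretation: ERROR when the spread between the
# largest and smallest zone totals is at least 20% of the largest total.
ERROR_RATIO = 0.20


def osds_per_zone(units: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Sum OSD counts per zone from (zone, osd_count) pairs."""
    totals: Dict[str, int] = defaultdict(int)
    for zone, count in units:
        totals[zone] += count
    return dict(totals)


def check_osd_balance(totals: Dict[str, int]) -> str:
    """Return "OK", "WARN" or "ERROR" for the per-zone OSD totals."""
    if len(totals) < 2:
        return "OK"  # a single zone has nothing to compare against
    largest = max(totals.values())
    smallest = min(totals.values())
    if largest == smallest:
        return "OK"
    # largest > smallest >= 0 here, so the division is safe
    if (largest - smallest) / largest >= ERROR_RATIO:
        return "ERROR"
    return "WARN"


if __name__ == "__main__":
    # Made-up zone assignment for the six units listed in the issue
    units = [("zone1", 14), ("zone1", 8), ("zone2", 14),
             ("zone2", 8), ("zone3", 14), ("zone3", 2)]
    totals = osds_per_zone(units)
    print(totals)                     # {'zone1': 22, 'zone2': 22, 'zone3': 16}
    print(check_osd_balance(totals))  # ERROR, since (22 - 16) / 22 is about 27%
```

Measuring against the largest zone keeps the ratio between 0 and 1; comparing against the mean or the smallest zone would also be reasonable and would shift where the ERROR boundary falls.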