Ensure upgrade.status lists nodes with non-ceph roles (bsc#1195366)

tserong commented 2 years ago

When sorting nodes for display, the upgrade.status runner only took into account nodes with the roles 'master', 'mon', 'mgr', 'storage', 'mds', 'rgw', 'igw' and 'ganesha'. It didn't handle the 'prometheus' or 'grafana' roles. If those two roles are assigned to nodes that also have one of the "main" roles, this is not a problem. But if prometheus and/or grafana are deployed on separate nodes by themselves (with no other roles), those nodes don't appear in the list of nodes to upgrade, leading one to think that everything is upgraded when it potentially isn't.

Fixes: https://bugzilla.suse.com/show_bug.cgi?id=1195366
Signed-off-by: Tim Serong <tserong@suse.com>

tserong commented 2 years ago

> Should we specifically capture these based on the prometheus and grafana roles? Likewise, we appear to be missing a filter for the admin role too.

I did it that way (as a catchall) in case there were any other roles I was forgetting ;-) My rationale is that we absolutely want the order to be: master, mon, mgr, storage, gateways, but after that we don't really care what order anything else is upgraded in (the only thing the 'admin' role does is make sure the ceph admin keyring is installed).
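To illustrate the catch-all approach described above, here is a minimal sketch, not the actual DeepSea code. The names `ROLE_PRIORITY`, `upgrade_rank` and `order_nodes` are hypothetical, and the relative order of the gateway roles is an assumption (only "master, mon, mgr, storage, gateways" is stated).

```python
# Hypothetical sketch of a priority sort with a catch-all; not DeepSea's code.
# The order among the gateway roles (mds, rgw, igw, ganesha) is assumed.
ROLE_PRIORITY = ["master", "mon", "mgr", "storage", "mds", "rgw", "igw", "ganesha"]


def upgrade_rank(roles):
    """Rank a node by its highest-priority role.

    Nodes holding none of the listed roles (e.g. only 'prometheus',
    'grafana' or 'admin') get the last rank instead of being dropped.
    """
    ranks = [ROLE_PRIORITY.index(r) for r in roles if r in ROLE_PRIORITY]
    return min(ranks) if ranks else len(ROLE_PRIORITY)


def order_nodes(node_roles):
    """Return node names sorted by upgrade rank, then by name."""
    return sorted(node_roles, key=lambda n: (upgrade_rank(node_roles[n]), n))


nodes = {
    "mon1": ["mon", "mgr"],
    "graf": ["grafana"],
    "master": ["master", "admin"],
    "osd1": ["storage"],
}
print(order_nodes(nodes))
```

With this shape, a node that only carries 'grafana' sorts after everything else rather than disappearing from the list, which is the behaviour the catch-all is meant to guarantee.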

> I mean this PR might be a better way to avoid missing any additional roles, but it seems like we might also grab unrelated salt-minions in the list (e.g. nodes that are part of the Salt cluster but not used by DeepSea)?

That shouldn't be a problem - the search criteria when getting the list of nodes from salt always includes "cluster:ceph", so we're only going to list nodes that the admin has decided are meant to be used by DeepSea.
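As a rough sketch of why unrelated minions stay out of the list: this assumes "cluster:ceph" denotes a key/value match on per-minion data, and it uses a plain dict rather than any real Salt API. The function name `deepsea_minions` and the sample data are hypothetical.

```python
# Hypothetical illustration only; this does not call Salt.
# Assumption: "cluster:ceph" means the minion's data has cluster == "ceph".
def deepsea_minions(minion_data):
    """Return the sorted names of minions whose 'cluster' value is 'ceph'."""
    return sorted(
        name for name, data in minion_data.items()
        if data.get("cluster") == "ceph"
    )


minion_data = {
    "node1": {"cluster": "ceph", "roles": ["storage"]},
    "graf1": {"cluster": "ceph", "roles": ["grafana"]},
    "other": {"roles": []},  # part of the Salt cluster, not used by DeepSea
}
print(deepsea_minions(minion_data))
```

Because the filter is applied before any role-based sorting, a role catch-all can only ever pick up nodes that already passed the "cluster:ceph" criterion.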

tserong commented 2 years ago

I've just made one more small tweak to handle an annoying case if a node was down, where it printed out the "nodes to upgrade" line with no nodes listed after it.

Here's the output before that change:

# salt-run --log-level=warning upgrade.status
The newest installed software versions are:
  ceph: ceph version 14.2.22-445-ga68959d39a6 (a68959d39a67faec1a7ace55e8c4327accc4a38c) nautilus (stable)
  os: SUSE Linux Enterprise Server 15 SP1

Nodes running these software versions:
  master.ses6-to-7p.test (assigned roles: admin, master, prometheus, grafana)
  node2.ses6-to-7p.test (assigned roles: admin, storage, mon, mgr)
  node3.ses6-to-7p.test (assigned roles: admin, storage, mon, mgr)
  node4.ses6-to-7p.test (assigned roles: admin, storage)

Nodes running older software versions must be upgraded in the following order:

Unable to contact these nodes (node down or Salt minion inactive?):
  node1.ses6-to-7p.test

Here's the output after that change:

# salt-run --log-level=warning upgrade.status
The newest installed software versions are:
  ceph: ceph version 14.2.22-445-ga68959d39a6 (a68959d39a67faec1a7ace55e8c4327accc4a38c) nautilus (stable)
  os: SUSE Linux Enterprise Server 15 SP1

Nodes running these software versions:
  master.ses6-to-7p.test (assigned roles: admin, master, prometheus, grafana)
  node2.ses6-to-7p.test (assigned roles: admin, storage, mon, mgr)
  node3.ses6-to-7p.test (assigned roles: admin, storage, mon, mgr)
  node4.ses6-to-7p.test (assigned roles: admin, storage)

Unable to contact these nodes (node down or Salt minion inactive?):
  node1.ses6-to-7p.test
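The difference between the two outputs above comes down to not printing a section header when there is nothing to list under it. A minimal sketch of that idea follows; `render_section` and `render_report` are hypothetical helpers, not the runner's real functions, and only the header strings are taken from the output shown.

```python
# Hypothetical sketch; the real upgrade.status runner may be structured differently.
def render_section(header, entries):
    """Return the lines for one section, or no lines if it has no entries."""
    if not entries:
        return []
    return [header] + ["  " + entry for entry in entries] + [""]


def render_report(outdated, unreachable):
    """Build the tail of the status report from two lists of node names."""
    lines = []
    lines += render_section(
        "Nodes running older software versions must be upgraded "
        "in the following order:",
        outdated,
    )
    lines += render_section(
        "Unable to contact these nodes (node down or Salt minion inactive?):",
        unreachable,
    )
    return "\n".join(lines).rstrip()


print(render_report([], ["node1.ses6-to-7p.test"]))
```

With an empty `outdated` list, only the "Unable to contact" section is emitted, matching the shape of the output after the change.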

SUSE / DeepSea

Ensure upgrade.status lists nodes with non-ceph roles (bsc#1195366) #1885