tserong closed this 2 years ago.
Should we specifically capture these based on the 'prometheus' and 'grafana' roles? Likewise, we appear to be missing a filter for the 'admin' role too.
I did it that way (as a catch-all) in case there were any other roles I was forgetting ;-) My rationale is that we absolutely want the order to be: master, mon, mgr, storage, gateways, but after that we don't really care what order anything else is upgraded in (the only thing the 'admin' role does is make sure the ceph admin keyring is installed).
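The ordering described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not DeepSea's actual code. Three things are assumed here: the role-to-priority table, ranking each node by its highest-priority role, and treating 'mds', 'rgw', 'igw' and 'ganesha' as the gateway roles. The point is the catch-all bucket. A node that carries only roles such as 'prometheus' or 'grafana' still gets a place at the end of the list instead of being dropped.

```python
# Hypothetical sketch, NOT DeepSea's actual upgrade.status implementation.
# Roles whose upgrade order matters, highest priority first. The gateway
# roles ('mds', 'rgw', 'igw', 'ganesha') are assumed to share one slot.
PRIORITY = {"master": 0, "mon": 1, "mgr": 2, "storage": 3,
            "mds": 4, "rgw": 4, "igw": 4, "ganesha": 4}
CATCH_ALL = 5  # any node with none of the roles above (e.g. grafana only)


def upgrade_order(nodes):
    """Sort node names by their highest-priority role, then by name.

    `nodes` maps a node name to its list of assigned roles. Nodes with no
    priority role fall into the catch-all bucket at the end instead of
    being left out of the list.
    """
    def key(name):
        ranks = [PRIORITY[r] for r in nodes[name] if r in PRIORITY]
        return (min(ranks, default=CATCH_ALL), name)
    return sorted(nodes, key=key)


nodes = {
    "node4": ["admin", "storage"],
    "monitoring": ["prometheus", "grafana"],  # hypothetical grafana-only node
    "master": ["admin", "master"],
    "node2": ["admin", "storage", "mon", "mgr"],
}
print(upgrade_order(nodes))
# ['master', 'node2', 'node4', 'monitoring']
```

A filter that only collected nodes with a listed role would silently omit "monitoring". The catch-all default keeps it without hard-coding 'prometheus', 'grafana' or 'admin'.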
I mean, this PR might be a better way to avoid missing any additional roles, but it also seems like we might grab unrelated Salt minions in the list (e.g. nodes that are part of the Salt cluster but not used by DeepSea)?
That shouldn't be a problem - the search criteria when getting the list of nodes from salt always includes "cluster:ceph", so we're only going to list nodes that the admin has decided are meant to be used by DeepSea.
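A minimal sketch of that filtering follows. It assumes the per-minion data has already been fetched into a plain dict. The real runner queries Salt directly, and both the dict shape and the helper name `deepsea_nodes` are hypothetical. Only the "cluster" key and the "ceph" value come from the comment above.

```python
def deepsea_nodes(minion_data):
    """Return sorted IDs of minions whose 'cluster' value is 'ceph'.

    Hypothetical helper: `minion_data` maps minion ID -> dict of per-minion
    data (a stand-in for what Salt would return). Minions without a
    'cluster' key, or with any other value, are left out.
    """
    return sorted(mid for mid, data in minion_data.items()
                  if data.get("cluster") == "ceph")


minions = {
    "node1.ses6-to-7p.test": {"cluster": "ceph"},
    "unrelated.example": {"cluster": "other"},
    "node2.ses6-to-7p.test": {"cluster": "ceph"},
    "laptop.example": {},
}
print(deepsea_nodes(minions))
# ['node1.ses6-to-7p.test', 'node2.ses6-to-7p.test']
```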
I've just made one more small tweak to handle an annoying case if a node was down, where it printed out the "nodes to upgrade" line with no nodes listed after it.
Here's the output before that change:
# salt-run --log-level=warning upgrade.status
The newest installed software versions are:
ceph: ceph version 14.2.22-445-ga68959d39a6 (a68959d39a67faec1a7ace55e8c4327accc4a38c) nautilus (stable)
os: SUSE Linux Enterprise Server 15 SP1
Nodes running these software versions:
master.ses6-to-7p.test (assigned roles: admin, master, prometheus, grafana)
node2.ses6-to-7p.test (assigned roles: admin, storage, mon, mgr)
node3.ses6-to-7p.test (assigned roles: admin, storage, mon, mgr)
node4.ses6-to-7p.test (assigned roles: admin, storage)
Nodes running older software versions must be upgraded in the following order:
Unable to contact these nodes (node down or Salt minion inactive?):
node1.ses6-to-7p.test
Here's the output after that change:
# salt-run --log-level=warning upgrade.status
The newest installed software versions are:
ceph: ceph version 14.2.22-445-ga68959d39a6 (a68959d39a67faec1a7ace55e8c4327accc4a38c) nautilus (stable)
os: SUSE Linux Enterprise Server 15 SP1
Nodes running these software versions:
master.ses6-to-7p.test (assigned roles: admin, master, prometheus, grafana)
node2.ses6-to-7p.test (assigned roles: admin, storage, mon, mgr)
node3.ses6-to-7p.test (assigned roles: admin, storage, mon, mgr)
node4.ses6-to-7p.test (assigned roles: admin, storage)
Unable to contact these nodes (node down or Salt minion inactive?):
node1.ses6-to-7p.test
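The tweak amounts to printing a section header only when there is something to list under it. Below is a hypothetical sketch that reuses the message text from the output above. The function name `render_status` and its arguments are illustrative and are not DeepSea's API.

```python
def render_status(to_upgrade, unreachable):
    """Build the tail of the status report as a string.

    Hypothetical helper: each section header is emitted only when the list
    beneath it is non-empty, so a down node no longer leaves a dangling
    "must be upgraded" line with nothing after it.
    """
    lines = []
    if to_upgrade:
        lines.append("Nodes running older software versions must be "
                     "upgraded in the following order:")
        lines.extend(to_upgrade)
    if unreachable:
        lines.append("Unable to contact these nodes "
                     "(node down or Salt minion inactive?):")
        lines.extend(unreachable)
    return "\n".join(lines)


# Mirrors the "after" case: nothing left to upgrade, one node unreachable.
print(render_status([], ["node1.ses6-to-7p.test"]))
```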
When sorting nodes for display, the upgrade.status runner was only taking into account the nodes with roles 'master', 'mon', 'mgr', 'storage', 'mds', 'rgw', 'igw' and 'ganesha'. It wasn't handling the 'prometheus' or 'grafana' roles. If the latter two roles are assigned to nodes that also have one of the "main" roles, this is not a problem, but if prometheus and/or grafana are deployed on separate nodes by themselves (with no other roles), those nodes don't appear in the list of nodes to upgrade, leading one to think that everything is upgraded, when it's potentially not.
Fixes: https://bugzilla.suse.com/show_bug.cgi?id=1195366
Signed-off-by: Tim Serong <tserong@suse.com>