canonical / juju-verify

https://launchpad.net/juju-verify
GNU General Public License v3.0

Review hacluster subordinates for cluster quorum #123

Open esunar opened 2 years ago

esunar commented 2 years ago

When a requested unit belongs to an application with an hacluster subordinate, we need to ensure that enough units of the same hacluster application remain so that removing the requested principal does not cause the pacemaker cluster to lose quorum and thereby stop the managed services and VIPs.
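The core of this check is a majority-vote calculation. The sketch below assumes corosync's default votequorum behaviour: one vote per node, expected votes equal to the configured cluster size (shutting a node down does not lower it), and none of the `two_node`, `wait_for_all`, `last_man_standing` or `auto_tie_breaker` options enabled. The function names `quorum_threshold` and `quorum_preserved` and the node names are hypothetical, not part of juju-verify.

```python
def quorum_threshold(cluster_size: int) -> int:
    """Votes needed for quorum: a strict majority of the expected votes."""
    return cluster_size // 2 + 1


def quorum_preserved(cluster_nodes: set[str], online_nodes: set[str],
                     nodes_to_remove: set[str]) -> bool:
    """Return True if the cluster keeps quorum after nodes_to_remove go down.

    Assumes one vote per node and that expected votes stay equal to the
    configured cluster size.
    """
    unknown = nodes_to_remove - cluster_nodes
    if unknown:
        raise ValueError(f"not cluster members: {sorted(unknown)}")
    # Only nodes that are online now and not being removed keep voting.
    remaining = online_nodes - nodes_to_remove
    return len(remaining) >= quorum_threshold(len(cluster_nodes))


if __name__ == "__main__":
    nodes = {"juju-0", "juju-1", "juju-2"}
    # Removing one node from a healthy 3-node cluster leaves 2 >= 2 votes.
    print(quorum_preserved(nodes, nodes, {"juju-0"}))  # True
    # Removing one node while another is already offline leaves 1 < 2 votes.
    print(quorum_preserved(nodes, {"juju-0", "juju-1"}, {"juju-0"}))  # False
```

Under these assumptions a 4-node cluster needs 3 votes, so only one node can be taken down at a time, the same as in a 3-node cluster.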

Requirements:

Output:

Actions required on hacluster charm:

Specific exclusions:


Imported from Launchpad using lp2gh.

esunar commented 2 years ago

(by xavpaice) Added charm-hacluster because, although the status action provides the cluster status, we need to know the hostname of the unit it ran on. That information isn't provided by Juju, and for some providers (e.g. the OpenStack provider) there is no link between the machine hostname and the unit/machine/instance ID. The dns-name shows only an IP address.

The request for the charm is therefore to add a 'hostname' field to the status action output.
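Once such a field exists, translating the requested units into pacemaker node names becomes a lookup. This is a minimal sketch that assumes the status action results of the hacluster units have already been collected into a plain dict keyed by unit name and that each result carries the proposed 'hostname' field; the input shape, the function name `units_to_nodes` and the hostnames are all hypothetical.

```python
def units_to_nodes(action_results: dict[str, dict[str, str]],
                   units: list[str]) -> set[str]:
    """Map hacluster unit names to pacemaker node names via 'hostname'.

    action_results is assumed to map each hacluster unit name (e.g.
    "hacluster/0") to its status action result, including the proposed
    'hostname' field.
    """
    missing = [u for u in units if "hostname" not in action_results.get(u, {})]
    if missing:
        # Without a hostname we cannot tell which pacemaker node the unit is.
        raise ValueError(f"no hostname reported for: {', '.join(missing)}")
    return {action_results[u]["hostname"] for u in units}


results = {
    "hacluster/0": {"hostname": "juju-a1b2c3-0"},
    "hacluster/1": {"hostname": "juju-a1b2c3-1"},
}
print(units_to_nodes(results, ["hacluster/0"]))  # {'juju-a1b2c3-0'}
```

Failing loudly when a hostname is missing seems preferable to guessing, since a wrong mapping would make the quorum check meaningless.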

esunar commented 2 years ago

(by rgildein) I want to let you know that there is another proposal [1], which changes the output of the status action to provide more information about cluster health.

I believe the hostname should be part of the output of juju status, and a bug should be reported against Juju.

esunar commented 2 years ago

(by aluria) I have filed bug 1918286 against Juju about getting a unit/machine hostname from JujuStatus.

esunar commented 2 years ago

(by rgildein) I forgot to include link [1].


esunar commented 2 years ago

(by aluria) Xav, your description looks good.

Just one comment about: """

juju-verify already discovers other principal units running on the same machine (and its submachines).

If multiple units/machines are passed on the CLI, I think the approach should be to group them into units of the same type, which is not supported yet:

Currently, "juju verify shutdown --units unit/0 otherunit/2" fails when unit and otherunit do not use the same charm. That call should be treated as if two separate calls had been made:

This deserves a separate bug, though.
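The splitting suggested above can be sketched by grouping the requested unit names and verifying each group separately. This sketch groups by the application name parsed from the unit name ("app/N") as a stand-in for "same charm"; two applications can be deployed from the same charm, so grouping strictly by charm would need the charm name from juju status instead. The function name `group_units_by_application` is hypothetical.

```python
from collections import defaultdict


def group_units_by_application(units: list[str]) -> dict[str, list[str]]:
    """Split Juju unit names ("app/N") into per-application groups.

    Each group could then be verified as a separate call, so mixing units
    of different applications would no longer have to fail.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for unit in units:
        app, sep, number = unit.partition("/")
        if not sep or not app or not number.isdigit():
            raise ValueError(f"not a unit name: {unit!r}")
        groups[app].append(unit)
    return dict(groups)


print(group_units_by_application(["unit/0", "otherunit/2", "unit/1"]))
# {'unit': ['unit/0', 'unit/1'], 'otherunit': ['otherunit/2']}
```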

esunar commented 2 years ago

(by rgildein) Bug #1918286 (https://bugs.launchpad.net/bugs/1918286) was marked as a duplicate of bug #1918204 (https://bugs.launchpad.net/bugs/1918204).

Bug #1918204 (https://bugs.launchpad.net/bugs/1918204) is now Fix Released.