esunar / test4

GNU General Public License v3.0

ensure every machine has an nrpe unit on it #14

Open · esunar opened 1 year ago

esunar commented 1 year ago

Consider a model where a machine hosts several units deployed in LXD containers but has no principal unit of its own. Although juju-lint alerts on principal charms that lack an nrpe subordinate, it does not report a missing nrpe on a machine with no principal.

In addition to validating relations between nrpe and principal units, we should ensure that every machine in the model has an nrpe unit on it.
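A minimal Python sketch of the proposed check, assuming the layout of `juju status --format=json` (principal units carry a `machine` key and nest their subordinate units under `subordinates`; containers such as `18/lxd/0` are nested under their host in the `machines` section). The helpers `apps_per_machine` and `machines_missing_app` are hypothetical names for illustration, not juju-lint functions. The key point is to iterate over the `machines` section rather than over machines discovered through units, since a host carrying only containers never shows up as a unit's machine.

```python
from collections import defaultdict


def apps_per_machine(status):
    """Map each machine ID (host or container) to the applications on it.

    Assumes the `juju status --format=json` layout: principal units list
    their `machine`, and subordinate units are nested under `subordinates`.
    """
    machine_apps = defaultdict(set)
    for app_name, app in status.get("applications", {}).items():
        # Subordinate applications have no top-level "units" key; their
        # units only appear nested under the principals they attach to.
        for unit in app.get("units", {}).values():
            machine = unit.get("machine")
            if machine is None:
                continue
            machine_apps[machine].add(app_name)
            for sub_unit_name in unit.get("subordinates", {}):
                # "nrpe/3" -> "nrpe"
                machine_apps[machine].add(sub_unit_name.split("/")[0])
    return machine_apps


def machines_missing_app(status, required_app="nrpe"):
    """Return the top-level machine IDs that have no unit of required_app."""
    machine_apps = apps_per_machine(status)
    # Walk the machines section itself: a host with only LXD containers
    # never appears as a key of machine_apps, so it would otherwise be skipped.
    return sorted(
        (
            machine_id
            for machine_id in status.get("machines", {})
            if required_app not in machine_apps.get(machine_id, set())
        ),
        key=int,  # top-level Juju machine IDs are integer strings
    )


# Hypothetical example: machine 18 only hosts a container, like the
# machines described in this issue.
example = {
    "machines": {"0": {}, "18": {"containers": {"18/lxd/0": {}}}},
    "applications": {
        "keystone": {
            "units": {
                "keystone/0": {"machine": "0", "subordinates": {"nrpe/0": {}}},
                "keystone/1": {"machine": "18/lxd/0", "subordinates": {"nrpe/1": {}}},
            }
        },
        "nrpe": {"subordinate-to": ["keystone"]},
    },
}
print(machines_missing_app(example))  # ['18']
```

Only top-level hosts are checked here; containers could be checked the same way by also walking each machine's `containers` key.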


Imported from Launchpad using lp2gh.

esunar commented 1 year ago

(by aieri) On second thought, this affects more than just nrpe: if you have a machine with no principal, you will also be missing ntp, telegraf, etc.

Shall we perhaps have lists of "mandatory units" that all machines and/or containers must have?
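A rough Python sketch of what such "mandatory units" lists could look like, with separate sets for hosts and containers. The `MANDATORY` mapping, its contents, and the `missing_mandatory` helper are hypothetical, not existing juju-lint configuration; the input is assumed to be a precomputed mapping from machine ID to the applications present on it.

```python
# Hypothetical "mandatory units" configuration: which applications must be
# present on every host and on every container. Illustrative values only.
MANDATORY = {
    "machine": {"nrpe", "ntp", "telegraf"},
    "container": {"nrpe"},
}


def is_container(machine_id):
    # Juju container IDs look like "18/lxd/0"; top-level machines look like "18".
    return "/" in machine_id


def missing_mandatory(machine_ids, machine_apps, mandatory=MANDATORY):
    """Return {machine_id: sorted list of mandatory apps that are absent}."""
    report = {}
    for machine_id in machine_ids:
        kind = "container" if is_container(machine_id) else "machine"
        present = machine_apps.get(machine_id, set())
        missing = mandatory[kind] - present
        if missing:
            report[machine_id] = sorted(missing)
    return report


# Hypothetical example: host 18 has no principal, only a container.
machine_ids = ["0", "18", "18/lxd/0"]
machine_apps = {
    "0": {"keystone", "nrpe", "ntp", "telegraf"},
    "18/lxd/0": {"keystone", "nrpe"},
}
print(missing_mandatory(machine_ids, machine_apps))
# {'18': ['nrpe', 'ntp', 'telegraf']}
```

Keeping separate lists lets a model require, say, ntp on hosts without also requiring it inside every container.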

esunar commented 1 year ago

(by ec0) I reviewed the code after the refactor: each machine now tracks its subordinates at the machine level, and the lint rules are checked against the set of subordinates on each machine. Have you seen this behaviour with the current code base? If it is still a problem with the latest snap, do you have an example YAML that reproduces it?

esunar commented 1 year ago

(by aieri) I re-ran juju-lint 1.1.dev11+ge9499f8 against my model and the problem persists.

The JSON status output showing the issue is available here (internal link): https://private-fileshare.canonical.com/~aieri/lp1893272.json

Machines 18, 19, 20, and 23 (not an exhaustive list) have no principal charms deployed on them, but juju-lint does not emit any warning about missing subordinates.

esunar commented 1 year ago

(by gabrielcocenza) Hi Andrea. Could you try running with the changes in this MR and check whether it works? https://code.launchpad.net/~gabrielcocenza/juju-lint/+git/juju-lint/+merge/422918

After running it, I could see that the apps canonical-livepatch, ceilometer-agent, hacluster-vault, landscape-haproxy, landscape-postgresql, lldpd, memcached, ntp, thruk-agent are missing a relation with nrpe on the "nrpe-external-master" endpoint.

It also showed that the apps aodh, bcache-tuning, cinder-ceph, cloudstats, designate, designate-bind, dns-policy-routing, easyrsa, external-policy-routing, filebeat, gnocchi, heat, keystone-ldap, landscape-client, landscape-server, logrotate, neutron-openvswitch, neutron-openvswitch-sriov, telegraf, telegraf-prometheus are missing a relation with nrpe on the "juju-info" endpoint, because they don't provide an "nrpe-external-master" endpoint.

This approach focuses on relations rather than on whether the subordinate is present on a machine. I think this makes sense because a subordinate is only deployed on a machine once the relation exists.
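A minimal Python sketch of the relation-based rule described above: if an application's charm provides `nrpe-external-master`, expect nrpe to be related on that endpoint, otherwise expect it on `juju-info`. This is an illustration, not the code from the MR. Both inputs are assumptions: `app_endpoints` (the endpoints each charm provides, e.g. collected from charm metadata) is hypothetical, and `app_relations` is a simplified `{endpoint: [related apps]}` view of the per-application `relations` data in `juju status --format=json`, whose exact shape may vary between Juju versions.

```python
def expected_nrpe_endpoint(charm_endpoints):
    """Pick the endpoint on which nrpe is expected to be related."""
    if "nrpe-external-master" in charm_endpoints:
        return "nrpe-external-master"
    return "juju-info"


def apps_missing_nrpe(app_endpoints, app_relations, nrpe_app="nrpe"):
    """Return {app: expected endpoint} for apps not related to nrpe as expected.

    app_endpoints: {app: set of endpoints its charm provides} (hypothetical input)
    app_relations: {app: {endpoint: [related apps]}} (simplified status data)
    """
    missing = {}
    for app, endpoints in app_endpoints.items():
        if app == nrpe_app:
            continue  # nrpe does not need to be related to itself
        endpoint = expected_nrpe_endpoint(endpoints)
        related = app_relations.get(app, {}).get(endpoint, [])
        if nrpe_app not in related:
            missing[app] = endpoint
    return missing


# Hypothetical example inputs.
app_endpoints = {
    "keystone": {"nrpe-external-master"},
    "ntp": {"nrpe-external-master"},
    "telegraf": set(),
    "nrpe": set(),
}
app_relations = {
    "keystone": {"nrpe-external-master": ["nrpe"]},
    "ntp": {"juju-info": ["ubuntu"]},
}
print(apps_missing_nrpe(app_endpoints, app_relations))
# {'ntp': 'nrpe-external-master', 'telegraf': 'juju-info'}
```

Note that this sketch reports per application, not per machine.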