canonical / juju-lint

Run checks against a juju model
GNU General Public License v3.0

Juju lint-false positives for double nrpe #14

Closed zxhdaze closed 5 months ago

zxhdaze commented 5 months ago

This is basically https://bugs.launchpad.net/fce-templates/+bug/1855659, assigned to the correct project.

juju-lint reports duplicate subordinates because we are using a hyper-converged architecture (nova-compute and ceph-osd are on the same physical host).

2019-12-06 15:42:28 [INFO] following subordinates where found on machines more than once:
2019-12-06 15:42:28 [ERROR] -> nrpe-host [21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]

$ git grep 'nrpe-host' config/bundle.yaml | egrep 'nova-compute|ceph-osd'
config/bundle.yaml: - [ nova-compute-kvm, nrpe-host ]
config/bundle.yaml: - [ "ceph-osd:nrpe-external-master", "nrpe-host:nrpe-external-master" ]
config/bundle.yaml: - [ "nova-compute-kvm:nrpe-external-master", "nrpe-host:nrpe-external-master" ]

We end up with the following situation:

Unit                Workload  Agent  Machine  Public address  Ports          Message
ceph-osd/0          active    idle   21       10.16.41.53                    Unit is ready (6 OSD)
  nrpe-host/18      active    idle            10.16.41.53     icmp,5666/tcp  ready
nova-compute-kvm/0  active    idle   21       10.16.41.53                    Unit is ready
  nrpe-host/20      active    idle            10.16.41.53                    ready

Machine  State    DNS          Inst id                             Series  AZ     Message
21       started  10.16.41.53  sf-jkt001-hyperconverge020-rack003  bionic  01-03  Deployed

We can see that 2 units of the nrpe-host application land on the same physical machine. This is not a bug, as one of them is related to ceph-osd and the other to nova-compute-kvm. Monitored services are chosen based on the principal relation, and we have 2 principal charms colocated here, each providing a different set of Nagios checks.

Hence the juju-lint error appears to be a false positive, as this is an expected situation.
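To make the false positive concrete, here is a minimal sketch of a naive per-machine duplicate check. It is not juju-lint's actual code: the function name `find_duplicate_subordinates` and the input shape are assumptions made for illustration. It shows why two nrpe-host units on machine 21 get flagged even though each belongs to a different principal.

```python
from collections import defaultdict


def find_duplicate_subordinates(placements):
    """Return {application: [machines]} for subordinates seen more than once per machine.

    placements: iterable of (machine_id, subordinate_unit_name) pairs.
    Hypothetical helper, not part of juju-lint.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for machine, unit in placements:
        app = unit.split("/")[0]  # "nrpe-host/18" -> "nrpe-host"
        counts[app][machine] += 1

    duplicates = {}
    for app, per_machine in counts.items():
        machines = sorted(m for m, n in per_machine.items() if n > 1)
        if machines:
            duplicates[app] = machines
    return duplicates


# Units taken from the status output above.
placements = [
    ("21", "nrpe-host/18"),  # subordinate of ceph-osd/0
    ("21", "nrpe-host/20"),  # subordinate of nova-compute-kvm/0
]
duplicates = find_duplicate_subordinates(placements)
print(duplicates)  # {'nrpe-host': ['21']}
```

The check only looks at the application name and the machine, so it cannot tell that the two units serve different principals.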


Imported from Launchpad using lp2gh.

zxhdaze commented 5 months ago

(by vern) With most subordinate charms, you don't want them to be installed more than once on a machine. The nrpe charm is special in that you can relate it to other charms on the same machine and it can enable additional checks.

This is a good thing and should not be called out by juju-lint.

zxhdaze commented 5 months ago

(by nobuto) This false positive still causes confusion. People may be surprised by the error and tend to remove relations even if those "duplicate" relations are necessary.

Just for the record:

[1 relation]

$ juju status --relations | grep nrpe-host:
ceph-osd:juju-info  nrpe-host:general-info  juju-info  subordinate

$ juju run-action --wait nrpe-host/2 list-nrpe-checks
unit-nrpe-host-2:
  UnitId: nrpe-host/2
  id: "18"
  results:
    checks:
      check-conntrack: /usr/local/lib/nagios/plugins/check_conntrack.sh -w 80 -c 90
      check-disk-root: '/usr/lib/nagios/plugins/check_disk -u GB -w 25% -c 20% -K 5% -p / '
      check-load: /usr/lib/nagios/plugins/check_load -w 32,16,8 -c 64,32,16
      check-mem: /usr/local/lib/nagios/plugins/check_mem.pl -C -h -u -w 85 -c 90
      check-swap: /usr/lib/nagios/plugins/check_swap -w 40% -c 25%
      check-swap-activity: /usr/local/lib/nagios/plugins/check_swap_activity -i 5 -w 10240 -c 40960
    timestamp: Fri Jul 31 02:40:00 UTC 2020
  status: completed
  timing:
    completed: 2020-07-31 02:40:01 +0000 UTC
    enqueued: 2020-07-31 02:39:57 +0000 UTC
    started: 2020-07-31 02:40:00 +0000 UTC

-> 6 checks

[2 relations]

$ juju status --relations | grep nrpe-host:
ceph-osd:juju-info             nrpe-host:general-info          juju-info             subordinate
ceph-osd:nrpe-external-master  nrpe-host:nrpe-external-master  nrpe-external-master  subordinate

$ juju run-action --wait nrpe-host/2 list-nrpe-checks
unit-nrpe-host-2:
  UnitId: nrpe-host/2
  id: "20"
  results:
    checks:
      check-ceph-osd: /usr/local/lib/nagios/plugins/check_ceph_osd_services.py
      check-conntrack: /usr/local/lib/nagios/plugins/check_conntrack.sh -w 80 -c 90
      check-disk-root: '/usr/lib/nagios/plugins/check_disk -u GB -w 25% -c 20% -K 5% -p / '
      check-load: /usr/lib/nagios/plugins/check_load -w 32,16,8 -c 64,32,16
      check-mem: /usr/local/lib/nagios/plugins/check_mem.pl -C -h -u -w 85 -c 90
      check-swap: /usr/lib/nagios/plugins/check_swap -w 40% -c 25%
      check-swap-activity: /usr/local/lib/nagios/plugins/check_swap_activity -i 5 -w 10240 -c 40960
    timestamp: Fri Jul 31 02:41:11 UTC 2020
  status: completed
  timing:
    completed: 2020-07-31 02:41:12 +0000 UTC
    enqueued: 2020-07-31 02:41:11 +0000 UTC
    started: 2020-07-31 02:41:12 +0000 UTC

-> 7 checks

[3 relations]

$ juju status --relations | grep nrpe-host:
ceph-osd:juju-info                     nrpe-host:general-info          juju-info             subordinate
ceph-osd:nrpe-external-master          nrpe-host:nrpe-external-master  nrpe-external-master  subordinate
nova-compute-kvm:nrpe-external-master  nrpe-host:nrpe-external-master  nrpe-external-master  subordinate

$ juju run-action --wait nrpe-host/2 list-nrpe-checks
unit-nrpe-host-2:
  UnitId: nrpe-host/2
  id: "22"
  results:
    checks:
      check-ceph-osd: /usr/local/lib/nagios/plugins/check_ceph_osd_services.py
      check-conntrack: /usr/local/lib/nagios/plugins/check_conntrack.sh -w 80 -c 90
      check-disk-root: '/usr/lib/nagios/plugins/check_disk -u GB -w 25% -c 20% -K 5% -p / '
      check-libvirtd: /usr/local/lib/nagios/plugins/check_systemd.py libvirtd
      check-load: /usr/lib/nagios/plugins/check_load -w 32,16,8 -c 64,32,16
      check-mem: /usr/local/lib/nagios/plugins/check_mem.pl -C -h -u -w 85 -c 90
      check-nova-compute: /usr/local/lib/nagios/plugins/check_systemd.py nova-compute
      check-swap: /usr/lib/nagios/plugins/check_swap -w 40% -c 25%
      check-swap-activity: /usr/local/lib/nagios/plugins/check_swap_activity -i 5 -w 10240 -c 40960
    timestamp: Fri Jul 31 02:42:31 UTC 2020
  status: completed
  timing:
    completed: 2020-07-31 02:42:32 +0000 UTC
    enqueued: 2020-07-31 02:42:29 +0000 UTC
    started: 2020-07-31 02:42:32 +0000 UTC

-> 9 checks
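The growth from 6 to 7 to 9 checks can be broken down per relation with plain set differences. This is only a sketch over the check names copied from the three action outputs above; the variable names are made up for illustration.

```python
# Check names as listed by list-nrpe-checks in each of the three cases above.
one_relation = {
    "check-conntrack", "check-disk-root", "check-load",
    "check-mem", "check-swap", "check-swap-activity",
}
two_relations = one_relation | {"check-ceph-osd"}
three_relations = two_relations | {"check-libvirtd", "check-nova-compute"}

# Checks contributed by each additional nrpe-external-master relation.
added_by_ceph_osd = sorted(two_relations - one_relation)
added_by_nova_compute = sorted(three_relations - two_relations)

print(len(one_relation), len(two_relations), len(three_relations))  # 6 7 9
print(added_by_ceph_osd)      # ['check-ceph-osd']
print(added_by_nova_compute)  # ['check-libvirtd', 'check-nova-compute']
```

Removing either nrpe-external-master relation to silence the lint error would drop the corresponding service checks.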

zxhdaze commented 5 months ago

(by ec0) So, I think a generic way to exempt subordinates from the multiple-placement check would address this. It would definitely need to be exception based, as the majority of subordinate charms expect not to be installed multiple times on a machine. NRPE is indeed an exception to this. NTP, as a counterexample, definitely does not expect to be installed multiple times.

Triaging as high.
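The exception-based approach proposed above could look roughly like the following. This is a sketch only: the allow-list name `EXEMPT_FROM_DUPLICATE_CHECK`, the function `filter_duplicate_errors`, and the input shape are hypothetical and not taken from juju-lint.

```python
# Hypothetical allow-list of subordinates that may appear more than once per machine.
EXEMPT_FROM_DUPLICATE_CHECK = frozenset({"nrpe-host"})


def filter_duplicate_errors(duplicates, exempt=EXEMPT_FROM_DUPLICATE_CHECK):
    """Drop exempt subordinates from a {application: [machines]} duplicate report."""
    return {
        app: machines
        for app, machines in duplicates.items()
        if app not in exempt
    }


# nrpe-host is exempt; ntp is not and must still be reported.
duplicates = {"nrpe-host": ["21", "22"], "ntp": ["21"]}
errors = filter_duplicate_errors(duplicates)
print(errors)  # {'ntp': ['21']}
```

Matching on the charm name (nrpe) rather than the application name (nrpe-host) may be more robust, since application names vary between deployments; that choice is left open here.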

zxhdaze commented 5 months ago

(by ec0) One question here, can you not simply relate nrpe-host to both nova-compute-kvm and ceph-osd, and use a single nrpe-host?

zxhdaze commented 5 months ago

(by ec0) *Reword: Can you not simply relate nrpe-host to the juju-info relation of either nova-compute-kvm or ceph-osd, and use a single nrpe-host? Which checks are missing in this situation, and is it possible to get those checks through the external-master or local-monitors relations or similar? Just want to make sure we fix this in the right place - because it might also make sense to raise a bug against NRPE and/or ceph-osd/nova-compute to handle this situation more elegantly.

zxhdaze commented 5 months ago

(by aieri) Even if I'm 2.5 years late, I think it'd be useful to answer James' question in #5 for future travelers: yes, if an nrpe charm is related to two principals over their nrpe-external-master endpoints juju will instantiate two nrpe subordinates, regardless of whether those two principals are colocated or not.

As an example, the following will yield 2 principals and 2 subordinates on the same machine:

juju deploy charm1 --to 0
juju deploy charm2 --to 0
juju relate nrpe:nrpe-external-master charm1:nrpe-external-master
juju relate nrpe:nrpe-external-master charm2:nrpe-external-master
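
The behaviour described in this comment can be modelled with a toy function, assuming the second relation targets charm2. It does not call Juju; the function name `subordinate_units` and the unit numbering are assumptions, and the only rule encoded is the one stated above: one subordinate unit per related principal unit, colocated or not.

```python
def subordinate_units(principal_units, related_principals, subordinate="nrpe"):
    """Toy model: place one subordinate unit next to every related principal unit.

    principal_units: {unit_name: machine_id}
    related_principals: set of principal application names related to the subordinate.
    Unit numbering is illustrative only.
    """
    placed = {}
    for unit, machine in sorted(principal_units.items()):
        app = unit.split("/")[0]
        if app in related_principals:
            placed[f"{subordinate}/{len(placed)}"] = machine
    return placed


# Both principals are deployed to machine 0 and related to nrpe.
principals = {"charm1/0": "0", "charm2/0": "0"}
units = subordinate_units(principals, {"charm1", "charm2"})
print(units)  # {'nrpe/0': '0', 'nrpe/1': '0'}
```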