Josef-Friedrich / check_systemd

This systemd check for Nagios-compatible monitoring systems will report a degraded systemd to your monitoring solution. It can also be used to monitor individual systemd services and timer units.
https://check-systemd.readthedocs.io
GNU Lesser General Public License v2.1

Alternative for --ignore-inactive-state #31

Closed by stephan48 9 months ago

stephan48 commented 10 months ago

Hey,

I was using check_systemd 2.3.1 and am looking to upgrade to 3.0.0. Sadly, with the removal of the --ignore-inactive-state option, I have no idea how to realize one of my use cases:

This was the previous commandline: '/usr/lib/nagios/plugins/custom/pip-check_systemd' '--ignore-inactive-state' '--no-startup-time' '--unit' 'ansible-pull.service'

If I drop the option --ignore-inactive-state, I get the error: SYSTEMD UNKNOWN: ValueError: Please verify your --include- and --exclude- options. No units have been added for testing.

How would I realize this in 3.0.0?

Would it be possible/desired to add further user-visible debugging (so users can enable it) to the unit/timer finding logic? If yes, I could create another ticket. Another bug I am hunting is a simple, always enabled and running bind9.service not being found by check_systemd, with the same error.

Kind Regards, Stephan

Josef-Friedrich commented 10 months ago

Thank you for reporting this issue. I have added the option again. The option may have been removed by mistake.

The commit c0448ef71d390652998b764bb36a18fcb01cdcf3 has removed the argparse option, but the associated logic remained. Could you please try the latest commit?
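
For illustration only, here is a minimal sketch of how a flag like this can be declared with argparse so that logic reading `ignore_inactive_state` keeps working. The `build_parser` function and the help texts are assumptions made for this example, not the actual check_systemd source, which defines many more options.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical, reduced parser: only the two options from this thread.
    parser = argparse.ArgumentParser(prog="check_systemd")
    parser.add_argument(
        "--unit",
        help="Name of the systemd unit to check (assumed wording).",
    )
    parser.add_argument(
        "--ignore-inactive-state",
        action="store_true",
        help="Do not treat an inactive unit as critical (assumed wording).",
    )
    return parser


# argparse turns --ignore-inactive-state into the attribute
# ignore_inactive_state, which defaults to False when the flag is absent.
args = build_parser().parse_args(
    ["--ignore-inactive-state", "--unit", "ansible-pull.service"]
)
```

If the `add_argument` call is removed while other code still reads `args.ignore_inactive_state`, the flag is rejected on the command line even though the logic behind it is still present.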

Josef-Friedrich commented 10 months ago

Feel free to open a new issue for a --debug option

stephan48 commented 10 months ago

Hey,

This works, with a catch...

(check_systemd-dev) stephan@auth:~$ check_systemd '--ignore-inactive-state' '--no-startup-time' '--unit' 'ansible-pull.service'
SYSTEMD OK - ansible-pull.service: inactive | count_units=226 data_source=cli startup_time=29.828 units_activating=0 units_active=140 units_failed=0 units_inactive=86

(check_systemd-dev) stephan@auth:~$ check_systemd  '--no-startup-time' '--unit' 'ansible-pull.service'
SYSTEMD OK - ansible-pull.service: inactive | count_units=226 data_source=cli startup_time=29.828 units_activating=0 units_active=140 units_failed=0 units_inactive=86

According to: https://github.com/Josef-Friedrich/check_systemd/blob/37e3c194a8b9121694964e03ccd792570e2a5fe3/check_systemd.py#L874

Without the option, shouldn't it take the first else branch and show critical, or does metric.value contain something else?
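
The expectation behind this question can be sketched as a small function. This is not the code at the linked line: the name `evaluate_unit` and the string return values are hypothetical, and treating a failed unit as critical is an assumption added for completeness.

```python
def evaluate_unit(active_state: str, ignore_inactive_state: bool = False) -> str:
    """Map a unit's active state to a Nagios-style status (hypothetical helper)."""
    if active_state == "failed":
        # Assumption: a failed unit is always reported as critical.
        return "CRITICAL"
    if active_state == "inactive" and not ignore_inactive_state:
        # Expected outcome without the flag, per the question above.
        return "CRITICAL"
    return "OK"
```

Under this sketch, `evaluate_unit("inactive")` is critical and `evaluate_unit("inactive", ignore_inactive_state=True)` is OK, which is the difference the flag is expected to make.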

stephan48 commented 10 months ago

This appears to hold true for bullseye/bookworm systems - the few buster systems I still have show a different outcome:

stephan@nemesis:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 10 (buster)
Release:        10
Codename:       buster
stephan@nemesis:~$ /opt/check_systemd/bin/check_systemd --ignore-inactive-state  '--no-startup-time' '--unit' 'ansible-pull.service'
SYSTEMD OK - ansible-pull.service: inactive
stephan@nemesis:~$ /opt/check_systemd/bin/check_systemd  '--no-startup-time' '--unit' 'ansible-pull.service'
SYSTEMD CRITICAL - ansible-pull.service: inactive
stephan@nemesis:~$ /opt/check_systemd/bin/check_systemd --version
check_systemd 2.3.1

After some struggling I got 3.0.0 to run:

(check_systemd-dev) stephan@nemesis:~/dev/check_systemd$ ./check_systemd.py  '--no-startup-time' '--unit' 'ansible-pull.service'
SYSTEMD OK - ansible-pull.service: inactive | count_units=222 data_source=cli startup_time=66.796 units_activating=0 units_active=159 units_failed=0 units_inactive=63
(check_systemd-dev) stephan@nemesis:~/dev/check_systemd$ ./check_systemd.py --ignore-inactive-state  '--no-startup-time' '--unit' 'ansible-pull.service'
SYSTEMD OK - ansible-pull.service: inactive | count_units=222 data_source=cli startup_time=66.796 units_activating=0 units_active=159 units_failed=0 units_inactive=63
(check_systemd-dev) stephan@nemesis:~/dev/check_systemd$ ./check_systemd.py --version
check_systemd 3.0.0

Where this works again.

Josef-Friedrich commented 10 months ago

Thank you for your detailed review!

It would be helpful if you could post the output of the command systemctl list-units --all for ansible-pull.service here as a comment. Only the relevant lines, of course, so that I can write a test, because this option has no tests yet.
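
As a rough sketch of how such lines could feed a test, the snippet below splits one row of `systemctl list-units --all` output into its columns. `UnitRow` and `parse_list_units_line` are hypothetical names, not part of check_systemd, and the sketch assumes the plain five-column layout (UNIT LOAD ACTIVE SUB DESCRIPTION) without the leading marker that systemctl may print for failed units.

```python
from typing import NamedTuple


class UnitRow(NamedTuple):
    name: str
    load: str
    active: str
    sub: str
    description: str


def parse_list_units_line(line: str) -> UnitRow:
    # Columns are whitespace-separated: UNIT LOAD ACTIVE SUB DESCRIPTION.
    # The description may contain spaces, so split at most four times.
    name, load, active, sub, description = line.split(None, 4)
    return UnitRow(name, load, active, sub, description.strip())


# Sample row in the layout systemctl prints (column padding shortened).
sample = "  ansible-pull.service   loaded    inactive dead      system configuration upgrade  "
row = parse_list_units_line(sample)
```

A test for --ignore-inactive-state could then assert on `row.active == "inactive"` for the service and feed that state into the check.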

stephan48 commented 10 months ago

Hey,

sorry for the delay once again :(

On a buster system this gives:

buster:~$ systemctl list-units --all | grep pull
  ansible-pull.service                                                        loaded    inactive dead      system configuration upgrade                                      
  ansible-pull.timer                                                          loaded    active   waiting   system configuration upgrade

$ python3 -V
Python 3.7.3

On a bullseye:

bullseye:~$ systemctl list-units --all | grep pull
  ansible-pull.service                                                                                           loaded    inactive dead      system configuration upgrade
  ansible-pull.timer                                                                                             loaded    active   waiting   system configuration upgrade

$ python3 -V
Python 3.9.2

On a bookworm:

bookworm:~$ systemctl list-units --all | grep pull
  ansible-pull.service                                                                 loaded    inactive dead      system configuration upgrade
  ansible-pull.timer                                                                   loaded    active   waiting   system configuration upgrade

$ python3 -V
Python 3.11.2

On buster I still have 2.3.1 deployed in the checks, as the new version won't install from the repo (missing poetry). I guess the solution is finding the time to finally upgrade the systems :D

Kind Regards, Stephan

Josef-Friedrich commented 10 months ago

Thank you very much!

It should be possible to run the plugin without poetry. Poetry is only required for the dev environment. The only required dependency is nagiosplugin:

pip install nagiosplugin
wget https://raw.githubusercontent.com/Josef-Friedrich/check_systemd/main/check_systemd.py
chmod a+x check_systemd.py

stephan48 commented 10 months ago

Hey,

Yup, that is what I did for the test :) For my check deployment I install the version from pip, for KISS reasons.

I wish you happy holidays and a good new year. Thank you for your quick responses and good work!

Kind Regards, Stephan