dragonresearch / rpki.net

Dragon Research Labs rpki.net RPKI toolkit
What do we need from nagios plug-ins? #676

Open sraustein opened 10 years ago

sraustein commented 10 years ago

We've discussed wanting Nagios plug-ins to augment control-panel alerts and so forth in the GUI. General idea is that Nagios should proactively tell us things rather than waiting for somebody to run the GUI and look.

Writing Nagios plug-ins looks straightforward (famous last words). There's even a Python library that claims to provide a useful framework (unclear how much it helps; try it and see, I guess). The question is: what, specifically, do we want to monitor?
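For reference, the core plugin contract is small: print one status line and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). A minimal sketch in plain Python, without the nagiosplugin library; the "stale objects" measurement, the thresholds, and the `RPKI` service label are all made up for illustration:

```python
# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(value, warn, crit):
    """Map a measurement onto a Nagios state; higher values are worse."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def format_status(service, state, message):
    """Build the single status line that Nagios displays."""
    return "%s %s - %s" % (service, LABELS[state], message)


def main():
    # Hypothetical measurement: a real check would ask rpkid, rcynic, etc.
    stale_objects = 0
    state = classify(stale_objects, warn=1, crit=10)
    print(format_status("RPKI", state, "%d stale objects" % stale_objects))
    return state
```

An actual plugin script would end with `sys.exit(main())` so that the state becomes the process exit code.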

Do we want this to duplicate or replace the current expiration cron job?

Do we want nagios monitoring rcynic's output for rp-only sites?

General concept is clear enough, it's the details that confuse me. Suggestions welcome, the more concrete the better.

Trac ticket #664 component rpkid priority major, owner , created by sra on 2014-01-07T19:36:48Z, last modified 2014-05-21T22:18:36Z

sraustein commented 10 years ago

{{{ portmaster sysutils/py-nagiosplugin }}}

or

{{{ pip install nagiosplugin }}}

pip search nagios shows several zillion other packages. apt-cache search shows no Python Nagios packages, which seems unlikely.

Then there are things like nagios2trac and djagios which sound like crossing the streams....

Trac comment by sra on 2014-05-21T21:44:36Z

sraustein commented 10 years ago

After a quick wander through Nagios's own documentation and the documentation for the Python nagiosplugin library, I find myself back at the topic question of this ticket: what, exactly, do we want to monitor? A lot of the plugin examples seem to be about things like performance data, load average computations, using cookies, and so forth. Easy to get lost in details.

Things I can think of (a few of which may be covered by other plugins we don't have to write):

No doubt many others. Pretty open-ended, as core mission seems to be "please monitor every single freaking moving part and tell me when any of them look broken", and there are a lot of moving parts. Again, easy to get lost in (different) details.

Some of the tasks listed above (and many related not listed) require IRDB access to pull BPKI keys and so forth to talk to daemons.

Some of the tasks above may require running rsync, or at least parsing rcynic's output, or at least reading a pre-parsed version of rcynic's output provided by something else. Running rsync directly from a plugin is almost certainly a bad idea.
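To make the "pre-parsed version" option concrete, here is a rough sketch of a check that only reads a summary file written by something else and never touches rsync. The JSON layout (`generated`, `rejected`), the function name, and the one-hour freshness limit are all invented for this example; none of it is a format rcynic is known to produce.

```python
import json
import time

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_summary(path, max_age=3600, now=None):
    """Return (state, message) for a pre-parsed summary file.

    ASSUMPTION: the file layout is hypothetical, made up for this sketch:
        {"generated": <unix timestamp>, "rejected": <int>}
    """
    now = time.time() if now is None else now
    try:
        with open(path) as f:
            summary = json.load(f)
        age = now - summary["generated"]
        rejected = summary["rejected"]
    except (OSError, ValueError, KeyError, TypeError) as e:
        # Missing, unreadable, or malformed file: we cannot tell either way.
        return UNKNOWN, "cannot read summary: %s" % e
    if age > max_age:
        # A stale summary suggests the validator has stopped running.
        return CRITICAL, "summary is %d seconds old" % age
    if rejected > 0:
        return WARNING, "%d objects rejected" % rejected
    return OK, "summary is fresh, nothing rejected"
```

Checking the file's age first means a dead validator shows up as a problem instead of as a reassuring but outdated OK.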

I suspect this is going to end up looking like a framework program with a long list of commands. Framework sets up Nagios and BPKI environment stuff, parses command line using usual argparse subparser hack, dispatches to handler for some specific thing to be monitored. Then we write ten zillion of those little handlers, sharing code in libraries where practical. At least, I have no better plan; suggestions welcome.
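A rough sketch of that shape, using argparse subparsers with `set_defaults(handler=...)` for dispatch. The program name `check_rpki`, the two subcommands, their options, and the placeholder handlers are hypothetical; the BPKI/IRDB setup is left out entirely.

```python
import argparse

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_expiration(args):
    # Placeholder handler: a real one would inspect certificate lifetimes.
    return UNKNOWN, "expiration check not implemented (warn at %d days)" % args.days


def check_rcynic(args):
    # Placeholder handler: a real one would read rcynic's output.
    return UNKNOWN, "rcynic check not implemented for %s" % args.path


def build_parser():
    parser = argparse.ArgumentParser(prog="check_rpki")
    subparsers = parser.add_subparsers(dest="command")
    subparsers.required = True  # insist on a subcommand

    p = subparsers.add_parser("expiration", help="warn about objects close to expiry")
    p.add_argument("--days", type=int, default=14)
    p.set_defaults(handler=check_expiration)

    p = subparsers.add_parser("rcynic", help="inspect rcynic output")
    p.add_argument("--path", default="rcynic.xml")
    p.set_defaults(handler=check_rcynic)
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    try:
        state, message = args.handler(args)
    except Exception as e:
        # A crashing handler should report UNKNOWN, not look like a failed service.
        state, message = UNKNOWN, "%s failed: %s" % (args.command, e)
    print("%s: %s" % (args.command, message))
    return state
```

Each new thing to monitor then becomes one handler function plus one `add_parser` stanza. One wrinkle: argparse exits with status 2 on a bad command line, which Nagios would read as CRITICAL, so a real framework might want to override `ArgumentParser.error` to exit with 3 instead.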

So maybe first deliverable for this feature request is basic framework and handlers to monitor a few specific important things?

Not yet clear to me what the environmental constraints are for Nagios plugins, in particular how long they're allowed to run before Nagios decides they've gone bananas, and how one structures a pile of code intended to monitor ten zillion aspects of one service/package/box. More reading to do, no doubt.
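On the run-time question, one common approach is for the plugin to enforce its own deadline with an alarm signal rather than waiting for Nagios to kill it. A sketch, assuming a Unix host and a check running in the main thread; the 10-second default and the choice to report UNKNOWN on timeout are assumptions for this example (some plugins report CRITICAL instead), and `run_with_timeout` is a made-up helper name:

```python
import signal

UNKNOWN = 3


class CheckTimeout(Exception):
    """Raised inside the check when the deadline passes."""


def _on_alarm(signum, frame):
    raise CheckTimeout()


def run_with_timeout(check, seconds=10.0):
    """Run check() -> (state, message), giving up after `seconds`.

    Relies on SIGALRM, so it works only on Unix and only in the main thread.
    """
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        return check()
    except CheckTimeout:
        return UNKNOWN, "check timed out after %g seconds" % seconds
    finally:
        # Cancel the timer and restore whatever handler was installed before.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)
```

A framework could wrap every handler call in something like this, so a hung daemon connection turns into a prompt status line instead of a killed process.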

Trac comment by sra on 2014-05-21T22:14:15Z

sraustein commented 10 years ago

https://nagios-plugins.org/doc/guidelines.html

Trac comment by sra on 2014-05-21T22:18:36Z