Open tobijb opened 9 years ago
I wholeheartedly agree; however, it seems to me that most of these are currently outside of the Debian repositories. Sure, they probably have their own APT repositories which can be added to sources.list, but by doing that we end up with Ubuntu PPAs all over again.
I would be in favor of pushing the software into Debian proper, all the way through unstable and testing to jessie-backports, if possible and needed. Mail the authors and ask them if they could do that, for example through the Debian Mentors repository. Adding software to Debian gives us proper integration with the rest of the system and a standardized code base. At the same time, I would give priority to alternatives already in Debian if they are viable.
I can see where you're coming from. Perhaps this is where something like "debops-extras monitoring" starts to come to life?
Sure, the main playbook might need to be split at some point in the future due to the number of roles; however, the playbooks are currently very "rigid" and require all roles to be present in order to function. I'm waiting for Ansible v2 to decide what to do about it.
I have been doing monitoring for a couple of years now, and Check_MK with Icinga works well for me and my/our customers. The packages in jessie are quite recent and should do the trick. I have not tested the check-mk-multisite package in Debian in detail yet, because I still install Check_MK from source.
I'm leaning towards Icinga instead of Nagios due to licensing issues. Check-MK can be used by LibreNMS, so that's a plus.
@ypid: If you are looking for a way to integrate Check_MK with DebOps, I already hacked something together at ansible-checkmk_agent. I'm using it as a custom role in DebOps with a manually set up Icinga server.
Unfortunately I still haven't had time to try the role with the "official" DebOps LibreNMS setup...
@ganto your role looks really nice, thanks for the hint.
TL;DR: I compared LibreNMS with Nagios/Icinga/Check_MK/PNP4Nagios. The latter setup, which I have been using for some time now, appears more mature, is highly configurable and fits my use case better. I will keep using it :wink:
I had some time today to check out LibreNMS, which is supported by DebOps. Note that I only spent one day with LibreNMS, so I have no long-term experience with it yet. First of all, LibreNMS fits nicely from a moral/license point of view, and I am grateful to Paul Gear for creating it based on Observium. As I come from a Nagios/Icinga/Check_MK/PNP4Nagios background, I compared it with what I know and like.
LibreNMS is a nice project which I think is very well suited for telcos (for which Adam Armstrong originally created Observium). It can surely also be used for server monitoring, but I think there are other/better tools out there for that.
The real advantage of LibreNMS is its autodetection of network devices, but I think that might not be the number one priority in DebOps, as you guys probably have a CMDB or also manage your network gear with Ansible, so you don't really need autodetection. At least that is what I would do if I were responsible for the network as well :wink:
My proposal: write `debops.icinga`, `debops.check_mk`, `debops.pnp4nagios`, `debops.rrdcached` (or try to use something else here; would need to be evaluated) and maybe `debops.nagvis` roles for the server side. Run `debops.check_mk_agent` against all hosts, and apply the `debops.check_mk` role to all hosts. The role then configures the host in the monitoring system(s) as a host to be monitored, triggers Check_MK to re-inventorize the new hosts to find checks to run against them, and then reloads the core with the updated config.

What do you guys think?
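To illustrate what that last registration step could boil down to, here is a minimal Python sketch. All host names, tags and paths are hypothetical; it renders an `all_hosts` fragment in classic Check_MK config syntax (Check_MK config files are plain Python, with tags joined to the host name by `|`), and would then hand off to `cmk -II` (re-inventory) and `cmk -O` (reload the core) on a real monitoring server.

```python
import shutil
import subprocess

# Hypothetical inventory data, as an Ansible role might receive it.
hosts = [
    {"name": "web01.example.org", "tags": ["linux", "ansible"]},
    {"name": "db01.example.org", "tags": ["linux", "ansible", "mysql"]},
]

def render_all_hosts(hosts):
    """Render an 'all_hosts' fragment for a Check_MK .mk config file."""
    lines = ["all_hosts += ["]
    for h in hosts:
        # Check_MK encodes tags into the host entry, separated by '|'.
        lines.append('    "%s",' % "|".join([h["name"]] + h["tags"]))
    lines.append("]")
    return "\n".join(lines) + "\n"

fragment = render_all_hosts(hosts)
print(fragment)

# On a real server the role would write this fragment to e.g.
# /etc/check_mk/conf.d/ansible/hosts.mk, then re-inventorize and reload:
if shutil.which("cmk"):
    subprocess.run(["cmk", "-II"] + [h["name"] for h in hosts], check=True)
    subprocess.run(["cmk", "-O"], check=True)
```

The `cmk` calls are guarded so the sketch is harmless on a machine without Check_MK installed.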
Maybe there is time next year to work on that :smile:
Hi everyone
As I currently have the task of setting up new monitoring servers with Icinga/Check_MK at my day job, I have been evaluating and experimenting a lot with Ansible and Check_MK recently. I have also been running two mostly manually managed Icinga/Check_MK installations (without WATO) in a Debian and in a RHEL-based environment for several years. There are a few issues which should be considered when using Ansible:
Software source: There are various packages available for the Check_MK server components: the native Debian packages, the upstream `check-mk-raw` package, or the `omd` package. When choosing the first approach, you have to do a lot of configuration fiddling to end up with a result similar to what `check-mk-raw` or `omd` gives you. The latter two options give you a handy tool, also called `omd`, which allows you to easily instantiate new monitoring sites, test-migrate to new versions and so on. After being skeptical about losing control at the beginning, I'm starting to like it, as it really includes some nice features that a "classical" setup cannot give you.
After first trying the upstream packages and experimenting with the OMD distribution, I finally ended up with `check-mk-raw` and opted for semi-automated configuration management: Ansible only sets up the initial configuration and client management. After that, the monitoring users are responsible for fine-tuning their checks. The environment I'm doing this for contains about 700 hosts with currently ~55'000 checks (only for the Linux service).
I still need some weeks to clean up the rough edges in my Ansible code, but then it shouldn't be too difficult to adjust the role(s) for DebOps. I will let you know, once I have something to show.
Btw., here is another link with some OMD advertisement: Best Monitoring Solution 2015 - OMD (Open Monitoring Distribution)
@ypid and @ganto, thank you for the excellent writeups! It seems that some monitoring solutions have the server side covered pretty well, so it would be interesting to look into client integration in DebOps first, so that, for example, hosts managed by DebOps can be easily integrated into an existing OMD installation. @ganto, do you think that @ypid's `check_mk_agent` role (https://github.com/debops-contrib/ansible-checkmk_agent) can be used with OMD?
The `ansible-checkmk_agent` role was originally written by me ;-) I use it in the Debian environment (partly managed by DebOps) I mentioned above. It's still a bit rough, but usable.
If you also have the Check_MK server part with WATO, this simplifies things a lot: you can download the matching agent release from WATO (instead of using the upstream deb), and also make better use of the large number of agent plugins, which are all available via a public URL from WATO (instead of downloading them from the upstream git).
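To sketch that idea: the server name, site name, version and plugin name below are all made up, and the URL layout is an assumption about how an OMD/Check_MK site usually exposes its agent files under the site's `check_mk` web path, not an official API. An Ansible task (e.g. `get_url`) could then simply fetch these URLs.

```python
# Hypothetical site; the path layout is an assumption, not an official API.
base = "https://monitoring.example.org/mysite/check_mk"

def agent_url(version):
    # The Debian package of the agent release matching the server.
    return "%s/agents/check-mk-agent_%s-1_all.deb" % (base, version)

def plugin_url(name):
    # Agent plugins shipped with the server, e.g. 'mk_mysql'.
    return "%s/agents/plugins/%s" % (base, name)

print(agent_url("1.2.6p12"))
print(plugin_url("mk_mysql"))

# From plain Python one could download with:
# import urllib.request
# urllib.request.urlretrieve(agent_url("1.2.6p12"), "/tmp/check-mk-agent.deb")
```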
@ganto Oops, my bad; credit where it's due. :-) Good to hear that it's usable. Right now I don't have any installation to use the agent against; maybe I'll check it out in the future.
@ganto Nice that you are working on that! I did look into OMD, but I thought it might be too much magic to use right now, and proposed going with the software packaged by Debian instead. But what works best is easier to figure out by actually trying it, so I am looking forward to seeing your work.
Regarding automating the setup of additional hosts in the monitoring system, I think it could make sense to split the configuration in /etc/check_mk/conf.d into sections owned either by configuration management or by dedicated admins. That way you could let Ansible manage your servers, while your network admins manage their devices via WATO.
@ypid: yeah, I felt that way too at the beginning. But after trying it for a few weeks now, I can only say positive things about (at least) the `omd` command, which is also part of `check-mk-raw`.
With the OMD package, however, I'm not so happy; at least not for a production setup. The current stable release 1.30 contained checkmk-1.2.6p12, which had some show-stopper bugs for my setup. As it is an all-in-one package, you cannot easily update to a newer checkmk release without creating your own fork of the package. Also, I'm not yet confident about how long it will still be maintained, as the entire package with all possible alternatives is quite complex, and OMD 2.x with Icinga 2 (without Check_MK) has been in testing for a while now. To be honest, I haven't fully figured out the differences between OMD 1.x and Check_MK RAW yet, except that I feel the latter includes less unnecessary cruft (for my use case), is more up-to-date and better documented.
About splitting the configuration between Ansible and WATO: WATO already stores its configuration in a `wato` subdirectory of /etc/check_mk/conf.d, even separated into individual files per topic. That's quite tidy. However, I haven't tried yet whether WATO is able to display (read-only) what was configured outside of its configuration directory, and I think it won't be easy (but still possible with some permissions juggling) to prevent someone from adding server checks in WATO in case that would be Ansible's responsibility.
OK, sounds good. I only tried the whole OMD package and remembered that its support in Debian stable (jessie) was beta.
About WATO: yes, it stores its config in the `wato` directory. If you mean displaying things like global settings, then yes; but WATO will not show you hosts configured outside of the `wato` directory. We can surely manage to get something working here. It might even be possible to read the Python configuration variables back into Ansible and then template the changed configuration back, so that admins and Ansible could both manage all the hosts. I am already generating check_mk configuration files with a script: https://github.com/hamcos/monitoring-scripts/blob/master/check_mk/wato_cvs_import.py but some better "API" would definitely come in handy …

About the permission juggling: WATO usually does not like that, and will show you/your users stack traces when you try to change the specific file with WATO :wink:
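Since Check_MK config files are plain Python, reading them back is mostly a controlled `exec()`. A minimal sketch of the idea (the file content and host entries are made up; a real version would need to pre-seed every variable that real WATO files reference):

```python
import os
import tempfile

# A made-up WATO-style hosts file; real files live under
# /etc/check_mk/conf.d/wato/ and may reference more variables.
sample = '''
all_hosts += [
    "web01|linux|wato",
    "sw01|snmp|wato",
]
'''

def read_mk_hosts(path):
    """Execute a .mk file in a minimal namespace and return its host names."""
    ns = {"all_hosts": []}  # pre-seed so '+=' works in the exec'd code
    with open(path) as f:
        exec(f.read(), ns)
    # Strip the '|tag' suffixes to get the bare host names.
    return [entry.split("|")[0] for entry in ns["all_hosts"]]

with tempfile.NamedTemporaryFile("w", suffix=".mk", delete=False) as f:
    f.write(sample)
    path = f.name

print(read_mk_hosts(path))  # -> ['web01', 'sw01']
os.remove(path)
```

The returned list could then be fed back into Ansible (e.g. as JSON) to template the merged configuration.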
@ganto Just curious, how is it going?
Thanks for reminding me. Actually, I have a quite sophisticated role for Red Hat done. It's a bit of a hack in many places, so I don't dare to release it yet. However, I'm cleaning it up now and adjusting it for DebOps. You can find my progress at debops-contrib/ansible-checkmk_server.
I'll definitely still need some help later on. Hang on...
Thanks for the update :+1:
@ganto and others, how do you see CheckMK in 2019? Still using it? I am still running it and find it unmatched for network and infrastructure monitoring.
I also checked out Prometheus, which I find pretty strong for cloud and application monitoring, but I will not run it for now because it does not fit into my environment.
Provide an out-of-the-box monitoring framework. Perhaps using the following?

- Server and Service Monitoring: Sensu
- System Metrics: Collectd
- App Metrics: Statsd
- Metrics Visualization: Grafana
- Metrics Storage and Collection: InfluxDB
- Alerts and Notification Routing: Sensu
- Integrations: Pagerduty, Hubot...?