Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.
https://icinga.com/docs/icinga2/latest
GNU General Public License v2.0

Pluggable Dynamic Inventory for Host & Service Auto-Discovery / Integration to Cloud & Containerized environments #6072

Open HariSekhon opened 6 years ago

HariSekhon commented 6 years ago

Feature Request to add Pluggable Dynamic Host and Service discovery.

This is a Critical Feature to remain relevant in the monitoring space as modern infrastructure becomes increasingly dynamic with the rise of Cloud and Containerization.

Although one can Puppet or similarly generate hosts and services configurations, this is such a slow and outdated thing to do by comparison to modern dynamic inventories, and doesn't really work for auto-scaling Cloud or dynamic Dockerized applications.

See Prometheus, Ansible and even RabbitMQ for examples of technologies that have all moved to dynamic inventory support.

An ex-colleague has argued with me that his last company (a large telco or hosting company) obsoleted all monitoring not Prometheus due to the dynamic nature of their infrastructure.

Of course Prometheus is no replacement for more sophisticated logic and API level monitoring that can be done via Nagios Plugins (eg. see the Advanced Nagios Plugins Collection I publish on GitHub for example).

However this does highlight an important point - Static config is dead. Even configuration management generated config is quite frankly awful to consider in a Dockerized world, and API based configuration is awkward and not nicely generalised compared to the modern dynamic inventory approach. The dynamic inventory approach is so powerfully important that it is akin to the flexibility and extensibility of Nagios plugins themselves, the single biggest advantage that Nagios-based monitoring systems have enjoyed and used to dominate the monitoring space. Any monitoring solution without dynamic discovery is going to become legacy and fade away, and as one of the most popular open source monitoring solutions, Icinga2 needs this badly.

Nagios XI, Shinken, Sensu, ZenOSS and Zabbix all have auto-discovery (the last 2 have had it for nearly a decade now that I can recall) - although these are all crude examples of how not to do it in modern IT infrastructures, eg. using NMap sweeps is very last decade. Sensu's dynamic agent registration / de-registration isn't bad but also isn't good for Docker environments where containers should not be running agents.

I propose Icinga2 follows the Ansible method of using dynamic inventory plugins that can be written in any language to integrate with any provider and are executed to return dynamic inventory as a list of hosts and groups (a single check definition would then automatically run against any matching groups), and then extending from there. This would allow a large re-use of existing integrations with a large number of Cloud and cluster manager technologies like Kubernetes out of the box without having to re-invent the wheel; you may just need to extend the wheel ever so slightly.
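To make the proposal concrete, here is a minimal sketch of what such an inventory plugin could look like, assuming the JSON layout that Ansible's script-based inventories print for `--list` (groups with a `hosts` list, plus `_meta.hostvars`). The `discover()` function, the sample hosts and the `address` host variable are hypothetical stand-ins for a real provider query; nothing here is an existing Icinga2 interface.

```python
#!/usr/bin/env python3
"""Hypothetical inventory plugin: prints hosts and groups as JSON on stdout."""
import json
import sys


def discover():
    # Stand-in for a real provider query (cloud API, Kubernetes, Consul, ...).
    # These records are made-up sample data.
    return [
        {"name": "web-1", "address": "10.0.0.11", "groups": ["web", "production"]},
        {"name": "web-2", "address": "10.0.0.12", "groups": ["web", "production"]},
        {"name": "db-1", "address": "10.0.1.21", "groups": ["db", "production"]},
    ]


def build_inventory(records):
    # Group name -> {"hosts": [...]}, with per-host variables under _meta.hostvars.
    inventory = {"_meta": {"hostvars": {}}}
    for rec in records:
        inventory["_meta"]["hostvars"][rec["name"]] = {"address": rec["address"]}
        for group in rec["groups"]:
            inventory.setdefault(group, {"hosts": []})["hosts"].append(rec["name"])
    return inventory


if __name__ == "__main__":
    # Script inventories are invoked with --list to dump the whole inventory.
    if "--list" in sys.argv[1:]:
        json.dump(build_inventory(discover()), sys.stdout, indent=2)
```

The monitoring side would only need to execute the script periodically and parse its stdout, which is what keeps the plugin language-agnostic.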

Regardless of method, IMO the following should be the priority order for dynamic inventory integrations:

  1. AWS EC2 + ECS
  2. Azure VM + AKS
  3. GCP GCE + GKE
  4. Kubernetes
  5. Consul
  6. Etcd
  7. OpenStack Nova
  8. CloudStack
  9. OpenShift
  10. Marathon
  11. DNS
  12. Digital Ocean
  13. FreeIPA
  14. Cobbler
  15. Foreman
  16. Spacewalk
  17. VMware ...

Most of these integrations are already available in the Ansible inventory scripts found here, Icinga2 simply needs to support executing them.

This is a huge opportunity to reignite and modernize or be left behind by those that do get comprehensive dynamic inventory support to all of the above technology platforms.

Thanks

Hari

formorer commented 6 years ago

On Mon, 05 Feb 2018, Hari Sekhon wrote:

[original post quoted in full]

You are looking for icingaweb2 director (https://github.com/Icinga/icingaweb2-module-director).

Alex

HariSekhon commented 6 years ago

@formorer

Thanks for the reply, I did look at Icinga Director but I didn't see this as a substitute for what I am describing, and it is also not retro-fittable to existing hand written Icinga2 deployments.

I also cannot see how its extensibility and number of integrations with all the major modern platforms compare to what I have linked to in Ansible.

Having spent more time with the other tools, especially Prometheus and Ansible, I find their integrations are much more fundamentally supported and pluggable. I don't see Director as being comparable but I'd be happy for you to educate me on that.

dnsmichi commented 6 years ago

This is a huge topic, and the feature request leaves many things open for discussion. Imho you want auto-discovery by any means possible. Be it via plugin execution (would need a new plugin API), runtime object creation including apply rules, etc.

I don't think that one should just use Ansible scripts here, but develop a common solution. To me it seems you haven't tried the capabilities of existing APIs yet, including Icinga 2's and the Director import rules.

The statement "static configuration is dead" also is quite harsh, as it is not true. The containerized world changes a lot and might not need such static objects. Still, there are a lot more services and "classic" monitoring setups around which rely on configuration management. So to speak, you cannot drop that.

Main question - what kind of inventory, storage, object registration, etc. do you see in Icinga 2 from a technical perspective?

HariSekhon commented 6 years ago

@dnsmichi

Thanks for the reply.

"Static configuration is dead" is the message I'm getting from my best engineering colleagues across companies, environments and countries for a while now. Sorry, I didn't mean for it to sound harsh! :-O

People are citing having already moved monitoring products over this as it's such a critical feature, and I expect this to continue and accelerate as the world marches towards Containerization and Cloud. I believe we'll see more and more of this in DevOps tech presentations, especially when talking about technologies like Prometheus which are gaining traction off dynamic inventory advantage.

Of course static configuration should still be possible, and especially mix-and-match of static and dynamic inventory at the same time, as you can see in Prometheus and Ansible.

I did actually go through both the documentation for the Icinga2 API and Director before raising this (I like the API) but IMO this is not really comparable to what I've requested to regain feature parity with the other systems mentioned as it lacks extensible simplicity (like Nagios Plugins and Ansible inventory plugins). The API for example requires too much custom integration to be pushed from each outside solution, for which there may not even be hooks to do so or may require repeated customised scripting in every single environment - imagine moving companies and having to do that all over again :-( - rather than just enabling a pre-existing inventory plugin pull update for whichever infrastructure you happen to be running, as seen in the other technologies.

For Director, which has some of this capability, I see the product-like approach, but the pluggable community non-GUI approach seems to gain more traction than the product-based approach. Historically, see Nagios Plugins and Ansible inventory plugins, which already support everything on the list above and are easy to contribute to, write your own, modify etc; they are dropped into a directory and set to execute - an extremely powerful extensibility. If you look at the sheer number of Ansible dynamic inventory integrations with all sorts of different infrastructures, Cloud providers, Containerization Cluster Managers like Kubernetes etc compared to Icinga2 Director's very limited set of integrations - it seems to support this extensible community-driven approach too. Ansible inventories are simply single file python scripts of a few hundred lines that have been contributed and absorbed into the main Ansible project. It's the difference between having to do all the work yourself (hard, will always lag behind) or adopting the Ansible approach and letting the whole world help you fill in the blanks.

If you read all the documentation links to each product that I put in the original post, it should help illuminate the style, approach and implementation details of what I'm requesting to regain feature parity.

I propose following the Ansible style inventory plugins (reusing them might be best as it would give immediate integration with the whole world of infrastructure integrations) where hosts and groups are periodically returned (the interval should be configurable) and then apply rules can be set up in advance to apply service checks to all hosts or groups matching certain regexes or other features. This base could then be extended over time to support extracting more information provided as optional Ansible per-host vars on each host line, against which more sophisticated apply rules could then be configured. Icinga2 Director should probably also add this dynamic inventory plugin support but I think it's much more valuable in the base Icinga2, as Director cannot be retrofitted to existing installations, and having worked in several high scale environments I have seen that they tend to avoid the product approach as it generally limits flexibility.
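As a rough illustration of the "apply rules matching certain regexes" idea, the sketch below maps an inventory (in the Ansible-style JSON shape of groups with `hosts` lists) to service checks per host. The rule table, the check names and the `services_per_host` helper are hypothetical; real Icinga2 apply rules are written in its own configuration language, so this only models the matching logic.

```python
import re

# Hypothetical rule table: a regex matched against group names, and the
# service checks to attach to every host in a matching group.
APPLY_RULES = [
    (re.compile(r"^web"), ["http", "ssl-cert"]),
    (re.compile(r"^db"), ["mysql"]),
    (re.compile(r".*"), ["ping4"]),  # catch-all: every discovered host
]


def services_per_host(inventory, rules=APPLY_RULES):
    """Return {host: sorted list of checks} for an Ansible-style inventory dict."""
    assigned = {}
    for group, data in inventory.items():
        if group == "_meta":  # host variables, not a group
            continue
        for pattern, checks in rules:
            if pattern.search(group):
                for host in data.get("hosts", []):
                    assigned.setdefault(host, set()).update(checks)
    return {host: sorted(checks) for host, checks in assigned.items()}


# Made-up sample inventory, as a periodic plugin run might return it.
inventory = {
    "web": {"hosts": ["web-1", "web-2"]},
    "db": {"hosts": ["db-1"]},
    "_meta": {"hostvars": {}},
}
result = services_per_host(inventory)
```

Because the rules are evaluated against whatever the latest inventory run returned, hosts that appear or disappear between runs pick up or lose their checks without any per-host configuration.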

IMO this is the single most important innovation in the monitoring space right now by no small margin and I would love to see Icinga2 core adopt this.

Thanks very much for reading and keep up the good work guys :)

Cheers

Hari

dnsmichi commented 6 years ago

@bobapple please have a look here :)

shleizn commented 5 years ago

@dnsmichi Hello! Could you please explain if there is any progress on this feature request?

dnsmichi commented 5 years ago

@shleizn nothing new, except that I don't really see this as part of the core itself. The Director with jobs and import syncs, in addition to auto-discovery providers for AWS, etc., serves as a good basis already. Did you try that?

shleizn commented 5 years ago

@dnsmichi Thanks for your reply! No, I haven't tried the Director with jobs and import syncs yet, but I will do so in the near future. I didn't know about these features of the Director as I haven't looked into it deeply yet. Thank you for pointing me in the right direction.

stevie-sy commented 5 years ago

Hi @shleizn, I found this feature request and want to share our experience. In short: we use the Director in our company, it's a very cool tool, and it helps us a lot with our configuration.
Of course, at the moment it cannot do everything you can do with the Icinga config files (described here: https://icinga.com/docs/icinga2/latest/doc/17-language-reference/). But we found some ways and workarounds for us. We import the hosts from our internal CMDB and other systems. Then we create some rules for services, e.g. if the operating system is "Windows" the hosts get these services, if the IP is in the DMZ the hosts get those services, if some flag is set the hosts get other services, etc. The Director does the magic for us, so that we get a configuration automatically. This works very well for us. If this kind of automation is what you're looking for, then the Director is also a possible tool for you. But in our experience it needs a lot of training and trial and error.
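The kind of rules described above can be sketched roughly as follows. This is only an illustration of the decision logic, not how the Director implements it: the field names (`os`, `address`, `backup`), the DMZ range and the service names are all assumptions made up for the example.

```python
import ipaddress

# Assumed DMZ range for illustration (a documentation-only prefix).
DMZ = ipaddress.ip_network("192.0.2.0/24")


def services_for(host):
    """Pick services for one imported host record based on its properties."""
    services = []
    if host.get("os") == "Windows":
        services += ["windows-disk", "windows-updates"]
    if ipaddress.ip_address(host["address"]) in DMZ:
        services.append("dmz-firewall")
    if host.get("backup"):  # some flag carried over from the CMDB import
        services.append("backup-age")
    return services


# Made-up records, as an import from a CMDB might deliver them.
windows_dmz = services_for({"os": "Windows", "address": "192.0.2.10"})
linux_backup = services_for({"os": "Linux", "address": "10.0.0.5", "backup": True})
```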

If you would like more automation: I remember a talk at the Icinga Camp in Berlin about HashiCorp (https://www.hashicorp.com/). Maybe this is something for you. If you want to hear more about our experience, maybe we should talk in the community channel.

dnsmichi commented 5 years ago

Thanks Stefan. Yep, https://community.icinga.com has a broader audience and it also is easier to discuss ideas. The talk you're referring to is https://www.youtube.com/watch?v=VSeqxLi_txo - now let's move over to Discourse :)

shleizn commented 5 years ago

Stefan, Michael, thank you for all the information. I will study it and, in case of difficulties that I cannot solve myself, reach out to you in the community channel.

micw commented 4 years ago

I stumbled across this while looking for a replacement for our old, home-brewed monitoring solution (we have both static and dynamic services and use Ansible to generate Nagios configs from the inventory). From what I've read in the docs, the requirement could be implemented using the API plus something else that does the discovery. The discovery part does not necessarily need to be implemented within Icinga; it could also be a standalone API client. That client could add newly discovered objects and remove stale ones.
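A minimal sketch of the reconciliation step such a standalone client would need, under a few assumptions: the API listens on `https://localhost:5665`, hosts are created with `PUT /v1/objects/hosts/<name>` and removed with `DELETE ...?cascade=1` as described in the Icinga 2 API docs, and a `generic-host` template exists. The `plan_sync` helper is hypothetical and only builds the list of requests; sending them (with credentials and TLS verification) is left out on purpose.

```python
import json
from urllib.parse import quote

# Assumed API endpoint; adjust host and port for a real setup.
API = "https://localhost:5665/v1/objects/hosts/"


def plan_sync(discovered, existing, template="generic-host"):
    """Compare discovery output with the hosts the client already manages.

    discovered: dict mapping host name -> address from the discovery source
    existing:   iterable of host names previously created by this client
    Returns a list of (method, url, json_body_or_None) tuples.
    """
    existing = set(existing)
    plan = []
    # Newly discovered hosts: create them from the template.
    for name in sorted(set(discovered) - existing):
        body = {"templates": [template], "attrs": {"address": discovered[name]}}
        plan.append(("PUT", API + quote(name, safe=""), json.dumps(body)))
    # Stale hosts: delete them together with their dependent objects.
    for name in sorted(existing - set(discovered)):
        plan.append(("DELETE", API + quote(name, safe="") + "?cascade=1", None))
    return plan


# Made-up example: web-2 is new, old-1 has disappeared.
plan = plan_sync(
    {"web-1": "10.0.0.11", "web-2": "10.0.0.12"},
    ["web-1", "old-1"],
)
```

Restricting deletions to hosts the client itself created (the `existing` argument) is a deliberate choice here, so that statically configured hosts are never touched by the sync.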