Icinga / icinga-core

Icinga 1.x, the old core (EOL 31.12.2018)
GNU General Public License v2.0

[dev.icinga.com #6480] Intermittent command file latency after Icinga reload #1492

Closed icinga-migration closed 6 years ago

icinga-migration commented 10 years ago

This issue has been migrated from Redmine: https://dev.icinga.com/issues/6480

Created by jerdmann on 2014-06-13 14:08:47 +00:00

Assignee: (none) Status: New Target Version: Backlog Last Update: 2015-05-18 12:18:02 +00:00 (in Redmine)

Icinga Version: 1.11.1
OS Version: CentOS 6.5 x64

We have a distributed Icinga environment connected using nsca-ng. Roughly one in every 5-6 times we run 'service icinga reload' on the central Icinga server, we see about 30 seconds of the syslog activity below. From that point on, all passive checks submitted to the external command file suffer about 30 seconds of latency (this also shows in the latency stats in the icingastats output).

Jun 10 11:01:12 ttnet-ch-icinga-2 icinga: Event loop started...
Jun 10 11:01:13 ttnet-ch-icinga-2 icinga: External command error: Malformed command
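
For context on the "Malformed command" message: Nagios-style cores such as Icinga 1.x expect each line in the external command file to look like `[epoch] COMMAND;arg1;arg2;...`. Below is a minimal Python sketch of building a well-formed passive service check line. The helper name and the sample host/service values are hypothetical, and this only illustrates the line format, not the cause of the latency reported here.

```python
import time


def format_passive_service_check(host, service, return_code, output, timestamp=None):
    """Build one external-command line for a passive service check result.

    Hypothetical helper for illustration only; it is not part of Icinga.
    """
    if timestamp is None:
        timestamp = int(time.time())
    # Semicolons separate fields and a newline ends the command, so neither
    # may appear inside the host or service name.
    for field in (host, service):
        if ";" in field or "\n" in field:
            raise ValueError("host/service must not contain ';' or newlines")
    if return_code not in (0, 1, 2, 3):  # OK, WARNING, CRITICAL, UNKNOWN
        raise ValueError("return_code must be 0, 1, 2 or 3")
    # Keep the plugin output on a single line.
    output = output.replace("\n", " ")
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, return_code, output)


if __name__ == "__main__":
    # Sample values are made up for this example.
    print(format_passive_service_check("web01", "disk", 0, "DISK OK", 1402398072), end="")
```

A line that does not follow this shape (missing brackets, missing fields, truncated write) is the kind of input that would be rejected as malformed.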

Our workaround is to restart Icinga rather than reload it. However, it would be nice to have this fixed, since reloads are better/faster than restarts (e.g. SIGHUP vs. SIGKILL? Not really sure...).

Also, I found the bug below in the tracker, which might be the same problem we're seeing (and is much more informative to boot). We changed the default 4096 external command buffer slots to 8192. We have enough headroom to change it back, though.

https://dev.icinga.org/issues/3899



icinga-migration commented 10 years ago

Updated by mfriedrich on 2014-06-13 14:15:45 +00:00

icinga-migration commented 10 years ago

Updated by mfriedrich on 2014-06-13 14:19:53 +00:00

The reload isn't really a reload, as it blocks the entire check procedure, and the 1.x architecture is single-threaded after all.

It's properly implemented in Icinga 2, where a new child process is forked, validates the configuration, and on success tells the parent to shut down and takes over.
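
The fork, validate, take-over pattern described above can be sketched roughly as follows. This is a simplified illustration in Python, not Icinga 2's actual code: the function name `try_reload` and the `validate_config` callback are hypothetical, and the child here only reports the validation result through its exit status instead of really taking over as the new daemon.

```python
import os


def try_reload(validate_config):
    """Fork a child that validates the config; the parent decides what to do.

    Returns "hand over" if validation succeeded in the child, otherwise
    "keep running". POSIX only, since it relies on os.fork().
    """
    pid = os.fork()
    if pid == 0:
        # Child process: validate, then report the result via the exit status.
        try:
            ok = bool(validate_config())
        except Exception:
            ok = False
        # os._exit skips cleanup handlers inherited from the parent.
        os._exit(0 if ok else 1)

    # Parent process: wait for the child's verdict. In this sketch the parent
    # blocks here; a real daemon would keep processing checks meanwhile.
    _, status = os.waitpid(pid, 0)
    child_ok = os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
    return "hand over" if child_ok else "keep running"


if __name__ == "__main__":
    print(try_reload(lambda: True))
    print(try_reload(lambda: False))
```

The point of the pattern is that a broken configuration never stops the running process: only a successfully validated child replaces it.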

The core 1.x code cannot be touched that way; there are too many dependencies (NEB modules, etc.) involved. If you happen to have a patch around, the community may test it, though I doubt that it's worth the hassle given the decision to move forward with a rewrite from scratch with Icinga 2.

icinga-migration commented 10 years ago

Updated by jerdmann on 2014-06-16 14:40:15 +00:00

Cool, sounds good. We have a workaround so we're good in the meantime. Thanks for the info!

icinga-migration commented 10 years ago

Updated by mfriedrich on 2014-06-16 19:27:58 +00:00

I'll leave it open for further user feedback or patches. But I doubt that it can be implemented properly without breaking all the existing addons that already use their own threads (livestatus, mod_gearman). That lack of control by the core itself was one of the reasons for the entire rewrite in Icinga 2.

icinga-migration commented 10 years ago

Updated by mfriedrich on 2014-10-24 22:27:26 +00:00

icinga-migration commented 9 years ago

Updated by mfriedrich on 2015-03-12 19:43:43 +00:00

icinga-migration commented 9 years ago

Updated by berk on 2015-05-18 12:18:02 +00:00