Griesbacher / nagflux

A connector which copies performancedata from Nagios / Icinga(2) / Naemon to InfluxDB
GNU General Public License v2.0

Nagios high cpu - high cached log messages #31

Closed jowanw closed 7 years ago

jowanw commented 7 years ago

OMD-Labs nightly from 24-3 (Nagflux version: 0.40)

We are having issues with high CPU usage (similar to #3). After a couple of minutes, Nagios CPU usage goes up to 100%, and logs take a long time to load (Check_MK "Events of recent 4 hours").

Nagflux Log:

2017-03-28 14:56:23 Warn: connectToLivestatus timed out
2017-03-28 14:57:03 Warn: connectToLivestatus timed out
2017-03-28 14:57:43 Warn: connectToLivestatus timed out
2017-03-28 14:57:43 Warn: Livestatus timed out... (Collector.queryData())

I have set the cache to 2 million, and it fills up right away. We are checking 422 hosts with 7680 services.

Griesbacher commented 7 years ago

Hmm, that's odd. Does your CPU usage also go up if Nagflux is disabled? Which cache did you change?

jowanw commented 7 years ago

Changing [Livestatus] Version to 'Nagios' seems to have fixed it. I can now also see messages in Grafana. Cached messages are back down to 140k in ~10 hours, instead of 2 million in 2 minutes. Maybe Nagios doesn't like the Icinga2 query.

It might be a good idea to change the default to empty, so it can detect the right core.

The cache I changed was max_cached_messages in the Nagios config for mk_livestatus.
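For anyone hitting the same symptom, the change described above would look roughly like this in the Nagflux config file. This is a minimal sketch, not a complete config: the section and key names ([Livestatus], Version) are taken from this thread, the comments are my own reading of it, and any other keys your install has in this section are omitted.

```ini
[Livestatus]
    # Core the Livestatus queries are written for.
    # 'Nagios' is the value that resolved the problem in this thread;
    # the OMD default apparently pointed at the Icinga2 query instead.
    Version = "Nagios"
    # Alternative discussed above (assumption: an empty value lets
    # Nagflux try to detect the core on its own):
    # Version = ""
```

After editing the file, Nagflux presumably needs a restart for the new value to take effect.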

Griesbacher commented 7 years ago

Yes, the queries differ; that's why there is a switch in the config.

I'll change it in the OMD config; that was a mistake there.