Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.
https://icinga.com/docs/icinga2/latest
GNU General Public License v2.0

Stopped saving to Influxdb after update #6483

Closed pillbug22 closed 6 years ago

pillbug22 commented 6 years ago

Greetings,

We noticed after our standard update/downtime window this past weekend that the graphs in Grafana and the "mini-graphs" displayed in Icinga Web 2 pages (powered by Grafana) have not been updated since we shut down the server for the downtime.

Running simple queries against InfluxDB directly from the console shows that data is not being saved to InfluxDB (ruling out an issue with Grafana). Using something like `SELECT * FROM hostalive WHERE hostname = '<FQDN>' AND time > (now() - 4d)`, we can verify that the series contain data from before the downtime, but none after.
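The same check can be scripted against the InfluxDB 1.x HTTP `/query` endpoint. The following is a minimal sketch, not something taken from this setup: the base URL, the database name `icinga2`, and the helper names `build_query_url` and `latest_timestamp` are assumptions for illustration. It only builds the request URL and reads the newest timestamp out of a JSON response, so it can be tried without a running server.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url, database, measurement, hostname, days):
    """Build an InfluxDB 1.x /query URL for recent points of one host.

    The hostname is inserted as-is, so only pass trusted values.
    """
    query = (
        f'SELECT * FROM "{measurement}" '
        f"WHERE hostname = '{hostname}' AND time > now() - {days}d"
    )
    # epoch=s asks InfluxDB to return timestamps as integer Unix seconds.
    params = {"db": database, "q": query, "epoch": "s"}
    return f"{base_url}/query?" + urlencode(params)


def latest_timestamp(response_text):
    """Return the newest timestamp in a /query JSON response, or None if empty."""
    body = json.loads(response_text)
    newest = None
    for result in body.get("results", []):
        # A statement that matched no points has no "series" key at all.
        for series in result.get("series", []):
            time_index = series["columns"].index("time")
            for row in series["values"]:
                stamp = row[time_index]
                if newest is None or stamp > newest:
                    newest = stamp
    return newest
```

If `latest_timestamp` returns `None`, or a value from before the downtime, the writer is not delivering data and Grafana can be ruled out.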

All checks themselves are functioning normally, including email notifications. Appears to happen to all checks, not limited to any specific host or service.

Expected Behavior

Check results should be saved to InfluxDB (for display via Grafana)

Current Behavior

Check results are not being saved to influxdb

Possible Solution

?

Steps to Reproduce (for bugs)

Run checks as usual

Context

Not able to visually display trending/historical check results via Icinga Web 2 or grafana

Your Environment

Copyright (c) 2012-2018 Icinga Development Team (https://www.icinga.com/)
License GPLv2+: GNU GPL version 2 or later http://gnu.org/licenses/gpl2.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Application information:
  Installation root: /usr
  Sysconf directory: /etc
  Run directory: /run
  Local state directory: /var
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid

System information:
  Platform: Ubuntu
  Platform version: 18.04.1 LTS (Bionic Beaver)
  Kernel: Linux
  Kernel version: 4.15.0-29-generic
  Architecture: x86_64

Build information:
  Compiler: GNU 7.3.0
  Build host: 8213f0f5ca15


Checking /var/log/apt/history.log, 

`icinga2 icinga2-bin icinga2-common icinga2-doc icinga2-ido-mysql icingacli icingaweb2 icingaweb2-common icingaweb2-module-doc icingaweb2-module-monitoring`

were all updated immediately before the downtime. This was the afternoon of July 21.

Then on July 24,

`icinga2 icinga2-bin icinga2-common icinga2-doc icinga2-ido-mysql`

were all updated again during troubleshooting.

InfluxDB version results:

Influxdb-Build: OSS
Influxdb-Version: 1.5.2


* Operating System and version:

Ubuntu 18.04, Linux 4.15.0-29-generic on x86_64

* Enabled features (`icinga2 feature list`):

Disabled features: compatlog debuglog elasticsearch gelf graphite livestatus opentsdb perfdata statusdata syslog
Enabled features: api checker command ido-mysql influxdb mainlog notification


* Icinga Web 2 version and modules (System - About):
-- Icinga Web 2: 2.6.0
-- businessprocess: 2.1.0
-- director: 1.3.1
-- globe: 1.0.4
-- grafana: 1.1.10
-- map: 1.0.3
-- monitoring: 2.6.0

* Config validation (`icinga2 daemon -C`):

[2018-07-25 09:59:34 -0500] information/cli: Icinga application loader (version: r2.9.1-1)
[2018-07-25 09:59:34 -0500] information/cli: Loading configuration file(s).
[2018-07-25 09:59:34 -0500] information/ConfigItem: Committing config item(s).
[2018-07-25 09:59:34 -0500] information/ApiListener: My API identity: dbsrvmon02.legacyconsulting.net
[2018-07-25 09:59:35 -0500] warning/ApplyRule: Apply rule 'satellite-host' (in /etc/icinga2/conf.d/satellite.conf: 29:1-29:41) for type 'Dependency' does not match anywhere!
[2018-07-25 09:59:35 -0500] warning/ApplyRule: Apply rule 'mail-icingaadmin' (in /etc/icinga2/conf.d/notifications.conf: 23:1-23:48) for type 'Notification' does not match anywhere!
[2018-07-25 09:59:35 -0500] warning/ApplyRule: Apply rule 'backup-downtime' (in /etc/icinga2/conf.d/downtimes.conf: 5:1-5:52) for type 'ScheduledDowntime' does not match anywhere!
[2018-07-25 09:59:35 -0500] warning/ApplyRule: Apply rule 'apt' (in /etc/icinga2/conf.d/apt.conf: 1:0-1:18) for type 'Service' does not match anywhere!
[2018-07-25 09:59:35 -0500] warning/ApplyRule: Apply rule 'load' (in /etc/icinga2/zones.d/monitoringserver.FQDN/services.conf: 1:0-1:19) for type 'Service' does not match anywhere!
[2018-07-25 09:59:35 -0500] warning/ApplyRule: Apply rule 'procs' (in /etc/icinga2/zones.d/monitoringserver.FQDN/services.conf: 8:1-8:21) for type 'Service' does not match anywhere!
[2018-07-25 09:59:35 -0500] information/ConfigItem: Instantiated 456 Services.
[2018-07-25 09:59:35 -0500] information/ConfigItem: Instantiated 13 ServiceGroups.
[2018-07-25 09:59:35 -0500] information/ConfigItem: Instantiated 1 InfluxdbWriter.
[2018-07-25 09:59:35 -0500] information/ConfigItem: Instantiated 15 HostGroups.
[2018-07-25 09:59:35 -0500] information/ConfigItem: Instantiated 1 FileLogger.
[2018-07-25 09:59:35 -0500] information/ConfigItem: Instantiated 1 NotificationComponent.
[2018-07-25 09:59:35 -0500] information/ConfigItem: Instantiated 2 NotificationCommands.
[2018-07-25 09:59:35 -0500] information/ConfigItem: Instantiated 599 Notifications.
[2018-07-25 09:59:35 -0500] information/ConfigItem: Instantiated 1 IcingaApplication.
[2018-07-25 09:59:35 -0500] information/ConfigItem: Instantiated 71 Hosts.
[2018-07-25 09:59:35 -0500] information/ConfigItem: Instantiated 1 ApiListener.
[2018-07-25 09:59:35 -0500] information/ConfigItem: Instantiated 1 Comment.
[2018-07-25 09:59:35 -0500] information/ConfigItem: Instantiated 1 CheckerComponent.
[2018-07-25 09:59:35 -0500] information/ConfigItem: Instantiated 32 Zones.
[2018-07-25 09:59:35 -0500] information/ConfigItem: Instantiated 1 ExternalCommandListener.
[2018-07-25 09:59:35 -0500] information/ConfigItem: Instantiated 31 Endpoints.
[2018-07-25 09:59:35 -0500] information/ConfigItem: Instantiated 1 ApiUser.
[2018-07-25 09:59:35 -0500] information/ConfigItem: Instantiated 2 UserGroups.
[2018-07-25 09:59:35 -0500] information/ConfigItem: Instantiated 1 IdoMysqlConnection.
[2018-07-25 09:59:35 -0500] information/ConfigItem: Instantiated 227 CheckCommands.
[2018-07-25 09:59:35 -0500] information/ConfigItem: Instantiated 6 TimePeriods.
[2018-07-25 09:59:35 -0500] information/ConfigItem: Instantiated 4 Users.
[2018-07-25 09:59:35 -0500] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2018-07-25 09:59:35 -0500] information/cli: Finished validating the configuration file(s).



* If you run multiple Icinga 2 instances, the `zones.conf` file (or `icinga2 object list --type Endpoint` and `icinga2 object list --type Zone`) from all affected nodes.
We are running a single server.

dnsmichi commented 6 years ago

Updates to InfluxDB itself involved here?

In any case, you should enable the debug log and troubleshoot further. Details: https://www.icinga.com/docs/icinga2/latest/doc/15-troubleshooting/#enable-debug-output

mcktr commented 6 years ago

Did you replace the influxdb.conf during the update? Please check if the feature is still fully configured.

pillbug22 commented 6 years ago

There isn't anything listed in the apt update log that shows InfluxDB was updated. Looking at the last-modified date on influxdb.conf, it was last edited on July 17 (4 days before the issue started).

However, checking influxdb.conf, it's all stock: everything there is commented out, with default values. There is also an influxdb.conf.dpkg-old that has the old config values.

Made the dpkg-old conf file the active config, restarted Icinga 2, and we are starting to see some graph data now. Haven't gone through all hosts/services, but good news so far.

Seems that maybe the influxdb.conf file had been updated by a previous update, but the configured values stayed in running memory until the reboot this past weekend (which caused the new/blank config to be loaded).

We'll be sure to add a check on .conf files to our troubleshooting documentation. Thank you for the assist!
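A check like that could be automated. The sketch below makes two assumptions that are not from this thread: that "stock" means the feature file has no uncommented `name = value` lines, and that dpkg marks leftover config files with the suffixes `.dpkg-old`, `.dpkg-dist`, or `.dpkg-new`. The function names `active_settings` and `dpkg_leftovers` are hypothetical helpers, and the comment handling is deliberately simple rather than a full Icinga 2 config parser.

```python
import re
from pathlib import Path

# Matches /* ... */ comments, including ones that span several lines.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def active_settings(config_text):
    """Return the uncommented `name = value` lines of a config snippet."""
    text = BLOCK_COMMENT.sub("", config_text)
    settings = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and whole-line // or # comments.
        if not stripped or stripped.startswith(("//", "#")):
            continue
        if "=" in stripped:
            settings.append(stripped)
    return settings


def dpkg_leftovers(directory):
    """List files in a directory that carry a dpkg leftover suffix."""
    suffixes = (".dpkg-old", ".dpkg-dist", ".dpkg-new")
    return sorted(
        path.name for path in Path(directory).iterdir()
        if path.name.endswith(suffixes)
    )
```

An empty result from `active_settings` for influxdb.conf, together with an influxdb.conf.dpkg-old reported by `dpkg_leftovers`, would match the situation described above.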

dnsmichi commented 6 years ago

Sounds really weird. I guess this resolved itself now, so I'm closing here.

pillbug22 commented 6 years ago

Yes, so far things are still working well after replacing the influxdb.conf.

I agree, the "weird" part to me is that, looking at /var/log/apt/history.log, there were some updates on July 6 (nothing listed for Icinga or InfluxDB), and then no more updates installed until July 21 (this past Saturday, when the issue began). Based on the file's modified timestamp, it appears the new/blank influxdb.conf file was created on July 17, right in the middle of the two updates.

DerEffi commented 6 years ago

Hey there, I'm facing a similar issue. I'm a relative newcomer to monitoring servers in general, but the current behavior is the same: no data is written to InfluxDB from Icinga.

Reinstalling InfluxDB or replacing the influxdb.conf file with the default one (even though I haven't changed anything) didn't help. The result of `show measurements` in the icinga2 database is completely empty. The influxdb and perfdata features are enabled, and the Icinga 2 log says:

[2018-11-14 15:32:46 +0100] information/InfluxdbWriter: 'influxdb' started.

But from there on, the log file never mentions InfluxDB (or the InfluxdbWriter) again. It's as if Icinga doesn't care at all about InfluxDB or the enabled feature apart from this one log line; there is also no error message or anything like that.
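To confirm that the writer really logs nothing further, the log can be filtered by facility. This is a minimal sketch that assumes every entry follows the `[timestamp] severity/Facility: message` shape seen in the lines quoted in this thread; `entries_for` is a hypothetical helper name. At the default log level a quiet writer may not mean much, so running the same filter over the debug log mentioned earlier in the thread is likely more informative.

```python
import re

# One log entry per line: [timestamp] severity/Facility: message
LOG_LINE = re.compile(
    r"^\[(?P<time>[^\]]+)\] (?P<severity>\w+)/(?P<facility>\w+): (?P<message>.*)$"
)


def entries_for(log_text, facility):
    """Return (time, severity, message) tuples logged by one facility."""
    entries = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        if match and match.group("facility") == facility:
            entries.append(
                (match.group("time"), match.group("severity"), match.group("message"))
            )
    return entries
```

Calling `entries_for(text, "InfluxdbWriter")` on the log contents returns only the writer's entries, which makes it easy to see whether anything follows the "started" line.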