Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.
https://icinga.com/docs/icinga2/latest
GNU General Public License v2.0

Replayed Events do not appear in history #7009

Closed jonaschl closed 4 years ago

jonaschl commented 5 years ago

Expected Behavior

When a client node loses network connectivity and the replay log feature is enabled, state changes (from OK to CRITICAL, for example) should appear in the history tab of the service in Icinga Web once the connection is restored.

Current Behavior

The replay log seems to work in my environment: the file C:\ProgramData\icinga2\var\lib\icinga2\api\log\current gets filled with messages (see the attached file current.txt) and is empty after the connection is restored. The log on the Windows 10 Education client also states that messages have been replayed.

debug-node.log

State changes which definitely happen (some services which are checked from this node depend on the network ) do not appear in the history tab of icingaweb.


Steps to Reproduce (for bugs)

  1. Create a service on a node that depends on the network functionality
    object Service "Network-Status-2" {
      import "generic-service"
      check_command = "ping"
      host_name = "sirius.mittelerde.local"
      vars.ping_address = "192.168.141.2"
    }
  2. Disconnect the node from the network
  3. Connect the node to the network
  4. The state change does not appear in the history of this service

Context

I have a couple of machines where I cannot guarantee a network connection, so I thought about using the replay log feature to see state changes at least once the connection comes up again. This does not seem to work.

Your Environment

Node:
Disabled features: command compatlog elasticsearch gelf graphite ido-mysql ido-pgsql influxdb livestatus notification opentsdb perfdata statusdata
Enabled features: api checker debuglog mainlog

Object 'rigel.mittelerde.local' of type 'Endpoint':
  % declared in '/etc/icinga2/zones.conf', lines 6:1-6:40
  * __name = "rigel.mittelerde.local"
  * host = "192.168.141.4"
    % = modified in '/etc/icinga2/zones.conf', lines 7:2-7:23
  * log_duration = 86400
  * name = "rigel.mittelerde.local"
  * package = "_etc"
  * port = "5665"
  * source_location
    * first_column = 1
    * first_line = 6
    * last_column = 40
    * last_line = 6
    * path = "/etc/icinga2/zones.conf"
  * templates = [ "rigel.mittelerde.local" ]
    % = modified in '/etc/icinga2/zones.conf', lines 6:1-6:40
  * type = "Endpoint"
  * zone = ""

Object 'sirius.mittelerde.local' of type 'Endpoint':
  % declared in '/etc/icinga2/zones.conf', lines 23:1-23:41
  * __name = "sirius.mittelerde.local"
  * host = ""
  * log_duration = 86400
    % = modified in '/etc/icinga2/zones.conf', lines 24:2-24:18
  * name = "sirius.mittelerde.local"
  * package = "_etc"
  * port = "5665"
  * source_location
    * first_column = 1
    * first_line = 23
    * last_column = 41
    * last_line = 23
    * path = "/etc/icinga2/zones.conf"
  * templates = [ "sirius.mittelerde.local" ]
    % = modified in '/etc/icinga2/zones.conf', lines 23:1-23:41
  * type = "Endpoint"
  * zone = ""

Zones:

Object 'director-global' of type 'Zone':
  % declared in '/etc/icinga2/zones.conf', lines 18:1-18:29
  * __name = "director-global"
  * endpoints = null
  * global = true
    % = modified in '/etc/icinga2/zones.conf', lines 19:2-19:14
  * name = "director-global"
  * package = "_etc"
  * parent = ""
  * source_location
    * first_column = 1
    * first_line = 18
    * last_column = 29
    * last_line = 18
    * path = "/etc/icinga2/zones.conf"
  * templates = [ "director-global" ]
    % = modified in '/etc/icinga2/zones.conf', lines 18:1-18:29
  * type = "Zone"
  * zone = ""

Object 'master' of type 'Zone':
  % declared in '/etc/icinga2/zones.conf', lines 10:1-10:20
  * __name = "master"
  * endpoints = [ "rigel.mittelerde.local" ]
    % = modified in '/etc/icinga2/zones.conf', lines 11:2-11:41
  * global = false
  * name = "master"
  * package = "_etc"
  * parent = ""
  * source_location
    * first_column = 1
    * first_line = 10
    * last_column = 20
    * last_line = 10
    * path = "/etc/icinga2/zones.conf"
  * templates = [ "master" ]
    % = modified in '/etc/icinga2/zones.conf', lines 10:1-10:20
  * type = "Zone"
  * zone = ""

Object 'global-templates' of type 'Zone':
  % declared in '/etc/icinga2/zones.conf', lines 14:1-14:30
  * __name = "global-templates"
  * endpoints = null
  * global = true
    % = modified in '/etc/icinga2/zones.conf', lines 15:2-15:14
  * name = "global-templates"
  * package = "_etc"
  * parent = ""
  * source_location
    * first_column = 1
    * first_line = 14
    * last_column = 30
    * last_line = 14
    * path = "/etc/icinga2/zones.conf"
  * templates = [ "global-templates" ]
    % = modified in '/etc/icinga2/zones.conf', lines 14:1-14:30
  * type = "Zone"
  * zone = ""

Object 'sirius.mittelerde.local' of type 'Zone':
  % declared in '/etc/icinga2/zones.conf', lines 27:1-27:37
  * __name = "sirius.mittelerde.local"
  * endpoints = [ "sirius.mittelerde.local" ]
    % = modified in '/etc/icinga2/zones.conf', lines 28:2-28:42
  * global = false
  * name = "sirius.mittelerde.local"
  * package = "_etc"
  * parent = "master"
    % = modified in '/etc/icinga2/zones.conf', lines 29:2-29:18
  * source_location
    * first_column = 1
    * first_line = 27
    * last_column = 37
    * last_line = 27
    * path = "/etc/icinga2/zones.conf"
  * templates = [ "sirius.mittelerde.local" ]
    % = modified in '/etc/icinga2/zones.conf', lines 27:1-27:37
  * type = "Zone"
  * zone = ""
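For readers following along: the "declared in" and "modified in" locations in the dump above pin down most of zones.conf. A minimal reconstruction, assuming nothing else is set in that file, might look like the sketch below. Only the object names, attribute values and line positions come from the dump; the layout, the comments and the 1d spelling of the 86400-second log_duration are assumptions.

```
// Sketch of /etc/icinga2/zones.conf, reconstructed from the object dump.
// Layout and comments are assumptions, not the reporter's actual file.

object Endpoint "rigel.mittelerde.local" {
  host = "192.168.141.4"
  // log_duration is not set here; the dump shows 86400 without a
  // "modified in" entry for it on this endpoint
}

object Zone "master" {
  endpoints = [ "rigel.mittelerde.local" ]
}

object Zone "global-templates" {
  global = true
}

object Zone "director-global" {
  global = true
}

object Endpoint "sirius.mittelerde.local" {
  // No host attribute is set for this endpoint
  log_duration = 1d   // assumed spelling; the dump reports 86400 seconds
}

object Zone "sirius.mittelerde.local" {
  endpoints = [ "sirius.mittelerde.local" ]
  parent = "master"
}
```

Read this way, the replay log is kept for one day for both endpoints, which matches the reporter's observation that messages are written to the log and replayed after reconnecting.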

Thanks for your support.

Jonatan

dnsmichi commented 5 years ago

What's within the master's debug log for dumping the replayed check results into the IDO database backend?

jonaschl commented 5 years ago

Hi,

thanks for your answer. I am not sure which lines are useful for debugging, so I attached the debug log of the server from 10:00 am to 11:00 am: debug-server.log. Node and server use the same NTP server, so their clocks are in sync.

dnsmichi commented 5 years ago

Hmm, can you tell me where the following service object is physically located on disk?

    object Service "Network-Status-2" {
      import "generic-service"
      check_command = "ping"
      host_name = "sirius.mittelerde.local"
      vars.ping_address = "192.168.141.2"
    }

The output of icinga2 object list --type Service --name '*Network-Status*' on the master is sufficient.

jonaschl commented 5 years ago

Hi,

here is the output of the command you asked for:

Object 'sirius.mittelerde.local!Network-Status-2' of type 'Service':
  % declared in '/etc/icinga2/zones.d/sirius.mittelerde.local/services.conf', lines 77:1-77:33
  * __name = "sirius.mittelerde.local!Network-Status-2"
  * action_url = ""
  * check_command = "ping"
    % = modified in '/etc/icinga2/zones.d/sirius.mittelerde.local/services.conf', lines 79:1-79:22
  * check_interval = 60
    % = modified in '/etc/icinga2/zones.d/global-templates/templates.conf', lines 28:3-28:21
  * check_period = ""
  * check_timeout = null
  * command_endpoint = ""
  * display_name = "Network-Status-2"
  * enable_active_checks = true
  * enable_event_handler = true
  * enable_flapping = false
  * enable_notifications = true
  * enable_passive_checks = true
  * enable_perfdata = true
  * event_command = ""
  * flapping_threshold = 0
  * flapping_threshold_high = 30
  * flapping_threshold_low = 25
  * groups = [ ]
  * host_name = "sirius.mittelerde.local"
    % = modified in '/etc/icinga2/zones.d/sirius.mittelerde.local/services.conf', lines 80:1-80:37
  * icon_image = ""
  * icon_image_alt = ""
  * max_check_attempts = 5
    % = modified in '/etc/icinga2/zones.d/global-templates/templates.conf', lines 27:3-27:24
  * name = "Network-Status-2"
  * notes = ""
  * notes_url = ""
  * package = "_etc"
  * retry_interval = 30
    % = modified in '/etc/icinga2/zones.d/global-templates/templates.conf', lines 29:3-29:22
  * source_location
    * first_column = 1
    * first_line = 77
    * last_column = 33
    * last_line = 77
    * path = "/etc/icinga2/zones.d/sirius.mittelerde.local/services.conf"
  * templates = [ "Network-Status-2", "generic-service" ]
    % = modified in '/etc/icinga2/zones.d/sirius.mittelerde.local/services.conf', lines 77:1-77:33
    % = modified in '/etc/icinga2/zones.d/global-templates/templates.conf', lines 26:1-26:34
  * type = "Service"
  * vars
    * ping_address = "192.168.141.2"
      % = modified in '/etc/icinga2/zones.d/sirius.mittelerde.local/services.conf', lines 81:1-81:35
  * volatile = false
  * zone = "sirius.mittelerde.local"

Object 'sirius.mittelerde.local!Network-Status' of type 'Service':
  % declared in '/etc/icinga2/zones.d/sirius.mittelerde.local/services.conf', lines 67:1-67:31
  * __name = "sirius.mittelerde.local!Network-Status"
  * action_url = ""
  * check_command = "ping-windows"
    % = modified in '/etc/icinga2/zones.d/sirius.mittelerde.local/services.conf', lines 69:1-69:30
  * check_interval = 60
    % = modified in '/etc/icinga2/zones.d/global-templates/templates.conf', lines 28:3-28:21
  * check_period = ""
  * check_timeout = null
  * command_endpoint = ""
  * display_name = "Network-Status"
  * enable_active_checks = true
  * enable_event_handler = true
  * enable_flapping = false
  * enable_notifications = true
  * enable_passive_checks = true
  * enable_perfdata = true
  * event_command = ""
  * flapping_threshold = 0
  * flapping_threshold_high = 30
  * flapping_threshold_low = 25
  * groups = [ ]
  * host_name = "sirius.mittelerde.local"
    % = modified in '/etc/icinga2/zones.d/sirius.mittelerde.local/services.conf', lines 70:1-70:37
  * icon_image = ""
  * icon_image_alt = ""
  * max_check_attempts = 5
    % = modified in '/etc/icinga2/zones.d/global-templates/templates.conf', lines 27:3-27:24
  * name = "Network-Status"
  * notes = ""
  * notes_url = ""
  * package = "_etc"
  * retry_interval = 30
    % = modified in '/etc/icinga2/zones.d/global-templates/templates.conf', lines 29:3-29:22
  * source_location
    * first_column = 1
    * first_line = 67
    * last_column = 31
    * last_line = 67
    * path = "/etc/icinga2/zones.d/sirius.mittelerde.local/services.conf"
  * templates = [ "Network-Status", "generic-service" ]
    % = modified in '/etc/icinga2/zones.d/sirius.mittelerde.local/services.conf', lines 67:1-67:31
    % = modified in '/etc/icinga2/zones.d/global-templates/templates.conf', lines 26:1-26:34
  * type = "Service"
  * vars
    * max_check_attempts = 1
      % = modified in '/etc/icinga2/zones.d/sirius.mittelerde.local/services.conf', lines 72:1-72:27
    * ping_win_address = "192.168.141.4"
      % = modified in '/etc/icinga2/zones.d/sirius.mittelerde.local/services.conf', lines 71:1-71:39
    * ping_win_crit = [ 1000, 100 ]
      % = modified in '/etc/icinga2/zones.d/sirius.mittelerde.local/services.conf', lines 73:1-73:34
  * volatile = false
  * zone = "sirius.mittelerde.local"
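The dump above also allows a rough reconstruction of the second service and the shared template. The sketch below is an assumption built from the reported attribute values and line positions; the import line and the 1m and 30s duration spellings are guesses and are marked as such.

```
// Sketch of the relevant part of
// /etc/icinga2/zones.d/global-templates/templates.conf (assumed layout)
template Service "generic-service" {
  max_check_attempts = 5
  check_interval = 1m    // dump reports 60; the spelling is a guess
  retry_interval = 30s   // dump reports 30; the spelling is a guess
}

// Sketch of the "Network-Status" service in
// /etc/icinga2/zones.d/sirius.mittelerde.local/services.conf
object Service "Network-Status" {
  import "generic-service"   // inferred from the templates list
  check_command = "ping-windows"
  host_name = "sirius.mittelerde.local"
  vars.ping_win_address = "192.168.141.4"
  vars.max_check_attempts = 1   // custom variable, not the service attribute
  vars.ping_win_crit = [ 1000, 100 ]
}
```

One detail stands out in the dump: vars.max_check_attempts = 1 appears under vars, while the max_check_attempts attribute itself stays at 5 from the template. If the intent was to reach a hard state on the first failed check, the attribute would have to be set without the vars. prefix. Whether this has anything to do with the missing history entries is not established in this thread.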

dnsmichi commented 5 years ago

Hm, I am out of ideas here. Maybe the check results are considered too old and are dropped for that reason when the replay log is processed. Or you are hitting a replay log bug which will be fixed in 2.11. A good next step would be to test the snapshot packages: https://icinga.com/docs/icinga2/snapshot/doc/21-development/#snapshot-packages-nightly-builds

dnsmichi commented 4 years ago

Either 2.11 fixed this already, or the new IcingaDB backend will do so.