Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.
https://icinga.com/docs/icinga2/latest
GNU General Public License v2.0
2.03k stars 578 forks source link

[dev.icinga.com #12803] Icinga sends notifications for soft state changes #4691

Closed icinga-migration closed 8 years ago

icinga-migration commented 8 years ago

This issue has been migrated from Redmine: https://dev.icinga.com/issues/12803

Created by meebey on 2016-09-27 03:17:45 +00:00

Assignee: (none) Status: Closed (closed on 2016-11-10 19:09:30 +00:00) Target Version: (none) Last Update: 2016-11-11 04:08:06 +00:00 (in Redmine)

Icinga Version: 2.5.4
Backport?: Not yet backported
Include in Changelog: 1

I have a HTTPS check that is switching between CRITICAL and WARNING state (this is not the actual bug, but an issue with the network) and Icinga sends an email on the first check attempt in the new state. So when it goes from WARNING to CRITICAL, and from CRITICAL to WARNING, it sends on the first check the notification.

I didn't know how to obtain the service state log from the filesystem so I made a screenshot, sorry about that.

If you need anything else from me, please tell me.

Service declaration:

object Service "HTTPS" { import "generic-service" host_name = "prod-tool" check_command = "http" check_interval = 60 vars.http_port = 443 vars.http_ssl = 1 }

template Service "generic-service" { max_check_attempts = 5 check_interval = 1m retry_interval = 30s }

Attachments

icinga-migration commented 8 years ago

Updated by mfriedrich on 2016-09-27 11:04:41 +00:00

You can extract the log messages from your icinga2.log or debug.log file (icinga2 feature enable debuglog).

Furthermore please attach the entire configuration objects which are required to easily reproduce the issue (host, service, notification, user objects; extracted from "icinga2 object list").

icinga-migration commented 8 years ago

Updated by meebey on 2016-09-28 03:23:13 +00:00

dnsmichi wrote:

You can extract the log messages from your icinga2.log or debug.log file (icinga2 feature enable debuglog).

I couldn't see any interesting in icinga2.log file except when it sent notifications. I have enabled debuglog now and I will update this ticket when it changes again between WARN and CRIT.

Furthermore please attach the entire configuration objects which are required to easily reproduce the issue (host, service, notification, user objects; extracted from "icinga2 object list").

I can do this, but I will have to redact some stuff from it.

icinga-migration commented 8 years ago

Updated by meebey on 2016-09-30 07:48:16 +00:00

hard WARNING to soft CRITICAL state:

[2016-09-29 19:47:13 +0800] notice/Process: PID 9735 ('/usr/lib/nagios/plugins/check_http' '-I' 'XXX.XXX.XXX.XXX' '-S' '-p' '443') terminated with exit code 2
[2016-09-29 19:47:13 +0800] debug/DbEvents: add log entry history for 'prod-tool!HTTPS'
[2016-09-29 19:47:13 +0800] debug/DbEvents: add checkable check history for 'prod-tool!HTTPS'
[2016-09-29 19:47:13 +0800] debug/DbEvents: add state change history for 'prod-tool!HTTPS'
[2016-09-29 19:47:13 +0800] notice/Checkable: State Change: Checkable prod-tool!HTTPS hard state change from WARNING to CRITICAL detected.
[2016-09-29 19:47:13 +0800] information/Checkable: Checking for configured notifications for object 'prod-tool!HTTPS'
[2016-09-29 19:47:13 +0800] debug/Checkable: Checkable 'prod-tool!HTTPS' has 1 notification(s).
[2016-09-29 19:47:13 +0800] notice/Notification: Attempting to send  notifications for notification object 'prod-tool!HTTPS!mail-icingaadmin'.
[2016-09-29 19:47:13 +0800] debug/Notification: Type 'Problem', TypeFilter: Problem and Recovery (FType=32, TypeFilter=96)
[2016-09-29 19:47:13 +0800] debug/Notification: State 'Critical', StateFilter: Critical, OK, Unknown and Warning (FState=4, StateFilter=15)
[2016-09-29 19:47:13 +0800] debug/Notification: User notification, Type 'Problem', TypeFilter: Acknowledgement, Custom, DowntimeEnd, DowntimeRemoved, DowntimeStart, FlappingEnd
, FlappingStart, Problem and Recovery (FType=32, TypeFilter=96)
[2016-09-29 19:47:13 +0800] debug/Notification: User notification, State 'Critical', StateFilter: Critical, Down, OK, Unknown, Up and Warning (FState=4, StateFilter=-1)
[2016-09-29 19:47:13 +0800] information/Notification: Sending 'Problem' notification 'prod-tool!HTTPS!mail-icingaadmin for user 'xxx'
[2016-09-29 19:47:13 +0800] debug/Notification: User notification, Type 'Problem', TypeFilter: Acknowledgement, Custom, DowntimeEnd, DowntimeRemoved, DowntimeStart, FlappingEnd
, FlappingStart, Problem and Recovery (FType=32, TypeFilter=96)
[2016-09-29 19:47:13 +0800] debug/Notification: User notification, State 'Critical', StateFilter: Critical, Down, OK, Unknown, Up and Warning (FState=4, StateFilter=-1)
[2016-09-29 19:47:13 +0800] information/Notification: Sending 'Problem' notification 'prod-tool!HTTPS!mail-icingaadmin for user 'xxx'
[2016-09-29 19:47:13 +0800] debug/Notification: User notification, Type 'Problem', TypeFilter: Acknowledgement, Custom, DowntimeEnd, DowntimeRemoved, DowntimeStart, FlappingEnd
, FlappingStart, Problem and Recovery (FType=32, TypeFilter=96)
[2016-09-29 19:47:13 +0800] debug/Notification: User notification, State 'Critical', StateFilter: Critical, Down, OK, Unknown, Up and Warning (FState=4, StateFilter=-1)
[2016-09-29 19:47:13 +0800] information/Notification: Sending 'Problem' notification 'prod-tool!HTTPS!mail-icingaadmin for user 'xxx'
[2016-09-29 19:47:13 +0800] debug/Notification: User notification, Type 'Problem', TypeFilter: Acknowledgement, Custom, DowntimeEnd, DowntimeRemoved, DowntimeStart, FlappingEnd
, FlappingStart, Problem and Recovery (FType=32, TypeFilter=96)
[2016-09-29 19:47:13 +0800] debug/Notification: User notification, State 'Critical', StateFilter: Critical, Down, OK, Unknown, Up and Warning (FState=4, StateFilter=-1)
[2016-09-29 19:47:13 +0800] information/Notification: Sending 'Problem' notification 'prod-tool!HTTPS!mail-icingaadmin for user 'xxx'
[2016-09-29 19:47:13 +0800] debug/Notification: User notification, Type 'Problem', TypeFilter: Acknowledgement, Custom, DowntimeEnd, DowntimeRemoved, DowntimeStart, FlappingEnd
, FlappingStart, Problem and Recovery (FType=32, TypeFilter=96)
[2016-09-29 19:47:13 +0800] debug/Notification: User notification, State 'Critical', StateFilter: Critical, Down, OK, Unknown, Up and Warning (FState=4, StateFilter=-1)
[2016-09-29 19:47:13 +0800] information/Notification: Sending 'Problem' notification 'prod-tool!HTTPS!mail-icingaadmin for user 'xxx'
[2016-09-29 19:47:13 +0800] debug/DbEvents: add notification history for 'prod-tool!HTTPS'
[2016-09-29 19:47:13 +0800] debug/DbEvents: add contact notification history for service 'prod-tool!HTTPS' and user 'xxx'.
[2016-09-29 19:47:13 +0800] debug/DbEvents: add contact notification history for service 'prod-tool!HTTPS' and user 'xxx'.
[2016-09-29 19:47:13 +0800] debug/DbEvents: add contact notification history for service 'prod-tool!HTTPS' and user 'xxx'.
[2016-09-29 19:47:13 +0800] debug/DbEvents: add contact notification history for service 'prod-tool!HTTPS' and user 'xxx'.
[2016-09-29 19:47:13 +0800] debug/DbEvents: add contact notification history for service 'prod-tool!HTTPS' and user 'xxx'.
[2016-09-29 19:47:13 +0800] notice/ApiListener: Relaying 'event::SetNextCheck' message
[2016-09-29 19:47:13 +0800] notice/ApiListener: Relaying 'event::CheckResult' message
[2016-09-29 19:47:13 +0800] notice/ApiListener: Relaying 'event::SendNotifications' message
[2016-09-29 19:47:13 +0800] notice/ApiListener: Relaying 'event::NotificationSentToAllUsers' message
[2016-09-29 19:47:13 +0800] debug/IdoPgsqlConnection: Query: UPDATE icinga_servicestatus SET acknowledgement_type = E'0',  active_checks_enabled = E'1',  check_command = E'http
',  check_source = E'monitoring.xxx.com',  check_type = E'0',  current_check_attempt = E'1',  current_notification_number = E'308',  current_state = E'2',  endpoint_object
_id = 462,  event_handler = E'',  event_handler_enabled = E'1',  execution_time = E'10.005214',  flap_detection_enabled = E'0',  has_been_checked = E'1',  instance_id = 1,  is_
flapping = E'0',  is_reachable = E'1',  last_check = TO_TIMESTAMP(1475149633),  last_hard_state = E'2',  last_hard_state_change = TO_TIMESTAMP(1475149633),  last_notification =
 TO_TIMESTAMP(1475075741),  last_state_change = TO_TIMESTAMP(1475149633),  last_time_critical = TO_TIMESTAMP(1475149633),  last_time_warning = TO_TIMESTAMP(1475149565),  latenc
y = E'0.000491',  long_output = E'',  max_check_attempts = E'5',  next_check = TO_TIMESTAMP(1475149691),  next_notification = TO_TIMESTAMP(1475162141),  normal_check_interval =
 E'1',  notifications_enabled = E'1',  original_attributes = E'null',  output = E'CRITICAL - Socket timeout after 10 seconds',  passive_checks_enabled = E'1',  percent_state_ch
ange = E'4',  perfdata = E'',  problem_has_been_acknowledged = E'0',  process_performance_data = E'1',  retry_check_interval = E'0.500000',  scheduled_downtime_depth = E'0',  s
ervice_object_id = 131,  should_be_scheduled = E'1',  state_type = E'1',  status_update_time = TO_TIMESTAMP(1475149633) WHERE service_object_id = 131
[2016-09-29 19:47:13 +0800] debug/IdoPgsqlConnection: Query: INSERT INTO icinga_statehistory (check_source, current_check_attempt, endpoint_object_id, instance_id, last_hard_st
ate, last_state, long_output, max_check_attempts, object_id, output, state, state_change, state_time, state_time_usec, state_type) VALUES (E'monitoring.xxx.com', E'1', 462
, 1, E'2', E'1', E'', E'5', 131, E'CRITICAL - Socket timeout after 10 seconds', E'2', E'1', TO_TIMESTAMP(1475149633), E'394750', E'1')
[2016-09-29 19:47:13 +0800] debug/IdoPgsqlConnection: Query: UPDATE icinga_servicestatus SET current_notification_number = E'309',  last_notification = TO_TIMESTAMP(1475149633)
,  next_notification = TO_TIMESTAMP(1475236033) WHERE instance_id = 1 AND service_object_id = 131
[2016-09-29 19:47:13 +0800] debug/IdoPgsqlConnection: Query: INSERT INTO icinga_notifications (contacts_notified, end_time, end_time_usec, endpoint_object_id, escalated, instan
ce_id, long_output, notification_reason, notification_type, object_id, output, start_time, start_time_usec, state) VALUES (E'5', TO_TIMESTAMP(1475149633), E'395159', 462, E'0',
 1, E'', E'0', E'1', 131, E'CRITICAL - Socket timeout after 10 seconds', TO_TIMESTAMP(1475149633), E'395159', E'2')
[2016-09-29 19:47:13 +0800] debug/IdoPgsqlConnection: Query: SELECT CURRVAL(pg_get_serial_sequence(E'icinga_notifications', E'notification_id')) AS id
[2016-09-29 19:47:13 +0800] debug/IdoPgsqlConnection: Sequence Value: 1252
[2016-09-29 19:47:13 +0800] debug/IdoPgsqlConnection: Query: INSERT INTO icinga_contactnotifications (contact_object_id, end_time, end_time_usec, instance_id, notification_id, 
start_time, start_time_usec) VALUES (234, TO_TIMESTAMP(1475149633), E'395159', 1, 1252, TO_TIMESTAMP(1475149633), E'395159')
[2016-09-29 19:47:13 +0800] debug/IdoPgsqlConnection: Query: INSERT INTO icinga_contactnotifications (contact_object_id, end_time, end_time_usec, instance_id, notification_id, 
start_time, start_time_usec) VALUES (281, TO_TIMESTAMP(1475149633), E'395159', 1, 1252, TO_TIMESTAMP(1475149633), E'395159')
[2016-09-29 19:47:13 +0800] debug/IdoPgsqlConnection: Query: INSERT INTO icinga_contactnotifications (contact_object_id, end_time, end_time_usec, instance_id, notification_id, 
start_time, start_time_usec) VALUES (235, TO_TIMESTAMP(1475149633), E'395159', 1, 1252, TO_TIMESTAMP(1475149633), E'395159')
[2016-09-29 19:47:13 +0800] debug/IdoPgsqlConnection: Query: INSERT INTO icinga_contactnotifications (contact_object_id, end_time, end_time_usec, instance_id, notification_id, 
start_time, start_time_usec) VALUES (236, TO_TIMESTAMP(1475149633), E'395159', 1, 1252, TO_TIMESTAMP(1475149633), E'395159')
[2016-09-29 19:47:13 +0800] debug/IdoPgsqlConnection: Query: INSERT INTO icinga_contactnotifications (contact_object_id, end_time, end_time_usec, instance_id, notification_id, 
start_time, start_time_usec) VALUES (282, TO_TIMESTAMP(1475149633), E'395159', 1, 1252, TO_TIMESTAMP(1475149633), E'395159')
[2016-09-29 19:47:13 +0800] notice/Process: Running command '/etc/icinga2/scripts/mail-service-notification.sh': PID 9741
[2016-09-29 19:47:13 +0800] debug/DbEvents: add log entry history for 'prod-tool!HTTPS'
[2016-09-29 19:47:13 +0800] information/Notification: Completed sending 'Problem' notification 'prod-tool!HTTPS!mail-icingaadmin' for checkable 'prod-tool!HTTPS' and user 'xxx'.
[2016-09-29 19:47:13 +0800] notice/ApiListener: Relaying 'event::NotificationSentUser' message
[2016-09-29 19:47:13 +0800] notice/Process: Running command '/etc/icinga2/scripts/mail-service-notification.sh': PID 9740
[2016-09-29 19:47:13 +0800] debug/DbEvents: add log entry history for 'prod-tool!HTTPS'
[2016-09-29 19:47:13 +0800] information/Notification: Completed sending 'Problem' notification 'prod-tool!HTTPS!mail-icingaadmin' for checkable 'prod-tool!HTTPS' and user 'xxx'.
[2016-09-29 19:47:13 +0800] notice/ApiListener: Relaying 'event::NotificationSentUser' message
[2016-09-29 19:47:13 +0800] notice/Process: Running command '/etc/icinga2/scripts/mail-service-notification.sh': PID 9742
[2016-09-29 19:47:13 +0800] debug/DbEvents: add log entry history for 'prod-tool!HTTPS'
[2016-09-29 19:47:13 +0800] information/Notification: Completed sending 'Problem' notification 'prod-tool!HTTPS!mail-icingaadmin' for checkable 'prod-tool!HTTPS' and user 'xxx'.
[2016-09-29 19:47:13 +0800] notice/ApiListener: Relaying 'event::NotificationSentUser' message
[2016-09-29 19:47:13 +0800] notice/Process: Running command '/etc/icinga2/scripts/mail-service-notification.sh': PID 9746
[2016-09-29 19:47:13 +0800] debug/DbEvents: add log entry history for 'prod-tool!HTTPS'
[2016-09-29 19:47:13 +0800] information/Notification: Completed sending 'Problem' notification 'prod-tool!HTTPS!mail-icingaadmin' for checkable 'prod-tool!HTTPS' and user 'xxx'.
[2016-09-29 19:47:13 +0800] notice/ApiListener: Relaying 'event::NotificationSentUser' message
[2016-09-29 19:47:13 +0800] notice/Process: Running command '/etc/icinga2/scripts/mail-service-notification.sh': PID 9750
[2016-09-29 19:47:13 +0800] debug/DbEvents: add log entry history for 'prod-tool!HTTPS'
[2016-09-29 19:47:13 +0800] information/Notification: Completed sending 'Problem' notification 'prod-tool!HTTPS!mail-icingaadmin' for checkable 'prod-tool!HTTPS' and user 'xxx'
.
[2016-09-29 19:47:13 +0800] notice/ApiListener: Relaying 'event::NotificationSentUser' message
[2016-09-29 19:47:13 +0800] debug/IdoPgsqlConnection: Query: COMMIT
[2016-09-29 19:47:13 +0800] debug/IdoPgsqlConnection: Query: BEGIN
...

soft CRITICAL to soft WARNING state (exactly 1 minute after the WARN to CRIT, so definitely first failed check -> soft state):

[2016-09-29 19:48:13 +0800] notice/Process: PID 9820 ('/usr/lib/nagios/plugins/check_http' '-I' 'XXX.XXX.XXX.XXX' '-S' '-p' '443') terminated with exit code 1
[2016-09-29 19:48:13 +0800] debug/DbEvents: add log entry history for 'prod-tool!HTTPS'
[2016-09-29 19:48:13 +0800] debug/DbEvents: add checkable check history for 'prod-tool!HTTPS'
[2016-09-29 19:48:13 +0800] debug/GraphiteWriter: Add to metric list:'icinga2.prod-tool.services.HTTPS.http.perfdata.time.value 2.270772 1475149693'.
[2016-09-29 19:48:13 +0800] debug/GraphiteWriter: Add to metric list:'icinga2.prod-tool.services.HTTPS.http.perfdata.size.value 1441 1475149693'.
[2016-09-29 19:48:13 +0800] debug/DbEvents: add state change history for 'prod-tool!HTTPS'
[2016-09-29 19:48:13 +0800] notice/Checkable: State Change: Checkable prod-tool!HTTPS hard state change from CRITICAL to WARNING detected.
[2016-09-29 19:48:13 +0800] information/Checkable: Checking for configured notifications for object 'prod-tool!HTTPS'
[2016-09-29 19:48:13 +0800] debug/Checkable: Checkable 'prod-tool!HTTPS' has 1 notification(s).
[2016-09-29 19:48:13 +0800] notice/Notification: Attempting to send  notifications for notification object 'prod-tool!HTTPS!mail-icingaadmin'.
[2016-09-29 19:48:13 +0800] debug/Notification: Type 'Problem', TypeFilter: Problem and Recovery (FType=32, TypeFilter=96)
[2016-09-29 19:48:13 +0800] debug/Notification: State 'Warning', StateFilter: Critical, OK, Unknown and Warning (FState=2, StateFilter=15)
[2016-09-29 19:48:13 +0800] debug/Notification: User notification, Type 'Problem', TypeFilter: Acknowledgement, Custom, DowntimeEnd, DowntimeRemoved, DowntimeStart, FlappingEnd, FlappingStart, Problem and Recovery (FType=32, TypeFilter=96)
[2016-09-29 19:48:13 +0800] debug/Notification: User notification, State 'Warning', StateFilter: Critical, Down, OK, Unknown, Up and Warning (FState=2, StateFilter=-1)
[2016-09-29 19:48:13 +0800] information/Notification: Sending 'Problem' notification 'prod-tool!HTTPS!mail-icingaadmin for user 'xxx'
[2016-09-29 19:48:13 +0800] debug/Notification: User notification, Type 'Problem', TypeFilter: Acknowledgement, Custom, DowntimeEnd, DowntimeRemoved, DowntimeStart, FlappingEnd, FlappingStart, Problem and Recovery (FType=32, TypeFilter=96)
[2016-09-29 19:48:13 +0800] debug/Notification: User notification, State 'Warning', StateFilter: Critical, Down, OK, Unknown, Up and Warning (FState=2, StateFilter=-1)
[2016-09-29 19:48:13 +0800] information/Notification: Sending 'Problem' notification 'prod-tool!HTTPS!mail-icingaadmin for user 'xxx'
[2016-09-29 19:48:13 +0800] debug/Notification: User notification, Type 'Problem', TypeFilter: Acknowledgement, Custom, DowntimeEnd, DowntimeRemoved, DowntimeStart, FlappingEnd, FlappingStart, Problem and Recovery (FType=32, TypeFilter=96)
[2016-09-29 19:48:13 +0800] debug/Notification: User notification, State 'Warning', StateFilter: Critical, Down, OK, Unknown, Up and Warning (FState=2, StateFilter=-1)
[2016-09-29 19:48:13 +0800] information/Notification: Sending 'Problem' notification 'prod-tool!HTTPS!mail-icingaadmin for user 'xxx'
[2016-09-29 19:48:13 +0800] debug/Notification: User notification, Type 'Problem', TypeFilter: Acknowledgement, Custom, DowntimeEnd, DowntimeRemoved, DowntimeStart, FlappingEnd, FlappingStart, Problem and Recovery (FType=32, TypeFilter=96)
[2016-09-29 19:48:13 +0800] debug/Notification: User notification, State 'Warning', StateFilter: Critical, Down, OK, Unknown, Up and Warning (FState=2, StateFilter=-1)
[2016-09-29 19:48:13 +0800] information/Notification: Sending 'Problem' notification 'prod-tool!HTTPS!mail-icingaadmin for user 'xxx'
[2016-09-29 19:48:13 +0800] debug/Notification: User notification, Type 'Problem', TypeFilter: Acknowledgement, Custom, DowntimeEnd, DowntimeRemoved, DowntimeStart, FlappingEnd, FlappingStart, Problem and Recovery (FType=32, TypeFilter=96)
[2016-09-29 19:48:13 +0800] debug/Notification: User notification, State 'Warning', StateFilter: Critical, Down, OK, Unknown, Up and Warning (FState=2, StateFilter=-1)
[2016-09-29 19:48:13 +0800] information/Notification: Sending 'Problem' notification 'prod-tool!HTTPS!mail-icingaadmin for user 'xxx'
[2016-09-29 19:48:13 +0800] debug/DbEvents: add notification history for 'prod-tool!HTTPS'
[2016-09-29 19:48:13 +0800] debug/DbEvents: add contact notification history for service 'prod-tool!HTTPS' and user 'xxx'.
[2016-09-29 19:48:13 +0800] debug/DbEvents: add contact notification history for service 'prod-tool!HTTPS' and user 'xxx'.
[2016-09-29 19:48:13 +0800] debug/DbEvents: add contact notification history for service 'prod-tool!HTTPS' and user 'xxx'.
[2016-09-29 19:48:13 +0800] debug/DbEvents: add contact notification history for service 'prod-tool!HTTPS' and user 'xxx'.
[2016-09-29 19:48:13 +0800] debug/DbEvents: add contact notification history for service 'prod-tool!HTTPS' and user 'xxx'.
[2016-09-29 19:48:13 +0800] notice/ApiListener: Relaying 'event::SetNextCheck' message
[2016-09-29 19:48:13 +0800] notice/ApiListener: Relaying 'event::CheckResult' message
[2016-09-29 19:48:13 +0800] notice/ApiListener: Relaying 'event::SendNotifications' message
[2016-09-29 19:48:13 +0800] notice/ApiListener: Relaying 'event::NotificationSentToAllUsers' message
[2016-09-29 19:48:13 +0800] debug/IdoPgsqlConnection: Query: UPDATE icinga_servicestatus SET acknowledgement_type = E'0',  active_checks_enabled = E'1',  check_command = E'http',  check_source = E'monitoring.xxx.com',  check_type = E'0',  current_check_attempt = E'1',  current_notification_number = E'309',  current_state = E'1',  endpoint_object_id = 462,  event_handler = E'',  event_handler_enabled = E'1',  execution_time = E'2.272506',  flap_detection_enabled = E'0',  has_been_checked = E'1',  instance_id = 1,  is_flapping = E'0',  is_reachable = E'1',  last_check = TO_TIMESTAMP(1475149693),  last_hard_state = E'1',  last_hard_state_change = TO_TIMESTAMP(1475149693),  last_notification = TO_TIMESTAMP(1475149633),  last_state_change = TO_TIMESTAMP(1475149693),  last_time_critical = TO_TIMESTAMP(1475149633),  last_time_warning = TO_TIMESTAMP(1475149693),  latency = E'0.000489',  long_output = E'',  max_check_attempts = E'5',  next_check = TO_TIMESTAMP(1475149751),  next_notification = TO_TIMESTAMP(1475236033),  normal_check_interval = E'1',  notifications_enabled = E'1',  original_attributes = E'null',  output = E'HTTP WARNING: HTTP/1.1 403 Forbidden - 1441 bytes in 2.271 second response time ',  passive_checks_enabled = E'1',  percent_state_change = E'8',  perfdata = E'time=2.270772s;;;0.000000 size=1441B;;;0',  problem_has_been_acknowledged = E'0',  process_performance_data = E'1',  retry_check_interval = E'0.500000',  scheduled_downtime_depth = E'0',  service_object_id = 131,  should_be_scheduled = E'1',  state_type = E'1',  status_update_time = TO_TIMESTAMP(1475149693) WHERE service_object_id = 131
[2016-09-29 19:48:13 +0800] debug/IdoPgsqlConnection: Query: INSERT INTO icinga_statehistory (check_source, current_check_attempt, endpoint_object_id, instance_id, last_hard_state, last_state, long_output, max_check_attempts, object_id, output, state, state_change, state_time, state_time_usec, state_type) VALUES (E'monitoring.xxx.com', E'1', 462, 1, E'1', E'2', E'', E'5', 131, E'HTTP WARNING: HTTP/1.1 403 Forbidden - 1441 bytes in 2.271 second response time ', E'1', E'1', TO_TIMESTAMP(1475149693), E'707753', E'1')
[2016-09-29 19:48:13 +0800] debug/IdoPgsqlConnection: Query: UPDATE icinga_servicestatus SET current_notification_number = E'310',  last_notification = TO_TIMESTAMP(1475149693),  next_notification = TO_TIMESTAMP(1475236093) WHERE instance_id = 1 AND service_object_id = 131
[2016-09-29 19:48:13 +0800] debug/IdoPgsqlConnection: Query: INSERT INTO icinga_notifications (contacts_notified, end_time, end_time_usec, endpoint_object_id, escalated, instance_id, long_output, notification_reason, notification_type, object_id, output, start_time, start_time_usec, state) VALUES (E'5', TO_TIMESTAMP(1475149693), E'708189', 462, E'0', 1, E'', E'0', E'1', 131, E'HTTP WARNING: HTTP/1.1 403 Forbidden - 1441 bytes in 2.271 second response time ', TO_TIMESTAMP(1475149693), E'708189', E'1')
[2016-09-29 19:48:13 +0800] debug/IdoPgsqlConnection: Query: SELECT CURRVAL(pg_get_serial_sequence(E'icinga_notifications', E'notification_id')) AS id
[2016-09-29 19:48:13 +0800] debug/IdoPgsqlConnection: Sequence Value: 1253
[2016-09-29 19:48:13 +0800] debug/IdoPgsqlConnection: Query: INSERT INTO icinga_contactnotifications (contact_object_id, end_time, end_time_usec, instance_id, notification_id, start_time, start_time_usec) VALUES (234, TO_TIMESTAMP(1475149693), E'708189', 1, 1253, TO_TIMESTAMP(1475149693), E'708189')
[2016-09-29 19:48:13 +0800] debug/IdoPgsqlConnection: Query: INSERT INTO icinga_contactnotifications (contact_object_id, end_time, end_time_usec, instance_id, notification_id, start_time, start_time_usec) VALUES (281, TO_TIMESTAMP(1475149693), E'708189', 1, 1253, TO_TIMESTAMP(1475149693), E'708189')
[2016-09-29 19:48:13 +0800] debug/IdoPgsqlConnection: Query: INSERT INTO icinga_contactnotifications (contact_object_id, end_time, end_time_usec, instance_id, notification_id, start_time, start_time_usec) VALUES (235, TO_TIMESTAMP(1475149693), E'708189', 1, 1253, TO_TIMESTAMP(1475149693), E'708189')
[2016-09-29 19:48:13 +0800] debug/IdoPgsqlConnection: Query: INSERT INTO icinga_contactnotifications (contact_object_id, end_time, end_time_usec, instance_id, notification_id, start_time, start_time_usec) VALUES (236, TO_TIMESTAMP(1475149693), E'708189', 1, 1253, TO_TIMESTAMP(1475149693), E'708189')
[2016-09-29 19:48:13 +0800] debug/IdoPgsqlConnection: Query: INSERT INTO icinga_contactnotifications (contact_object_id, end_time, end_time_usec, instance_id, notification_id, start_time, start_time_usec) VALUES (282, TO_TIMESTAMP(1475149693), E'708189', 1, 1253, TO_TIMESTAMP(1475149693), E'708189')
[2016-09-29 19:48:13 +0800] notice/Process: Running command '/etc/icinga2/scripts/mail-service-notification.sh': PID 9822
[2016-09-29 19:48:13 +0800] debug/DbEvents: add log entry history for 'prod-tool!HTTPS'
[2016-09-29 19:48:13 +0800] information/Notification: Completed sending 'Problem' notification 'prod-tool!HTTPS!mail-icingaadmin' for checkable 'prod-tool!HTTPS' and user 'xxx'.
[2016-09-29 19:48:13 +0800] notice/ApiListener: Relaying 'event::NotificationSentUser' message
[2016-09-29 19:48:13 +0800] notice/Process: Running command '/etc/icinga2/scripts/mail-service-notification.sh': PID 9825
[2016-09-29 19:48:13 +0800] debug/DbEvents: add log entry history for 'prod-tool!HTTPS'
[2016-09-29 19:48:13 +0800] information/Notification: Completed sending 'Problem' notification 'prod-tool!HTTPS!mail-icingaadmin' for checkable 'prod-tool!HTTPS' and user 'xxx'.
[2016-09-29 19:48:13 +0800] notice/ApiListener: Relaying 'event::NotificationSentUser' message
[2016-09-29 19:48:13 +0800] notice/Process: Running command '/etc/icinga2/scripts/mail-service-notification.sh': PID 9824
[2016-09-29 19:48:13 +0800] debug/DbEvents: add log entry history for 'prod-tool!HTTPS'
[2016-09-29 19:48:13 +0800] information/Notification: Completed sending 'Problem' notification 'prod-tool!HTTPS!mail-icingaadmin' for checkable 'prod-tool!HTTPS' and user 'xxx'.
[2016-09-29 19:48:13 +0800] notice/ApiListener: Relaying 'event::NotificationSentUser' message
[2016-09-29 19:48:13 +0800] notice/Process: Running command '/etc/icinga2/scripts/mail-service-notification.sh': PID 9823
[2016-09-29 19:48:13 +0800] debug/DbEvents: add log entry history for 'prod-tool!HTTPS'
[2016-09-29 19:48:13 +0800] information/Notification: Completed sending 'Problem' notification 'prod-tool!HTTPS!mail-icingaadmin' for checkable 'prod-tool!HTTPS' and user 'xxx'.
[2016-09-29 19:48:13 +0800] notice/ApiListener: Relaying 'event::NotificationSentUser' message
[2016-09-29 19:48:13 +0800] notice/Process: Running command '/etc/icinga2/scripts/mail-service-notification.sh': PID 9827
[2016-09-29 19:48:13 +0800] debug/DbEvents: add log entry history for 'prod-tool!HTTPS'
[2016-09-29 19:48:13 +0800] information/Notification: Completed sending 'Problem' notification 'prod-tool!HTTPS!mail-icingaadmin' for checkable 'prod-tool!HTTPS' and user 'xxx'.
[2016-09-29 19:48:13 +0800] notice/ApiListener: Relaying 'event::NotificationSentUser' message
...
icinga-migration commented 8 years ago

Updated by mfriedrich on 2016-09-30 13:28:50 +00:00

The first entry is definitely a hard state change, not a soft critical.

[2016-09-29 19:47:13 +0800] notice/Checkable: State Change: Checkable prod-tool!HTTPS hard state change from WARNING to CRITICAL detected.

I would assume that the retry_interval triggers another check after 30s which still is CRITICAL (so no state change detected and therefore no notification sent out).

30 seconds later the check changes from HARD critical to HARD warning, which triggers a hard state change and therefore notifications,

[2016-09-29 19:48:13 +0800] notice/Checkable: State Change: Checkable prod-tool!HTTPS hard state change from CRITICAL to WARNING detected.

This works as expected.

Is there anything specific which makes you think that there is a SOFT state involved somewhere?

The Icinga Web 2 screenshot isn't really helpful as it removes the notified users entirely (which could be the cause for multiple notification warnings).

icinga-migration commented 8 years ago

Updated by meebey on 2016-10-04 04:44:13 +00:00

dnsmichi wrote:

This works as expected.

Is there anything specific which makes you think that there is a SOFT state involved somewhere?

I made 2 assumptions when I filed this bug report, please correct me if I am wrong:

a) when a check is in a hard CRITICAL state and changes to another state, like WARNING or UNKNOWN, the state becomes a soft state, as it only has 1 probe of the new state (unconfirmed)

b) my configuration says to re-check after 30 seconds if the check failed (retry_interval = 30s), but the number of probes is set to 5 before it enters the hard state (max_check_attempts = 5). So the earliest it can enter a hard state should be after 2.5 minutes (5 check attempts * 30 retry seconds), and this is where the notification after 1 minute doesn't make sense to me. But if my a) assumption is incorrect then this is too :)

icinga-migration commented 8 years ago

Updated by mfriedrich on 2016-11-10 19:09:30 +00:00

ad a)

No, once the state changes to a HARD NOT-OK, each incoming NOT-OK state will not change that to SOFT again. SOFT states are only used when transitioning from OK -> NOT-OK, then counting till max check attempts are reached. Then a HARD state occurs and even stays there once the problem recovers (HARD OK).

ad b)

Being in a hard state, there is no retry interval anymore (that's only used in that SOFT state period). The normal check interval applies. Icinga already knows about the problem being there, but if no-one fixes it, there is no sense in firing checks that often.

So Icinga 2 works as expected, and you've hopefully learned some bits. I'm closing this one.

icinga-migration commented 8 years ago

Updated by meebey on 2016-11-11 04:08:06 +00:00

Thank you for the clarification, Icinga2 works as intended then.