Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.
https://icinga.com/docs/icinga2/latest
GNU General Public License v2.0

Acknowledgement with comment and expiry time does not expire the Acknowledgement after the comment expires #8387

Closed sol1-matt closed 4 years ago

sol1-matt commented 4 years ago

Describe the bug

When an expiry time is added to a service acknowledgement with a comment, the comment is shown along with a cancel button (X) and a countdown to the expiry time.

When the expiry time passes, the comment and countdown disappear as expected, but the service is still acknowledged and the cancel button is still shown.

To Reproduce

  1. Add an acknowledgement with a comment and an expiry time 1 minute in the future to a service in warning or critical state. The service colour goes pale and the service now has the ack comment, a countdown and a cancel X.
  2. Wait until the acknowledgement expiry time passes.
  3. The service colour is still pale, the comment is gone (correctly), the countdown is gone (correctly), but the cancel X remains.

Note: if you look at the DB at this stage, icinga_servicestatus.acknowledgement_type is still set to 1.

select * from icinga_objects where name1 = '<name of host>' and name2 = '<name of service>';
select * from icinga_servicestatus where service_object_id = <icinga_objects.object_id>\G

Expected behavior

When an Acknowledgement with comment and expiry time expires the service should no longer be acknowledged.

Include as many relevant details about the environment you experienced the problem in

# icinga2 --version
icinga2 - The Icinga 2 network monitoring daemon (version: r2.12.1-1)

Copyright (c) 2012-2020 Icinga GmbH (https://icinga.com/)
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

System information:
  Platform: Ubuntu
  Platform version: 18.04.5 LTS (Bionic Beaver)
  Kernel: Linux
  Kernel version: 4.15.0-115-generic
  Architecture: x86_64

Build information:
  Compiler: GNU 8.4.0
  Build host: runner-wytxxqbb-project-298-concurrent-0
  OpenSSL version: OpenSSL 1.1.1  11 Sep 2018

Application information:

General paths:
  Config directory: /etc/icinga2
  Data directory: /var/lib/icinga2
  Log directory: /var/log/icinga2
  Cache directory: /var/cache/icinga2
  Spool directory: /var/spool/icinga2
  Run directory: /run/icinga2

Old paths (deprecated):
  Installation root: /usr
  Sysconf directory: /etc
  Run directory (base): /run
  Local state directory: /var

Internal paths:
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid
# icinga2 feature list
Disabled features: debuglog elasticsearch gelf graphite icingadb icingastatus livestatus opentsdb syslog
Enabled features: api checker command compatlog ido-mysql influxdb mainlog notification perfdata statusdata
# mysql --version
mysql  Ver 15.1 Distrib 10.1.44-MariaDB, for debian-linux-gnu (x86_64) using readline 5.2

Additional Info

Clicking on the X to remove the Acknowledgement on the service will return the service to the expected state, strong colours and no acknowledgement. The database value icinga_servicestatus.acknowledgement_type will now be set to 0.

Al2Klimov commented 4 years ago

Hello @sol1-matt and thank you for reporting!

Does Icinga Web acknowledge the problem via Icinga 2 API or command pipe?

Best, AK

sol1-matt commented 4 years ago

Looking at the config in /icingaweb2/monitoring/config/edittransport?transport=icinga2, the transport type is local command file and the command file is /var/run/icinga2/cmd/icinga2.cmd.

The logs have the following

Add Ack

[2020-11-02 15:28:38 +1100] information/ExternalCommandListener: Executing external command: [1604291318] ACKNOWLEDGE_SVC_PROBLEM_EXPIRE;test.host.example;Check Free Memory;0;1;0;1604291369;sol1;test acknowledge 2
[2020-11-02 15:28:38 +1100] information/ConfigObjectUtility: Created and activated object 'test.host.example!Check Free Memory!126c506f-4fc0-4110-bcfd-ea390b3bdf87' of type 'Comment'.
[2020-11-02 15:28:38 +1100] information/Checkable: Acknowledgement set for checkable 'test.host.example!Check Free Memory'.
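To make the expiry in that external command easier to read, here is a minimal Python sketch that splits the command string into named fields. The field order (host, service, sticky, notify, persistent, expiry timestamp, author, comment) is assumed from the Icinga 2 external command documentation for ACKNOWLEDGE_SVC_PROBLEM_EXPIRE, and the function name `parse_ack_expire` is made up for illustration.

```python
from datetime import datetime, timezone

# Field order assumed from the Icinga 2 external command documentation.
FIELDS = ("host", "service", "sticky", "notify", "persistent",
          "expiry", "author", "comment")


def parse_ack_expire(line):
    """Parse '[<issued>] ACKNOWLEDGE_SVC_PROBLEM_EXPIRE;...' into a dict."""
    header, _, rest = line.partition("] ")
    issued = int(header.lstrip("["))
    # Limit the split so semicolons inside the comment stay in the comment.
    name, *args = rest.split(";", len(FIELDS))
    if name != "ACKNOWLEDGE_SVC_PROBLEM_EXPIRE" or len(args) != len(FIELDS):
        raise ValueError("not an ACKNOWLEDGE_SVC_PROBLEM_EXPIRE command")
    ack = dict(zip(FIELDS, args))
    ack["issued"] = issued
    ack["expiry"] = int(ack["expiry"])
    return ack


line = ("[1604291318] ACKNOWLEDGE_SVC_PROBLEM_EXPIRE;test.host.example;"
        "Check Free Memory;0;1;0;1604291369;sol1;test acknowledge 2")
ack = parse_ack_expire(line)
expires_at = datetime.fromtimestamp(ack["expiry"], tz=timezone.utc)
print(ack["service"], "expires", expires_at.isoformat(),
      "which is", ack["expiry"] - ack["issued"], "seconds after the command")
```

For the command logged above, the expiry is 51 seconds after the command was issued, i.e. 15:29:29 +1100, which is shortly before the comment deletion logged at 15:29:50.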

Just after the expiry time passes we get this

[2020-11-02 15:29:50 +1100] information/ConfigObjectUtility: Deleted object 'test.host.example!Check Free Memory!126c506f-4fc0-4110-bcfd-ea390b3bdf87' of type 'Comment'.

The service is still acknowledged at this time. If I cancel it through the Icinga Web 2 UI, I see this

[2020-11-02 15:32:11 +1100] information/ExternalCommandListener: Executing external command: [1604291531] REMOVE_SVC_ACKNOWLEDGEMENT;test.host.example;Check Free Memory
[2020-11-02 15:32:11 +1100] information/Checkable: Acknowledgement cleared for checkable 'test.host.example!Check Free Memory'.

Al2Klimov commented 4 years ago

Please try using the API command transport.

sol1-matt commented 4 years ago

Configuring the API monitoring transport, in addition to the existing CLI monitoring transport, and changing the order so the API is first in the list of transports worked.

eg:

Transport
  icinga api (Type: Api)
  icinga2 (Type: Local)

Both the comment and acknowledgement were removed.

[2020-11-03 10:55:49 +1100] information/Checkable: Acknowledgement cleared for checkable 'test.host.example!Check Disk Space'.
[2020-11-03 10:56:12 +1100] information/ConfigObjectUtility: Deleted object 'test.host.example!Check Disk Space!07f68106-0226-4eff-910c-b357b4856b60' of type 'Comment'.

This is a valid workaround for this problem.
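For anyone who wants to test the API path without going through Icinga Web, the following is a rough Python sketch of the request that an expiring acknowledgement roughly corresponds to on the Icinga 2 REST API (the `acknowledge-problem` action). The base URL, the helper name `build_ack_request` and the chosen `sticky`/`notify` values are assumptions for illustration. Authentication with an ApiUser and TLS verification are deliberately left out, and the request is only built, not sent.

```python
import json
import urllib.request


def build_ack_request(base_url, host, service, author, comment, expiry):
    """Build (but do not send) a POST for /v1/actions/acknowledge-problem."""
    payload = {
        "type": "Service",
        # filter_vars avoids hand-quoting host and service names in the filter.
        "filter": "host.name == ack_host && service.name == ack_service",
        "filter_vars": {"ack_host": host, "ack_service": service},
        "author": author,
        "comment": comment,
        "expiry": expiry,   # Unix timestamp at which the ack should expire
        "sticky": False,    # assumption: mirrors the 0 in the pipe command
        "notify": True,     # assumption: mirrors the 1 in the pipe command
    }
    return urllib.request.Request(
        f"{base_url}/v1/actions/acknowledge-problem",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Accept": "application/json",
                 "Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint; 5665 is the default Icinga 2 API port.
req = build_ack_request("https://localhost:5665", "test.host.example",
                        "Check Disk Space", "sol1", "test acknowledge",
                        1604291369)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending it would additionally need ApiUser credentials with permission for the `actions/acknowledge-problem` action.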

Note: This doesn't clean up services whose comment had already expired while the acknowledgement stayed in place. For those, the old acknowledgement still had to be removed and a new acknowledgement with comment and expiry added before the service behaved correctly.

Some SQL would probably clean that up, but I didn't have that many services in a bad state, so the cleanup was done manually to be safe.
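As an alternative to editing the database, a cleanup could probably be driven through the API instead. The sketch below only covers the selection step: given the `results` list from a query against `/v1/objects/services`, it picks out services that are still acknowledged although their expiry has passed. It assumes the affected services still carry a non-zero `acknowledgement_expiry` attribute, which was not verified in this issue, and the function name `stale_acknowledgements` is made up for illustration.

```python
import time


def stale_acknowledgements(results, now=None):
    """Return names of services that are acknowledged past their expiry.

    `results` is the "results" list of an Icinga 2 API object query, where
    each entry has a "name" and an "attrs" dictionary.
    """
    now = time.time() if now is None else now
    stale = []
    for obj in results:
        attrs = obj["attrs"]
        acknowledged = attrs.get("acknowledgement", 0) != 0
        expiry = attrs.get("acknowledgement_expiry", 0)
        # An expiry of 0 means the acknowledgement never expires.
        if acknowledged and 0 < expiry < now:
            stale.append(obj["name"])
    return stale


# Made-up sample data in the shape the API returns.
sample = [
    {"name": "test.host.example!Check Free Memory",
     "attrs": {"acknowledgement": 1.0, "acknowledgement_expiry": 1604291369.0}},
    {"name": "test.host.example!Check Disk Space",
     "attrs": {"acknowledgement": 1.0, "acknowledgement_expiry": 0.0}},
    {"name": "test.host.example!Check Load",
     "attrs": {"acknowledgement": 0.0, "acknowledgement_expiry": 0.0}},
]
print(stale_acknowledgements(sample, now=1604291531))
```

Each name returned could then be passed to the `remove-acknowledgement` API action, or cleared by hand in the web UI as was done here.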

Al2Klimov commented 4 years ago

This is a valid workaround for this problem.

... especially as the command pipe has been deprecated.