Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.
https://icinga.com/docs/icinga2/latest
GNU General Public License v2.0
2.03k stars 578 forks source link

Icinga2 IDO <-> API inconsistency #6608

Closed lipazsolt closed 6 years ago

lipazsolt commented 6 years ago

Current Behavior

We acknowledged a WARNING problem via icingaweb2.

[2018-09-10 16:14:17 +0200] information/ExternalCommandListener: Executing external command: [1536588857] ACKNOWLEDGE_SVC_PROBLEM;*********;Network scan - 10.6.0.0/24;0;1;0;username;#TicketID
[2018-09-10 16:14:17 +0200] information/ConfigItem: Committing config item(s).
[2018-09-10 16:14:17 +0200] information/ConfigItem: Instantiated 1 Comment.
[2018-09-10 16:14:17 +0200] information/ConfigItem: Triggering Start signal for config items
[2018-09-10 16:14:17 +0200] information/ConfigItem: Activated all objects.
[2018-09-10 16:14:17 +0200] information/Checkable: Checking for configured notifications for object '*********!Network scan - 10.6.0.0/24'

WEB2 GUI rendered correctly: image

But if I send an API request or use icinga2 console cli:

<1> => get_service("*********","Network scan - 10.6.0.0/24")
{
        __name = "*********!Network scan - 10.6.0.0/24"
        acknowledgement = 0.000000
        acknowledgement_expiry = 0.000000
        action_url = ""

You can see the WARNING is not acknowledged.

Web2 Service Inspect is show the same result: image

few MySQL queries:

mysql> select * from icinga_acknowledgements where author_name ="username" and entry_time = "2018-09-10 16:14:17"\G
*************************** 1. row ***************************
  acknowledgement_id: 1069
         instance_id: 1
          entry_time: 2018-09-10 16:14:17
     entry_time_usec: 501162
acknowledgement_type: 1
           object_id: 3972
               state: 1
         author_name: username
        comment_data: #TicketID
           is_sticky: 0
  persistent_comment: 1
     notify_contacts: 1
            end_time: NULL
  endpoint_object_id: 230
1 row in set (0.00 sec)

mysql>
mysql> select * from icinga_servicestatus where output like "%WARNING - 6 new host%" \G
*************************** 1. row ***************************
             servicestatus_id: 2984
                  instance_id: 1
            service_object_id: 3972
           status_update_time: 2018-09-10 12:07:20
                       output: WARNING - 6 new host(s)!
                  long_output: ******
                     perfdata:
                 check_source: *********
                current_state: 1
             has_been_checked: 1
          should_be_scheduled: 1
        current_check_attempt: 1
           max_check_attempts: 4
                   last_check: 2018-09-10 12:07:20
                   next_check: 2018-10-10 09:08:55
                   check_type: 0
            last_state_change: 2018-09-10 11:55:02
       last_hard_state_change: 2018-09-10 12:07:20
              last_hard_state: 1
                 last_time_ok: 2018-08-11 21:58:31
            last_time_warning: 2018-09-10 12:07:20
            last_time_unknown: 2018-04-05 12:50:53
           last_time_critical: NULL
                   state_type: 1
            last_notification: 2018-09-10 12:07:20
            next_notification: 2018-09-10 09:28:52
        no_more_notifications: 0
        notifications_enabled: 1
problem_has_been_acknowledged: 1
         acknowledgement_type: 1
  current_notification_number: 1
       passive_checks_enabled: 0
        active_checks_enabled: 1
        event_handler_enabled: 1
       flap_detection_enabled: 0
                  is_flapping: 0
         percent_state_change: 15.8
                      latency: 0
               execution_time: 167.355588
     scheduled_downtime_depth: 0
   failure_prediction_enabled: 0
     process_performance_data: 1
          obsess_over_service: 0
  modified_service_attributes: 0
          original_attributes: null
                event_handler:
                check_command: **********
        normal_check_interval: 43200
         retry_check_interval: 1
   check_timeperiod_object_id: 0
                 is_reachable: 1
           endpoint_object_id: 230
1 row in set (0.01 sec)

mysql> 

Steps to Reproduce (for bugs)

Sorry i couldn't reproduce.

Context

How could be possible? Can i reset the IDO statuses, and use Icinga2 API/console statuses? Thanks!

Your Environment

dnsmichi commented 6 years ago

Would be interesting if Icinga 2 still thinks that the object's state is 1, or if it already recovered to OK and removed the acknowledgement. There might be a problem with the IDO update then.

Since you're using 2.9.0, I highly recommend to upgrade to 2.9.1 and try again to reproduce this.

lipazsolt commented 6 years ago

Now, I have updated the icinga and web2 to 2.9.1/2.6.1 but still cannot reproduce this. Anyway, I checked the a logs deeper, and i noticed an error few minutes later (I deployed a new config):

[2018-09-10 16:14:17 +0200] information/ExternalCommandListener: Executing external command: [1536588857] ACKNOWLEDGE_SVC_PROBLEM;****;Network scan - 10.6.0.0/24;0;1;0;username;#TicketId
...
[2018-09-10 16:15:16 +0200] information/HttpServerConnection: Request: POST /v1/config/stages/director (from [x.x.x.x]:54334, user: root)
...
[2018-09-10 16:15:20 +0200] information/Application: Got reload command: Starting new instance.
....
[2018-09-10 16:15:27 +0200] information/WorkQueue: #5 (IdoMysqlConnection, ido-mysql) items: 387, rate: 42.6167/s (2557/min 183998/5min 522918/15min); empty in 9 seconds
...
[2018-09-10 16:15:32 +0200] information/Application: Reload requested, letting new process take over.
...
[2018-09-10 16:15:33 +0200] information/DbConnection: 'ido-mysql' started.
...
[2018-09-10 16:15:33 +0200] information/DbConnection: Resuming IDO connection: ido-mysql
[2018-09-10 16:15:33 +0200] information/IdoMysqlConnection: 'ido-mysql' resumed.
...
[2018-09-10 16:15:38 +0200] information/IdoMysqlConnection: MySQL IDO instance id: 1 (schema version: '1.14.3')
[2018-09-10 16:15:39 +0200] critical/IdoMysqlConnection: Error "Deadlock found when trying to get lock; try restarting transaction" when executing query "***** 508 SQL command *****

What do you think, is this deadlock issue related to Ack 0/1 problem? I didn't find any "Network Scan - 10.6.0.0/24" string/object/comment/ack SQL command in that 508 commands.

@dnsmichi could please help me to solve this incosistency? Maybe delete a cache folder or something...

dnsmichi commented 6 years ago

Purge all status and config tables from the database, Icinga 2 will populate them on restart again.

lipazsolt commented 6 years ago

Hi @dnsmichi. Thanks you. So I have to run these commands?

$ sudo systemctl stop icinga2
sql> TRUNCATE TABLE icinga_configfiles;
sql> TRUNCATE TABLE icinga_configfilevariables;
sql> TRUNCATE TABLE icinga_contactstatus;
sql> TRUNCATE TABLE icinga_customvariablestatus;
sql> TRUNCATE TABLE icinga_endpointstatus;
sql> TRUNCATE TABLE icinga_hoststatus;
sql> TRUNCATE TABLE icinga_programstatus;
sql> TRUNCATE TABLE icinga_servicestatus;
sql> TRUNCATE TABLE icinga_zonestatus;
sql> TRUNCATE TABLE statusmap;
$ sudo systemctl start icinga2

That's all table?

dnsmichi commented 6 years ago

Create a backup first, and then look into the docs for config/status tables:

https://www.icinga.com/docs/icinga1/latest/en/db_model.html#statustables https://www.icinga.com/docs/icinga1/latest/en/db_model.html#dbm_cf

In terms of further questions on purging the database, please kindly hop onto the forums.