Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.
https://icinga.com/docs/icinga2/latest
GNU General Public License v2.0

icinga2 randomly does not reload objects after adding new objects via api #6957

Closed raffis closed 4 years ago

raffis commented 5 years ago

Objects created via the API do not show up in the web interface (and I'm pretty sure they do not get checked either).

The objects become visible after manually restarting Icinga.

The problem looks quite random (see steps to reproduce).

Again, only restarting Icinga solves the problem.

Expected Behaviour

New objects (here test servicegroups, but it can be any object type) are visible in Icinga Web.

Current Behaviour

The servicegroups are not visible in the servicegroup list in the Icinga web UI. As far as I can see, the web UI does not fetch its information from the API but from the MySQL database directly. GET https://localhost:5665/v1/objects/servicegroups lists all of those objects, and so does icinga2 object list.

As soon as I restart icinga the objects are visible in the web ui.

Possible Solution

Not sure how this can happen but it looks like a major problem.

Steps to Reproduce (for bugs)

Create file /tmp/test:

curl -k -s -u root:root -H 'Accept: application/json' -H 'Content-Type: application/json' -X PUT 'https://localhost:5665/v1/objects/servicegroups/test10' -d '{ "attrs": { "display_name":"test10", "groups": [] }}'
curl -k -s -u root:root -H 'Accept: application/json' -H 'Content-Type: application/json' -X PUT 'https://localhost:5665/v1/objects/servicegroups/test11' -d '{ "attrs": { "display_name":"test11", "groups": [] }}'
curl -k -s -u root:root -H 'Accept: application/json' -H 'Content-Type: application/json' -X PUT 'https://localhost:5665/v1/objects/servicegroups/test12' -d '{ "attrs": { "display_name":"test12", "groups": [] }}'
curl -k -s -u root:root -H 'Accept: application/json' -H 'Content-Type: application/json' -X PUT 'https://localhost:5665/v1/objects/servicegroups/test13' -d '{ "attrs": { "display_name":"test13", "groups": [] }}'
curl -k -s -u root:root -H 'Accept: application/json' -H 'Content-Type: application/json' -X PUT 'https://localhost:5665/v1/objects/servicegroups/test14' -d '{ "attrs": { "display_name":"test14", "groups": [] }}'
curl -k -s -u root:root -H 'Accept: application/json' -H 'Content-Type: application/json' -X PUT 'https://localhost:5665/v1/objects/servicegroups/test15' -d '{ "attrs": { "display_name":"test15", "groups": [] }}'
curl -k -s -u root:root -H 'Accept: application/json' -H 'Content-Type: application/json' -X PUT 'https://localhost:5665/v1/objects/servicegroups/test16' -d '{ "attrs": { "display_name":"test16", "groups": [] }}'
curl -k -s -u root:root -H 'Accept: application/json' -H 'Content-Type: application/json' -X PUT 'https://localhost:5665/v1/objects/servicegroups/test17' -d '{ "attrs": { "display_name":"test17", "groups": [] }}'
curl -k -s -u root:root -H 'Accept: application/json' -H 'Content-Type: application/json' -X PUT 'https://localhost:5665/v1/objects/servicegroups/test18' -d '{ "attrs": { "display_name":"test18", "groups": [] }}'
curl -k -s -u root:root -H 'Accept: application/json' -H 'Content-Type: application/json' -X PUT 'https://localhost:5665/v1/objects/servicegroups/test19' -d '{ "attrs": { "display_name":"test19", "groups": [] }}'
curl -k -s -u root:root -H 'Accept: application/json' -H 'Content-Type: application/json' -X PUT 'https://localhost:5665/v1/objects/servicegroups/test20' -d '{ "attrs": { "display_name":"test20", "groups": [] }}'

cat /tmp/test | while read l; do sh -c "$l"; done

The objects are not visible in the web UI. (You may need to repeat this a couple of times, since it does not happen every time, but it does most of the time.)

Chances of at least some objects showing up are higher when waiting a short time between requests: cat /tmp/test | while read l; do sh -c "$l"; sleep 2; done
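
The same reproduction could also be scripted without the hand-written /tmp/test file. This is only a sketch under assumptions: the variable names ICINGA_URL, ICINGA_AUTH and CURL and both function names are made up for illustration, while the endpoint, credentials and JSON body are copied from the commands above.

```shell
#!/bin/sh
# Sketch: issue the same PUT requests as /tmp/test from a loop.
# ICINGA_URL, ICINGA_AUTH and CURL are hypothetical knobs, not Icinga settings.
ICINGA_URL="${ICINGA_URL:-https://localhost:5665}"
ICINGA_AUTH="${ICINGA_AUTH:-root:root}"

# Print the JSON body for one servicegroup.
servicegroup_body() {
    printf '{ "attrs": { "display_name":"%s", "groups": [] }}' "$1"
}

# Create servicegroups test<first>..test<last>.
# An optional third argument is the number of seconds to sleep between requests.
create_servicegroups() {
    first=$1
    last=$2
    delay=${3:-0}
    i=$first
    while [ "$i" -le "$last" ]; do
        name="test$i"
        ${CURL:-curl} -k -s -u "$ICINGA_AUTH" \
            -H 'Accept: application/json' -H 'Content-Type: application/json' \
            -X PUT "$ICINGA_URL/v1/objects/servicegroups/$name" \
            -d "$(servicegroup_body "$name")"
        if [ "$delay" -gt 0 ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
}
```

Calling create_servicegroups 10 20 corresponds to the run without a delay, and create_servicegroups 10 20 2 to the variant with sleep 2.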

Context

I hit this issue in kube-icinga (https://github.com/gyselroth/kube-icinga). This app issues many API calls within a short time, and does so asynchronously.

Your Environment

dnsmichi commented 5 years ago

Please see #6012.

raffis commented 5 years ago

Nice, fast response! Looks like #5205/#6927 (or the mentioned parent task).

So basically I need to trigger lots of restarts, since kube-icinga may add lots of objects (and also has to remove them first, since there is no way to trigger apply rules for changed objects).

Is triggering a restart the only workaround?

dnsmichi commented 5 years ago

Until the underlying problem inside the IDO feature is fixed, a restart is the only workaround, yes.

raffis commented 5 years ago

What's the difference between a service reload via init and a POST /v1/actions/restart-process?

After sending a POST /v1/actions/restart-process, my object list in Icinga Web is empty, and adding the same object again results in a 500 error:

"Cannot create object 'test10'. Configuration file '/var/lib/icinga2/api/packages/_api//conf.d/servicegroups/test10.conf' already exists."

(This error differs from what I see when just restarting via systemd.)

Sending a POST /v1/actions/restart-process would be the only workaround for my app. Otherwise this is going to be impossible with the current version of the Icinga API.
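
A rough sketch of that workaround: run a whole batch of object changes first and send a single restart request afterwards, rather than restarting per object. The endpoint is the one named above; the names ICINGA_URL, ICINGA_AUTH, CURL, restart_icinga and batch_then_restart are assumptions made up for this example, and the credentials are the ones from the reproduction steps.

```shell
#!/bin/sh
# Sketch: batch the object changes, then restart once through the API.
# ICINGA_URL, ICINGA_AUTH and CURL are hypothetical knobs, not Icinga settings.
ICINGA_URL="${ICINGA_URL:-https://localhost:5665}"
ICINGA_AUTH="${ICINGA_AUTH:-root:root}"

# Ask the running daemon to restart itself via the REST API.
restart_icinga() {
    ${CURL:-curl} -k -s -u "$ICINGA_AUTH" \
        -H 'Accept: application/json' \
        -X POST "$ICINGA_URL/v1/actions/restart-process"
}

# Run the given command (for example a script creating many objects)
# and only request a restart if that command succeeded.
batch_then_restart() {
    "$@" && restart_icinga
}
```

For the reproduction file from the first comment this would be batch_then_restart sh /tmp/test. Whether one restart per batch is acceptable depends on how often the calling app changes objects.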

[2019-02-19 15:59:11 +0000] information/HttpServerConnection: Request: POST /v1/actions/restart-process (from [172.19.0.1]:50396), user: icinga2-director)
[2019-02-19 15:59:11 +0000] information/HttpServerConnection: HTTP client disconnected (from [172.19.0.1]:50396)
[2019-02-19 15:59:12 +0000] information/Application: Got reload command: Starting new instance.
[2019-02-19 15:59:12 +0000] information/Application: Reload requested, letting new process take over.
[2019-02-19 15:59:12 +0000] information/ApiListener: 'api' stopped.
[2019-02-19 15:59:12 +0000] information/CheckerComponent: 'checker' stopped.
[2019-02-19 15:59:12 +0000] information/CompatLogger: 'compatlog' stopped.
[2019-02-19 15:59:12 +0000] information/ExternalCommandListener: 'command' stopped.
[2019-02-19 15:59:13 +0000] information/FileLogger: 'main-log' started.
[2019-02-19 15:59:13 +0000] information/ApiListener: 'api' started.
[2019-02-19 15:59:13 +0000] information/ApiListener: Copying 2 zone configuration files for zone 'director-global' to '/var/lib/icinga2/api/zones/director-global'.
[2019-02-19 15:59:13 +0000] information/ApiListener: Applying configuration file update for path '/var/lib/icinga2/api/zones/director-global' (0 Bytes). Received timestamp '2019-02-19 15:59:13 +0000' (1550591953.329891), Current timestamp '2019-02-19 15:52:47 +0000' (1550591567.355288).
[2019-02-19 15:59:13 +0000] information/ApiListener: Copying 1 zone configuration files for zone 'master' to '/var/lib/icinga2/api/zones/master'.
[2019-02-19 15:59:13 +0000] information/ApiListener: Applying configuration file update for path '/var/lib/icinga2/api/zones/master' (0 Bytes). Received timestamp '2019-02-19 15:59:13 +0000' (1550591953.330238), Current timestamp '2019-02-19 15:52:47 +0000' (1550591567.355010).
[2019-02-19 15:59:13 +0000] information/ApiListener: Started new listener on '[0.0.0.0]:5665'
[2019-02-19 15:59:13 +0000] information/ExternalCommandListener: 'command' started.
[2019-02-19 15:59:13 +0000] information/GraphiteWriter: 'graphite' started.
[2019-02-19 15:59:13 +0000] information/LivestatusListener: 'livestatus' started.
[2019-02-19 15:59:13 +0000] information/LivestatusListener: Created UNIX socket in '/run/icinga2/cmd/livestatus'.
[2019-02-19 15:59:13 +0000] information/CheckerComponent: 'checker' started.
[2019-02-19 15:59:13 +0000] information/NotificationComponent: 'notification' started.
[2019-02-19 15:59:13 +0000] information/DbConnection: 'ido-mysql' started.
[2019-02-19 15:59:13 +0000] information/CompatLogger: 'compatlog' started.

dnsmichi commented 5 years ago

"Cannot create object 'test10'. Configuration file '/var/lib/icinga2/api/packages/_api//conf.d/servicegroups/test10.conf' already exists."

The stage name is missing after _api/, so it is highly likely that the API package got broken somehow while restarting.

raffis commented 5 years ago

Argh, my fault: I removed the content of /var/lib/icinga2/api/packages/_api manually during debugging and just noticed that files like active-stage.conf and active.conf were missing after the restart. But if I create new objects via the API, they get created in a conf.d folder directly inside _api, /var/lib/icinga2/api/packages/_api/conf.d/xxxx. And after restarting the service, the added objects are gone again (but the files are still there).

Maybe a check for that would be helpful (or a log entry somewhere saying that the stage folder is gone or not active). Probably the API should respond with a 500 error and not accept new objects in the first place.
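
Until something like that exists in Icinga itself, a client could run a check of its own before creating objects. The following is a minimal sketch and not part of Icinga 2: check_api_package is a made-up helper, and the file names to look for are passed in by the caller, because the exact names may differ from the ones I listed above.

```shell
#!/bin/sh
# Sketch of a client-side pre-flight check; hypothetical helper, not an Icinga command.
# Usage: check_api_package <package-dir> <required-file>...
# Returns 0 if the directory and all listed files exist, 1 otherwise.
check_api_package() {
    dir=$1
    shift
    if [ ! -d "$dir" ]; then
        echo "package directory missing: $dir" >&2
        return 1
    fi
    status=0
    for f in "$@"; do
        if [ ! -e "$dir/$f" ]; then
            # Report every missing file instead of stopping at the first one.
            echo "missing in $dir: $f" >&2
            status=1
        fi
    done
    return "$status"
}
```

With the names from my comment this would be check_api_package /var/lib/icinga2/api/packages/_api active-stage.conf active.conf, and the client would refuse to send PUT requests when it returns non-zero.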

dnsmichi commented 5 years ago

I've created #6959 as follow-up. I just don't have the time to code any further here, maybe you'd like to catch up on this.

raffis commented 5 years ago

👍 Yes, as soon as I have some spare time.

dnsmichi commented 4 years ago

This will be superseded by Icinga DB; the old tracking issue for the IDO is #6012.