Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.
https://icinga.com/docs/icinga2/latest
GNU General Public License v2.0
2k stars 574 forks source link

Call ConfigObject#Register() after #OnAllConfigLoaded() for runtime creation #10057

Open Al2Klimov opened 4 months ago

Al2Klimov commented 4 months ago

so that they're invisible until successful complete initialization. Otherwise some members may be still NULL and crash the daemon.

ref/IP/53428

What Icinga does with config

  1. Parse AST
  2. Eval AST, which produces ConfigItems
  3. Commit and register, for each item/object
  4. Call ConfigObject#OnAllConfigLoaded to init remaining members, e.g. Downtime#m_Checkable

Registering an object makes it immediately available for all object getters by type, as well as DSL get_host(), etc.. This is fine for the config load, "otherwise such a stupid config won't function properly." © @yhabteab

But not for runtime objects! ConfigObject#Start() has already been called for all and everything does something. E.g. IcingaDB#UpdateAllConfigObjects() operates on all objects as soon as registered – ready or not:

(gdb) frame 8
#8  0x0000000000f9ecac in icinga::IcingaDB::PrepareObject (object=...,
    attributes=..., checksums=...)
    at /usr/include/boost/smart_ptr/intrusive_ptr.hpp:179
179     /usr/include/boost/smart_ptr/intrusive_ptr.hpp: No such file or directory.
(gdb) p downtime.px->m_Checkable
$6 = {px = 0x0}
(gdb)

closes #10151

Al2Klimov commented 4 months ago

CC @Wintermute2k6

Al2Klimov commented 4 months ago

Btw. the only runtime child objects origin from apply rules and for them such "late" registration is still soon enough:

apply Service "get_host" {
check_command = "dummy"
log(get_host(host.name))
assign where true
}
curl -sSiku root:123456 -X PUT -H 'Accept: application/json' 'https://127.0.0.1:5665/v1/objects/hosts/h' -d '{"pretty":1,"attrs":{"check_command":"dummy"}}'
[2024-05-16 10:57:20 +0200] information/ApiListener: New client connection from [::ffff:127.0.0.1]:52732 (no client certificate)
[2024-05-16 10:57:20 +0200] information/config: Object of type 'Host'
[2024-05-16 10:57:20 +0200] information/ConfigObjectUtility: Created and activated object 'h' of type 'Host'.
[2024-05-16 10:57:20 +0200] information/HttpServerConnection: Request PUT /v1/objects/hosts/h (from [::ffff:127.0.0.1]:52732, user: root, agent: curl/8.6.0, status: OK) took 12ms.

👍

julianbrost commented 1 month ago

So this change only has an effect for objects created via ConfigObjectUtility::CreateObject(), i.e. runtime-created objects. I'm not sure this would actually fix the problem reported in ref/IP/53428, where you concluded that the null dereference happend while starting the Icinga DB feature, so doesn't that suggest that it happened when starting Icinga 2, not due to a runtime object creation?

Al2Klimov commented 1 month ago

Immediately after Icinga 2 start, the Icinga DB feature dumps all objects, once connected to Redis. This can take long enough for new runtime objects to appear. If now PrepareObject() (see OP) from UpdateAllConfigObjects() happens just before Downtime#OnAllConfigLoaded(), Icinga crashes as in ref/IP/53428.