Open ymartin-ovh opened 1 year ago
Hello Martin!
How did you create those hosts and groups in the first place?
Best, A/K
Hello
Hosts and groups were created through API calls (masters).
Please share their config.
On a satellite, I have something like this.
icinga2 tried to create the host before the hostgroup and refused to activate the host object because the hostgroup was missing. As a workaround, I run chattr +i on the hosts folder so that icinga2 syncs packages and hostgroups first.
/var/lib/icinga2/api/packages/_api/164466ca-64c1-4e63-b4e0-eaaa65d9c493/conf.d/hosts/foobar.conf:
object Host "foobar" {
import "webservers-host"
address = "10.19.65.220"
groups = [ "www-hosts" ]
vars["delivery_status"] = "delivered"
version = 1673339012.627591
zone = "labeu"
}
/var/lib/icinga2/api/packages/_api/164466ca-64c1-4e63-b4e0-eaaa65d9c493/conf.d/hostgroups/www-hosts.conf
object HostGroup "www-hosts" {
version = 1664280940.247957
zone = "global-templates"
}
Why did you put them in different zones?
I want a single www-hosts group that gathers the hosts from all regions.
I also have this in the icinga2 config (masters & satellites), in /etc/icinga2/zones.conf:
object Zone "global-templates" {
global = true
}
Hosts and groups are synced to the satellite of the labeu region.
Have you tried https://icinga.com/docs/icinga-2/latest/doc/17-language-reference/#group-assign instead?
I will try this.
However, I think that host activation should depend on hostgroup activation. I will create host.vars.groups and add an assign relationship on all groups. I have the impression that I am re-implementing what Icinga2 already does with the host.groups list.
Regards
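A minimal sketch of the assign-rule approach described above, written as static configuration. It assumes that host.vars.groups is an array custom variable (the name proposed in the comment above) and that the group is defined in a config file rather than through the API; where the file lives and which zone it belongs to are not specified in this thread.

```icinga2
// Host carries the intended group names in a custom variable
// instead of the built-in "groups" attribute (assumption from the comment above).
object Host "foobar" {
  import "webservers-host"
  address = "10.19.65.220"
  vars.groups = [ "www-hosts" ]
}

// The group pulls in its members itself, so the host object
// no longer references a HostGroup that may not exist yet.
object HostGroup "www-hosts" {
  assign where "www-hosts" in host.vars.groups
}
```

With this layout the membership is evaluated by the group's assign rule, so the host definition can be loaded even when the group object arrives later.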
Hello @Al2Klimov
I can't find a way to create a group object with an assign rule through the API. Do you know how to do this?
Regards
I'm afraid that's impossible via API.
Hum,
I hit the same satellite fresh-start issue with 2.14.0 too.
On the first run, I can see the following error in the logs:
icinga2[3965374]: icinga2: /usr/include/boost/smart_ptr/intrusive_ptr.hpp:199: T* boost::intrusive_ptr<T>::operator->() const [with T = icinga::Host]: Assertion `px != 0' failed.
icinga2[3965374]: Caught SIGABRT
Build information:
Compiler: GNU 10.2.1
Build host: runner-hh8q3bz2-project-575-concurrent-0
OpenSSL version: OpenSSL 1.1.1n 15 Mar 2022
Application information:
General paths:
Config directory: /etc/icinga2
Data directory: /var/lib/icinga2
Log directory: /var/log/icinga2
Cache directory: /var/cache/icinga2
Spool directory: /var/spool/icinga2
Run directory: /run/icinga2
Old paths (deprecated):
Installation root: /usr
Sysconf directory: /etc
Run directory (base): /run
Local state directory: /var
Internal paths:
Package data directory: /usr/share/icinga2
State path: /var/lib/icinga2/icinga2.state
Modified attributes path: /var/lib/icinga2/modified-attributes.conf
Objects path: /var/cache/icinga2/icinga2.debug
Vars path: /var/cache/icinga2/icinga2.vars
PID path: /run/icinga2/icinga2.pid
Stacktrace:
0# icinga::Application::SigAbrtHandler(int) in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
1# 0x00007F6570AED140 in /lib/x86_64-linux-gnu/libpthread.so.0
2# gsignal in /lib/x86_64-linux-gnu/libc.so.6
3# abort in /lib/x86_64-linux-gnu/libc.so.6
4# 0x00007F65705FD40F in /lib/x86_64-linux-gnu/libc.so.6
5# 0x00007F657060C662 in /lib/x86_64-linux-gnu/libc.so.6
6# 0x000055DDCFA96793 in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
7# icinga::Comment::OnAllConfigLoaded() in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
8# 0x000055DDCF7E1645 in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
9# icinga::WorkQueue::RunTaskFunction(std::function<void ()> const&) in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
10# 0x000055DDCF7F462C in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
11# icinga::WorkQueue::RunTaskFunction(std::function<void ()> const&) in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
12# icinga::WorkQueue::WorkerThreadProc() in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
13# 0x00007F657110B787 in /lib/x86_64-linux-gnu/libboost_thread.so.1.74.0
14# 0x00007F6570AE1EA7 in /lib/x86_64-linux-gnu/libpthread.so.0
15# clone in /lib/x86_64-linux-gnu/libc.so.6
@julianbrost Please say you can decode those addresses via your recent gdb magic.
I'm bisecting. I have had the crash with this stacktrace since 2.13.6.
crash report with 2.13.6: report.1693489668.561474-2.13.6.txt
crash report with 2.14.0: report.1693490091.825202-2.14.0.txt
For now, to not trigger the bug, for the first start of a satellite:
@julianbrost Please say you can decode those addresses via your recent gdb magic.
No magic involved there, just install the package with the debug symbols for that exact version.
Hello
Checking with debug symbols, the abort I see is related to a config validation failure caused by missing groups (the topic of my bug report):
Stacktrace:
0# icinga::Application::SigAbrtHandler(int) in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
1# 0x00007F8ADAB7B140 in /lib/x86_64-linux-gnu/libpthread.so.0
2# gsignal in /lib/x86_64-linux-gnu/libc.so.6
3# abort in /lib/x86_64-linux-gnu/libc.so.6
4# 0x00007F8ADA68B40F in /lib/x86_64-linux-gnu/libc.so.6
5# 0x00007F8ADA69A662 in /lib/x86_64-linux-gnu/libc.so.6
6# 0x0000561357608793 in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
7# icinga::Comment::OnAllConfigLoaded() in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
8# 0x0000561357353645 in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
9# icinga::WorkQueue::RunTaskFunction(std::function<void ()> const&) in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
10# 0x000056135736662C in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
11# icinga::WorkQueue::RunTaskFunction(std::function<void ()> const&) in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
12# icinga::WorkQueue::WorkerThreadProc() in /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
13# 0x00007F8ADB199787 in /lib/x86_64-linux-gnu/libboost_thread.so.1.74.0
14# 0x00007F8ADAB6FEA7 in /lib/x86_64-linux-gnu/libpthread.so.0
15# clone in /lib/x86_64-linux-gnu/libc.so.6
Starting the satellite with an empty configuration directory, the daemon fails to start up with a valid full configuration; /var/lib/icinga2/api/packages/_api/
The Icinga2 master sends config objects in this order:
I'm trying to understand why the master does not send the hostgroups configuration first.
#8 0x0000561357608793 in boost::intrusive_ptr<icinga::Host>::operator->() const [clone .part.0] [clone .lto_priv.0] (this=<optimized out>) at /usr/include/boost/smart_ptr/intrusive_ptr.hpp:199
__PRETTY_FUNCTION__ = {<optimized out> <repeats 71 times>}
#9 0x0000561357521958 in boost::intrusive_ptr<icinga::Host>::operator-> (this=<synthetic pointer>) at ../lib/icinga/./lib/icinga/comment.cpp:75
__PRETTY_FUNCTION__ = {<optimized out> <repeats 71 times>}
#10 icinga::Comment::OnAllConfigLoaded (this=0x7f8ac1c35000) at ../lib/icinga/./lib/icinga/comment.cpp:71
host = {px = <optimized out>}
=> m_Checkable = host->GetServiceByShortName(GetServiceName()); Do we need to check whether host is null?
In void Comment::OnAllConfigLoaded(), making this change:
- if (GetServiceName().IsEmpty())
+ if (GetServiceName().IsEmpty() || ! host)
fixes my issue.
Hah! This shall be fixed by:
(I said its absence would cause problems.)
You've already tested a custom patch to Icinga. Please could you also test that PR's commit cherry-picked on top of the support/2.14 or support/2.13 branch, whichever will build? (I guess only support/2.13 due to #9577.) If you need a 2.14 (or can't reproduce with 2.13 anymore) I guess(!) you could also revert #9577 and then cherry-pick.
I cherry-picked #7786 onto v2.14 and needed to adapt one thing:
--- a/lib/remote/apilistener-configsync.cpp
+++ b/lib/remote/apilistener-configsync.cpp
@@ -459,8 +459,7 @@ void ApiListener::SendRuntimeConfigObjects(const JsonRpcConnection::Ptr& aclient
bool unresolved_dep = false;
/* skip this type (for now) if there are unresolved load dependencies */
- for (const String& loadDep : type->GetLoadDependencies()) {
- Type::Ptr pLoadDep = Type::GetByName(loadDep);
+ for (auto pLoadDep : type->GetLoadDependencies()) {
The patch seems to help avoid triggering the null reference use (for comments), but Icinga2 is still trying to load hosts before groups. Daemon dies with exit code 139. I can't recover or reach a state where the satellite receives the whole configuration (hosts & hostgroups).
Satellite API package content (hostgroups missing):
ls -l /var/lib/icinga2/api/packages/_api/6c96ea8d-b3a2-4666-8c20-5f60d584460d/conf.d/
total 124
drwx------ 2 nagios nagios 69632 Sep 12 10:11 comments
drwx------ 2 nagios nagios 45056 Sep 12 10:11 downtimes
drwx------ 2 nagios nagios 4096 Sep 12 10:11 hosts
Daemon dies with exit code 139.
Despite the PR?
@Al2Klimov I only applied #7786, with the diff about type->GetLoadDependencies that the #9577 change requires. I didn't pick my diff from yesterday, #9861.
The patch seems to help avoid triggering the null reference use (for comments)
At least one thing it fixes, OK.
I cherry-picked #7786 onto v2.14 and needed to adapt one thing:
--- a/lib/remote/apilistener-configsync.cpp
+++ b/lib/remote/apilistener-configsync.cpp
@@ -459,8 +459,7 @@ void ApiListener::SendRuntimeConfigObjects(const JsonRpcConnection::Ptr& aclient
bool unresolved_dep = false;
/* skip this type (for now) if there are unresolved load dependencies */
- for (const String& loadDep : type->GetLoadDependencies()) {
- Type::Ptr pLoadDep = Type::GetByName(loadDep);
+ for (auto pLoadDep : type->GetLoadDependencies()) {
Could you please open a new PR against that PR, i.e. with bugfix/api-runtime-object-sync-order as your base branch and your adaptation as the diff? (Mention me in this case.)
Ok, I will do this.
Hum, about load dependencies: how can I express that HostGroup should be a dependency of Host (i.e. load HostGroup objects before Host objects)?
Regards
Not sure that you wanna actually do this, but see https://github.com/Icinga/icinga2/pull/8119/files#diff-7529d2f2812859b880d8d6cdf34b2ad783226b8bc59a79077a602f76c6fde0f0 .
https://github.com/Icinga/icinga2/pull/8119/files#diff-7529d2f2812859b880d8d6cdf34b2ad783226b8bc59a79077a602f76c6fde0f0 => I don't understand the relationship expressed by "load_after Host".
The Host object holds the reference to the group name, not the other way around.
This was only an example, just do the opposite if you want to test it. But the directive is always load_after.
So I was wrong: #7786 does not fix the issue with comment objects. The exit code 139 was the segfault I addressed yesterday by checking the host reference in #9861.
I can make a PR to refresh #7786 so that it can be applied against v2.14.0, but I don't know how to check whether it works. For now, I haven't found any improvement with it. Maybe the diff will make more sense if I add load_after HostGroup;
in host.ti.
Yes, test the latter if you believe it will help.
I updated #9861.
For now, none of my tests with load_after seem to change anything. When Icinga starts, the satellite receives and tries to load objects in the order ... / comments / ... / hosts / ... / hostgroups.
Maybe I am missing something.
Do all three together fix your problem?
load_after HostGroup;
in host.ti
Hello
When starting a new icinga2 instance inside a zone, icinga2 fails to start because config validation fails: the groups referenced by hosts are missing.
Looking at the satellite filesystem (/var/lib/icinga2/api/packages/_api/bdd5cdff-6e46-4795-a2cf-64ef56d3b397/conf.d):
I expect icinga satellites to load packages and hostgroups before hosts.
To fix this, I do:
When the hostgroups config is synced:
I didn't experience this before icinga2 2.13.7-1+debian11
Regards