Closed redbob365 closed 5 years ago
Please generate a full backtrace of the running process as described here: https://icinga.com/docs/icinga2/latest/doc/21-development/#gdb-backtrace-from-running-process
Please attach the logs as a zip file next time. Dropbox with all its popups is barely usable.
I can see that a lot of threads are executing checks, and their processes run seemingly forever (until the timeout kills them).
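For illustration, here is a minimal Python sketch of how such a backtrace could be captured. It assumes the linked documentation boils down to attaching gdb in batch mode and running `thread apply all bt full`; the helper name `gdb_backtrace_command` and the `gdb_bt_<pid>_<timestamp>.log` naming (modelled on the log file names mentioned in this thread) are assumptions, not taken from the documentation itself.

```python
import time


def gdb_backtrace_command(pid, now=None):
    """Build the gdb argv and a log file name for a full backtrace.

    Hypothetical helper: the flags (-p, -batch, -ex) are standard gdb
    options, but the exact invocation recommended by the Icinga docs
    is assumed here.
    """
    timestamp = int(time.time() if now is None else now)
    log_name = f"gdb_bt_{pid}_{timestamp}.log"
    argv = [
        "gdb", "-p", str(pid), "-batch",
        "-ex", "thread apply all bt full",  # full backtrace of every thread
        "-ex", "detach",                    # leave the daemon running
    ]
    return argv, log_name


# Usage sketch (needs gdb, debug symbols, and sufficient privileges):
#   import subprocess
#   argv, log_name = gdb_backtrace_command(1670)
#   with open(log_name, "w") as log:
#       subprocess.run(argv, stdout=log, stderr=subprocess.STDOUT)
```

The command is built separately from its execution so it can be inspected before attaching to a production process.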
Thread 55 (Thread 0x7fec9f9e7700 (LWP 82963)):
#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
No locals.
#1 0x00007fed0608edbd in __GI___pthread_mutex_lock (mutex=0xea3300) at ../nptl/pthread_mutex_lock.c:80
__PRETTY_FUNCTION__ = "__pthread_mutex_lock"
type = 0
id = <optimized out>
#2 0x0000000000602e18 in ?? ()
No symbol table info available.
#3 0x00000000005fcbbd in icinga::Process::Run(std::function<void (icinga::ProcessResult const&)> const&) ()
No symbol table info available.
#4 0x00000000006cfb43 in icinga::PluginUtility::ExecuteCommand(boost::intrusive_ptr<icinga::Command> const&, boost::intrusive_ptr<icinga::Checkable> const&, boost::intrusive_ptr<icinga::CheckResult> const&, std::vector<std::pair<icinga::String, boost::intrusive_ptr<icinga::Object> >, std::allocator<std::pair<icinga::String, boost::intrusive_ptr<icinga::Object> > > > const&, boost::intrusive_ptr<icinga::Dictionary> const&, bool, int, std::function<void (icinga::Value const&, icinga::ProcessResult const&)> const&) ()
No symbol table info available.
#5 0x00000000007d69f6 in icinga::PluginCheckTask::ScriptFunc(boost::intrusive_ptr<icinga::Checkable> const&, boost::intrusive_ptr<icinga::CheckResult> const&, boost::intrusive_ptr<icinga::Dictionary> const&, bool) ()
No symbol table info available.
#6 0x00000000007d5a03 in std::_Function_handler<icinga::Value (std::vector<icinga::Value, std::allocator<icinga::Value> > const&), std::enable_if<std::is_function<std::remove_pointer<void (*)(boost::intrusive_ptr<icinga::Checkable> const&, boost::intrusive_ptr<icinga::CheckResult> const&, boost::intrusive_ptr<icinga::Dictionary> const&, bool)>::type>::value&&(!std::is_same<void (*)(boost::intrusive_ptr<icinga::Checkable> const&, boost::intrusive_ptr<icinga::CheckResult> const&, boost::intrusive_ptr<icinga::Dictionary> const&, bool), icinga::Value (*)(std::vector<icinga::Value, std::allocator<icinga::Value> > const&)>::value), std::function<icinga::Value (std::vector<icinga::Value, std::allocator<icinga::Value> > const&)> >::type icinga::WrapFunction<void (*)(boost::intrusive_ptr<icinga::Checkable> const&, boost::intrusive_ptr<icinga::CheckResult> const&, boost::intrusive_ptr<icinga::Dictionary> const&, bool)>(void (*)(boost::intrusive_ptr<icinga::Checkable> const&, boost::intrusive_ptr<icinga::CheckResult> const&, boost::intrusive_ptr<icinga::Dictionary> const&, bool))::{lambda(std::vector<icinga::Value, std::allocator<icinga::Value> > const&)#1}>::_M_invoke(std::_Any_data const&, std::vector<icinga::Value, std::allocator<icinga::Value> > const&) ()
No symbol table info available.
#7 0x00000000005d81cf in icinga::Function::Invoke(std::vector<icinga::Value, std::allocator<icinga::Value> > const&) ()
No symbol table info available.
#8 0x00000000006e8dbd in icinga::CheckCommand::Execute(boost::intrusive_ptr<icinga::Checkable> const&, boost::intrusive_ptr<icinga::CheckResult> const&, boost::intrusive_ptr<icinga::Dictionary> const&, bool) ()
No symbol table info available.
#9 0x00000000006fd5e5 in icinga::Checkable::ExecuteCheck() ()
No symbol table info available.
#10 0x00000000007cb55b in icinga::CheckerComponent::ExecuteCheckHelper(boost::intrusive_ptr<icinga::Checkable> const&) ()
No symbol table info available.
#11 0x000000000062f2f6 in icinga::ThreadPool::WorkerThread::ThreadProc(icinga::ThreadPool::Queue&) ()
No symbol table info available.
#12 0x00007fed06a3d5d5 in ?? () from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.58.0
No symbol table info available.
#13 0x00007fed0608c6ba in start_thread (arg=0x7fec9f9e7700) at pthread_create.c:333
__res = <optimized out>
pd = 0x7fec9f9e7700
now = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140654266971904, -3246129604114312920, 0, 140656005215279, 262144, 140654676105616, 3254503214567067944, 3254240247518138664}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
pagesize_m1 = <optimized out>
sp = <optimized out>
freesize = <optimized out>
__PRETTY_FUNCTION__ = "start_thread"
#14 0x00007fed06f5d41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.
Threads 21 through 61 are hanging check execution threads. Investigate why these checks hang for so long.
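A rough way to find such threads in a large gdb log is to group the lines per thread and look for the check execution frames. The Python sketch below assumes the `Thread N (Thread 0x... (LWP ...)):` header format shown in the backtrace above; the function names are hypothetical, and the second thread in the sample is invented for contrast.

```python
import re

# Header format as it appears in the backtrace above.
THREAD_RE = re.compile(r"^Thread (\d+) \(Thread 0x[0-9a-f]+ \(LWP \d+\)\):")

# Frames that suggest a thread is inside plugin check execution.
MARKERS = ("icinga::Checkable::ExecuteCheck", "icinga::Process::Run")


def split_threads(log_text):
    """Map each gdb thread number to the lines of its backtrace."""
    threads = {}
    current = None
    for line in log_text.splitlines():
        match = THREAD_RE.match(line)
        if match:
            current = int(match.group(1))
            threads[current] = []
        elif current is not None:
            threads[current].append(line)
    return threads


def threads_in_check_execution(log_text):
    """Return sorted thread numbers whose stack contains every marker."""
    return sorted(
        number
        for number, lines in split_threads(log_text).items()
        if all(any(marker in line for line in lines) for marker in MARKERS)
    )


sample = """Thread 55 (Thread 0x7fec9f9e7700 (LWP 82963)):
#3 0x00000000005fcbbd in icinga::Process::Run(...) ()
#9 0x00000000006fd5e5 in icinga::Checkable::ExecuteCheck() ()
Thread 1 (Thread 0x7fed00000000 (LWP 1)):
#0 0x0000000000000000 in made_up_idle_frame ()
"""
print(threads_in_check_execution(sample))  # prints [55]
```

Matching on frame names is a heuristic: it shows which threads are inside check execution, not why the underlying plugin processes take so long.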
@dnsmichi, I found several services without a defined timeout (blank values); maybe they were causing endlessly running processes. After I fixed that, CPU usage dropped to half. I tried every way I could to upload a file to GitHub, but ran into problems. Look at these: gdb_bt_1670_1553874512.log and gdb_bt_1730_1553874504. These are the graphs before and after the fixes. I should add that I disabled Grafana and Graphite. I'll try re-enabling them to see whether that affects the load.
File upload to GitHub requires a certain format, e.g. a zip file containing all the logs. Since there has been no further feedback, I'll consider this resolved by the changed plugin timeout.
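One possible way to list services with a blank timeout is to filter the decoded JSON returned by a query to the Icinga 2 REST API. This Python sketch assumes a response of the form `{"results": [{"name": ..., "attrs": {...}}]}` and a `check_timeout` attribute that is null when unset; the function name and the sample data are hypothetical.

```python
import json


def services_without_timeout(api_response):
    """Return names of services whose check_timeout attribute is unset.

    Assumes the response shape {"results": [{"name": ..., "attrs": {...}}]}.
    """
    missing = []
    for result in api_response.get("results", []):
        attrs = result.get("attrs", {})
        if attrs.get("check_timeout") in (None, ""):
            missing.append(result.get("name"))
    return missing


# Hand-written response for illustration only, not real data.
example = json.loads("""
{"results": [
  {"name": "host-a!disk", "attrs": {"check_timeout": null}},
  {"name": "host-a!ping4", "attrs": {"check_timeout": 30}}
]}
""")
print(services_without_timeout(example))  # prints ['host-a!disk']
```

A blank value on the service does not necessarily mean that no timeout applies, since the check command may define its own; the list is only a starting point for review.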
Hi,
My Icinga 2 master endpoint runs on an XCP-ng host. It runs fine except for heavy CPU usage, which is impacting the whole host system. We don't have any satellites attached to it.
Look at these graphs from the XCP server:
This is the top output from the Master server:
Is there any solution to it?
My Environment
Version used (icinga2 --version): r2.10.3-1
Operating System and version: Ubuntu 16.04.6
Enabled features (icinga2 feature list): api checker command compatlog graphite ido-mysql influxdb livestatus mainlog notification perfdata statusdata
Icinga Web 2 version: 2.6.2
Config validation (icinga2 daemon -C):
[2019-03-20 13:28:54 -0400] information/cli: Icinga application loader (version: r2.10.3-1)
[2019-03-20 13:28:54 -0400] information/cli: Loading configuration file(s).
[2019-03-20 13:28:55 -0400] information/ConfigItem: Committing config item(s).
[2019-03-20 13:28:55 -0400] information/ApiListener: My API identity: srvici-mt.mt.trf1.gov.br
[2019-03-20 13:28:58 -0400] warning/ApplyRule: Apply rule 'satellite-host' (in /etc/icinga2/conf.d/satellite.conf: 29:1-29:41) for type 'Dependency' does not match anywhere!
[2019-03-20 13:28:58 -0400] warning/ApplyRule: Apply rule 'mail-icingaadmin' (in /etc/icinga2/conf.d/notifications.conf: 11:1-11:45) for type 'Notification' does not match anywhere!
[2019-03-20 13:28:58 -0400] warning/ApplyRule: Apply rule 'mail-icingaadmin' (in /etc/icinga2/conf.d/notifications.conf: 23:1-23:48) for type 'Notification' does not match anywhere!
[2019-03-20 13:28:58 -0400] warning/ApplyRule: Apply rule 'backup-downtime' (in /etc/icinga2/conf.d/downtimes.conf: 5:1-5:52) for type 'ScheduledDowntime' does not match anywhere!
[2019-03-20 13:28:58 -0400] warning/ApplyRule: Apply rule 'apt' (in /etc/icinga2/conf.d/apt.conf: 1:0-1:18) for type 'Service' does not match anywhere!
[2019-03-20 13:28:58 -0400] warning/ApplyRule: Apply rule 'ping6' (in /etc/icinga2/conf.d/services.conf: 35:1-35:21) for type 'Service' does not match anywhere!
[2019-03-20 13:28:58 -0400] warning/ApplyRule: Apply rule 'ssh' (in /etc/icinga2/conf.d/services.conf: 48:1-48:19) for type 'Service' does not match anywhere!
[2019-03-20 13:28:58 -0400] warning/ApplyRule: Apply rule '' (in /etc/icinga2/conf.d/services.conf: 58:1-58:65) for type 'Service' does not match anywhere!
[2019-03-20 13:28:58 -0400] warning/ApplyRule: Apply rule 'disk:' (in /etc/icinga2/conf.d/services.conf: 66:1-66:80) for type 'Service' does not match anywhere!
[2019-03-20 13:28:58 -0400] warning/ApplyRule: Apply rule 'icinga' (in /etc/icinga2/conf.d/services.conf: 75:1-75:22) for type 'Service' does not match anywhere!
[2019-03-20 13:28:58 -0400] warning/ApplyRule: Apply rule 'load' (in /etc/icinga2/conf.d/services.conf: 81:1-81:20) for type 'Service' does not match anywhere!
[2019-03-20 13:28:58 -0400] warning/ApplyRule: Apply rule 'procs' (in /etc/icinga2/conf.d/services.conf: 92:1-92:21) for type 'Service' does not match anywhere!
[2019-03-20 13:28:58 -0400] warning/ApplyRule: Apply rule 'swap' (in /etc/icinga2/conf.d/services.conf: 100:1-100:20) for type 'Service' does not match anywhere!
[2019-03-20 13:28:58 -0400] warning/ApplyRule: Apply rule 'users' (in /etc/icinga2/conf.d/services.conf: 108:1-108:21) for type 'Service' does not match anywhere!
[2019-03-20 13:28:58 -0400] warning/ApplyRule: Apply rule 'nrpe_ss' (in /etc/icinga2/conf.d/services.conf: 116:1-116:23) for type 'Service' does not match anywhere!
[2019-03-20 13:28:58 -0400] warning/ApplyRule: Apply rule 'disk' (in /etc/icinga2/conf.d/services.conf: 125:1-125:20) for type 'Service' does not match anywhere!
[2019-03-20 13:28:58 -0400] information/ConfigItem: Instantiated 100 Services.
[2019-03-20 13:28:58 -0400] information/ConfigItem: Instantiated 1 InfluxdbWriter.
[2019-03-20 13:28:58 -0400] information/ConfigItem: Instantiated 1 LivestatusListener.
[2019-03-20 13:28:58 -0400] information/ConfigItem: Instantiated 1 IcingaApplication.
[2019-03-20 13:28:58 -0400] information/ConfigItem: Instantiated 178 Hosts.
[2019-03-20 13:28:58 -0400] information/ConfigItem: Instantiated 1 FileLogger.
[2019-03-20 13:28:58 -0400] information/ConfigItem: Instantiated 2 NotificationCommands.
[2019-03-20 13:28:58 -0400] information/ConfigItem: Instantiated 1 NotificationComponent.
[2019-03-20 13:28:58 -0400] information/ConfigItem: Instantiated 13 HostGroups.
[2019-03-20 13:28:58 -0400] information/ConfigItem: Instantiated 1 ApiListener.
[2019-03-20 13:28:58 -0400] information/ConfigItem: Instantiated 1 GraphiteWriter.
[2019-03-20 13:28:58 -0400] information/ConfigItem: Instantiated 1 PerfdataWriter.
[2019-03-20 13:28:58 -0400] information/ConfigItem: Instantiated 1 CheckerComponent.
[2019-03-20 13:28:58 -0400] information/ConfigItem: Instantiated 3 Zones.
[2019-03-20 13:28:58 -0400] information/ConfigItem: Instantiated 1 StatusDataWriter.
[2019-03-20 13:28:58 -0400] information/ConfigItem: Instantiated 1 ExternalCommandListener.
[2019-03-20 13:28:58 -0400] information/ConfigItem: Instantiated 1 Endpoint.
[2019-03-20 13:28:58 -0400] information/ConfigItem: Instantiated 4 ApiUsers.
[2019-03-20 13:28:58 -0400] information/ConfigItem: Instantiated 1 CompatLogger.
[2019-03-20 13:28:58 -0400] information/ConfigItem: Instantiated 2 Users.
[2019-03-20 13:28:58 -0400] information/ConfigItem: Instantiated 1 IdoMysqlConnection.
[2019-03-20 13:28:58 -0400] information/ConfigItem: Instantiated 222 CheckCommands.
[2019-03-20 13:28:58 -0400] information/ConfigItem: Instantiated 1 UserGroup.
[2019-03-20 13:28:58 -0400] information/ConfigItem: Instantiated 1 ServiceGroup.
[2019-03-20 13:28:58 -0400] information/ConfigItem: Instantiated 3 TimePeriods.
[2019-03-20 13:28:58 -0400] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2019-03-20 13:28:58 -0400] information/cli: Finished validating the configuration file(s).