Closed Stefar77 closed 6 years ago
I have the exact same problem when running the dashing dashboard for icinga2. After the latest updates it works fine, but Icinga crashed in the middle of the night, so I had a monitoring outage of several hours.
Same as in your case, no interesting logs.
This might be an easy-to-reproduce case.
I never had this problem when creating or querying objects.
icingaweb2 can use either external commands or the API to run 'check now'. Can you verify that you are using the API as the command transport?
Crunsher, 'Check Now' is only a way to make it go down fast; it's the API call in the plugin that is killing my Icinga. My plugin updates the Mitel Controller alert services (~10 passive services each), and I had a bug that ignored memcached and always pushed the status, even when there was no change. When I change the poller to use API calls instead of sockets, the API crashes. 'Check Now' on 40 Mitel Controllers tries to send ~400 passive updates to the API at about the same time, and that seems to kill it.
I fixed my poller and switched back to sockets to prevent the API from overloading until it's fixed. :-)
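The poller fix mentioned above (push only when the status actually changed) could look roughly like the following. This is a minimal sketch, not the actual poller code: `ChangeFilter` and the `push` callback are hypothetical names, and a plain dict stands in for memcached.

```python
from typing import Callable, Dict, Tuple

State = Tuple[int, str]  # (exit_status, plugin_output)


class ChangeFilter:
    """Forward a passive result only when it differs from the last one sent."""

    def __init__(self, push: Callable[[str, int, str], None]) -> None:
        # `push` is whatever actually delivers the result (socket or API).
        self._push = push
        # Assumption: a dict replaces the memcached lookup used in the thread.
        self._last: Dict[str, State] = {}

    def submit(self, service: str, exit_status: int, output: str) -> bool:
        """Return True if the result was pushed, False if it was suppressed."""
        state = (exit_status, output)
        if self._last.get(service) == state:
            return False  # unchanged since the last push: skip it
        self._push(service, exit_status, output)
        self._last[service] = state  # remember only after push() returned
        return True


if __name__ == "__main__":
    sent = []
    flt = ChangeFilter(lambda svc, status, out: sent.append((svc, status, out)))
    flt.submit("mitel01!alarm-1", 0, "OK")
    flt.submit("mitel01!alarm-1", 0, "OK")           # suppressed
    flt.submit("mitel01!alarm-1", 2, "Major alarm")  # state changed
    print(len(sent))  # prints 2
```

Suppressing unchanged results reduces the load on the API but does not address the hang itself. If the passive services rely on freshness checks, a periodic unconditional push may still be needed.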
P.S. Normally Icinga has 46 threads. The count goes up for a second when I refresh in aNag or some host pushes an event, but it usually drops back to 46. When I stress the API, it stays at 128 threads and the API hangs. This also generates lots of timeouts in the Mitel pollers because API requests seem to hang forever. Eventually, I think, the Icinga process disappears without any notice in crash/ or the log.
For now: $use_api=false; // Set to true on a busy poller to kill the Icinga API
We are still migrating from Nagios to Icinga2 and it's not in production yet. If you want me to test stuff, I'd be glad to do so.
Just noticed: curl -k -s -u login:pass -H 'Accept: application/json' -X POST 'https://localhost:5665/v1/events?queue=debugnotifications&types=Notification'
This creates a new thread that doesn't seem to end. (I don't use events in my pollers, but it may be related to threads not always ending.)
Fixed it with a nasty thread sleep in HttpServerConnection::Disconnect when there are pending requests (m_PendingRequests).
Also fixed in my pull-request #5419
Closing this in favour of #6361.
We have many passive services that get updated by a single active check on a host. When we switch from sockets to the API to send passive results, the API crashes within minutes.
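For reference, one passive update over the API could be built as below. This is a sketch assuming the `process-check-result` action of the Icinga 2 REST API with the `service=host!service` URL parameter. The helper names, credentials and service names are placeholders, not taken from the poller. The explicit timeout is there so a hanging API makes the poller fail fast instead of blocking forever.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_check_result_request(base_url, user, password, host, service,
                               exit_status, plugin_output):
    """Build (but do not send) a POST for one passive check result."""
    query = urllib.parse.urlencode({"service": "%s!%s" % (host, service)})
    url = "%s/v1/actions/process-check-result?%s" % (base_url.rstrip("/"), query)
    body = json.dumps({
        "exit_status": exit_status,      # 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
        "plugin_output": plugin_output,
    }).encode("utf-8")
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8"))
    return urllib.request.Request(url, data=body, method="POST", headers={
        "Accept": "application/json",
        "Content-Type": "application/json",
        "Authorization": "Basic " + token.decode("ascii"),
    })


def send(request, timeout=5.0, context=None):
    """Send the request; raises on timeout so the caller never waits forever.

    `context` is an optional ssl.SSLContext, e.g. one that trusts the
    Icinga CA certificate.
    """
    with urllib.request.urlopen(request, timeout=timeout, context=context) as resp:
        return resp.status, json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Placeholder values; nothing is sent here.
    req = build_check_result_request("https://localhost:5665", "login", "pass",
                                     "mitel01", "alarm-1", 2, "Major alarm")
    print(req.get_method(), req.full_url)
```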
Expected Behavior
When pushing lots and lots of events via API it should not hang.
Current Behavior
Pushing many passive results to Icinga2 via the API results in a process hang; possibly the Icinga2 process itself crashes a while later without leaving a crash log. The only way to get the API running again is to run kill -KILL {PID} or killall -KILL icinga2, restart, and then avoid stressing the API (or use the socket to send passive check results).
Possible Solution
Tried 66c0746. It seemed to make the API a lot faster, but it still crashes. Edit: it was only faster because it had just restarted.
#5419 solves most lockups in the API and the known remote thread leaks.
P.S. Sorry for the commit mess; getting used to git takes a bit.
Steps to Reproduce (for bugs)
Context
I'm trying to upgrade the pollers to use the API instead of sockets so they can get direct feedback and give notice if a service is missing, but changing only one poller from sockets to the API kills Icinga within seconds. I use one active service, 'Check Mitel', that gets the alarm state(s) and updates the passive services accordingly. It's hard to trigger any other way: running the poller from the CLI many times does not seem to kill Icinga, but 'Check Now' fires them all at once (faster than using the shell to spawn curl many times at once). :-)
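A burst like the one 'Check Now' produces might be approximated from a script by firing all updates through a thread pool. This is only a sketch under that assumption: `fire_burst` is a hypothetical helper, `send` is whatever function delivers one update, and there is no guarantee that this reproduces the hang.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fire_burst(send, jobs, max_workers=64):
    """Call send(job) for every job concurrently; return (ok, failed) counts."""
    ok = failed = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(send, job) for job in jobs]
        for future in as_completed(futures):
            if future.exception() is None:
                ok += 1
            else:
                failed += 1  # e.g. a timeout raised by send()
    return ok, failed


if __name__ == "__main__":
    # 40 controllers x 10 passive services, as in the numbers quoted above.
    jobs = [("mitel%02d" % c, "alarm-%d" % s)
            for c in range(40) for s in range(10)]

    def fake_send(job):
        # Stand-in for a real API call so the sketch runs without a server.
        return job

    print(fire_burst(fake_send, jobs))  # prints (400, 0)
```

With a real `send` that enforces a timeout, a rising `failed` count would show the point at which the API stops answering.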
I think this is related to, or the same problem as, #5148.
Your Environment
icinga2 --version:
Copyright (c) 2012-2017 Icinga Development Team (https://www.icinga.com/)
License GPLv2+: GNU GPL version 2 or later http://gnu.org/licenses/gpl2.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Application information:
  Installation root: /usr/local
  Sysconf directory: /usr/local/etc
  Run directory: /var/run
  Local state directory: /var
  Package data directory: /usr/local/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /var/run/icinga2/icinga2.pid
System information:
  Platform: Unknown
  Platform version: Unknown
  Kernel: FreeBSD
  Kernel version: 11.0-RELEASE-p9
  Architecture: amd64
Build information:
  Compiler: Clang 3.8.0
FreeBSD 11.0-RELEASE-p9 #0: Tue Apr 11 08:48:40 UTC 2017 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC
api checker command graphite ido-mysql livestatus mainlog notification syslog
information/cli: Icinga application loader (version: r2.6.3-1)
information/cli: Loading configuration file(s).
information/ConfigItem: Committing config item(s).
information/ApiListener: My API identity: *.***
warning/ApplyRule: Apply rule 'satellite-host' (in /usr/local/etc/icinga2/conf.d/satellite.conf: 29:1-29:41) for type 'Dependency' does not match anywhere!
warning/ApplyRule: Apply rule 'mail-icingaadmin' (in /usr/local/etc/icinga2/conf.d/notifications.conf: 11:1-11:45) for type 'Notification' does not match anywhere!
warning/ApplyRule: Apply rule 'mail-icingaadmin' (in /usr/local/etc/icinga2/conf.d/notifications.conf: 20:1-20:48) for type 'Notification' does not match anywhere!
warning/ApplyRule: Apply rule 'backup-downtime' (in /usr/local/etc/icinga2/conf.d/downtimes.conf: 5:1-5:52) for type 'ScheduledDowntime' does not match anywhere!
information/ConfigItem: Instantiated 1 FileLogger.
information/ConfigItem: Instantiated 9 Endpoints.
information/ConfigItem: Instantiated 10 Zones.
information/ConfigItem: Instantiated 1 SyslogLogger.
information/ConfigItem: Instantiated 1 ApiListener.
information/ConfigItem: Instantiated 2 ApiUsers.
information/ConfigItem: Instantiated 10747 Services.
information/ConfigItem: Instantiated 239 Comments.
information/ConfigItem: Instantiated 740 Dependencies.
information/ConfigItem: Instantiated 1236 Notifications.
information/ConfigItem: Instantiated 239 CheckCommands.
information/ConfigItem: Instantiated 4 ServiceGroups.
information/ConfigItem: Instantiated 5 TimePeriods.
information/ConfigItem: Instantiated 3 Users.
information/ConfigItem: Instantiated 2 UserGroups.
information/ConfigItem: Instantiated 1236 Hosts.
information/ConfigItem: Instantiated 21 HostGroups.
information/ConfigItem: Instantiated 1 IcingaApplication.
information/ConfigItem: Instantiated 3 NotificationCommands.
information/ConfigItem: Instantiated 1 CheckerComponent.
information/ConfigItem: Instantiated 1 ExternalCommandListener.
information/ConfigItem: Instantiated 1 GraphiteWriter.
information/ConfigItem: Instantiated 1 IdoMysqlConnection.
information/ConfigItem: Instantiated 1 NotificationComponent.
information/ConfigItem: Instantiated 1 LivestatusListener.
information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
information/cli: Finished validating the configuration file(s).