Closed cah-jeremykuhn closed 6 years ago
Did you see something like this in your logfile?
[2018-07-18 15:18:37 +0200] critical/ApiListener: Client TLS handshake failed (from [21X.XX.XX.9]:39242)
Context:
(0) Handling new API client connection
If so, I think it has something to do with #6445, and I already mentioned it there.
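To gauge how often the message above occurs and which clients it comes from, the failures can be tallied per source address. This is only a minimal sketch: the regular expression is derived from the single log line quoted above, the function name `count_handshake_failures` is made up for illustration, and the log path in the usage comment is an assumption that may differ on your system.

```python
import re
from collections import Counter

# Pattern based on the one log line quoted in this thread; other Icinga 2
# versions may word the message differently.
HANDSHAKE_RE = re.compile(
    r"critical/ApiListener: Client TLS handshake failed "
    r"\(from \[(?P<addr>[^\]]+)\]:(?P<port>\d+)\)"
)

def count_handshake_failures(lines):
    """Count 'Client TLS handshake failed' messages per client address."""
    counts = Counter()
    for line in lines:
        match = HANDSHAKE_RE.search(line)
        if match:
            counts[match.group("addr")] += 1
    return counts

# Usage (the path is an assumption, adjust it to your installation):
#
#   with open("/var/log/icinga2/icinga2.log", errors="replace") as fh:
#       for addr, n in count_handshake_failures(fh).most_common():
#           print(n, addr)
```

If a handful of addresses account for most of the failures, they are the endpoints to compare against the ones discussed in #6445.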
Yeah, I didn't read far enough into that issue (my mistake), but I am getting the exact same log messages, so it's most likely the same issue. I'll close this. Thanks for the quick response!
Since upgrading from 2.8.4 to 2.9.0 two days ago, roughly 250 of the 1110 service checks across 223 hosts are throwing alerts that they are unable to connect to the client endpoint. And it's not even entire hosts: I have several hosts where Disk and CPU checks will work, but the CPU usage check will say the host is not connected. The weird part is that some checks recover while others go down with the host reported as not connected. I've tried increasing and decreasing both max_concurrent_checks and the ulimit, but neither has made the failed checks go away. I have a screenshot as an example. The clients that connect to the master are a mix of Windows Server 2016 and Ubuntu 14.04 hosts; however, none of the Windows hosts are experiencing any issues.
Current max_concurrent_checks = 2048
Current ulimits:
hard nofile 650000
soft nofile 100000
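Since the configured nofile values only matter if the running daemon actually picked them up, it can help to read the limits the process really has. The following is a rough sketch, assuming a Linux master where `/proc/<pid>/limits` is available; the helper name `parse_open_files_limit` is hypothetical, and the PID has to be looked up separately.

```python
def parse_open_files_limit(limits_text):
    """Return (soft, hard) for 'Max open files' from /proc/<pid>/limits text.

    Values are returned as strings because the kernel may report 'unlimited'.
    Returns None if the row is not present.
    """
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            soft, hard = line[len(label):].split()[:2]
            return soft, hard
    return None

# Usage (Linux only; replace 1234 with the PID of the icinga2 main process):
#
#   with open("/proc/1234/limits") as fh:
#       print(parse_open_files_limit(fh.read()))
```

If the values reported here are lower than the ones listed above, the limits are not being applied to the daemon, for example because it is started by an init system that sets its own limits.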
Expected Behavior
All service checks should be able to connect.
Current Behavior
20% of service checks are failing with "Remote instance is not connected to
Here is a screenshot of a host with 3 services that are able to connect and one that is not:
And here is the code for those checks (instance memory is the one not working):
Steps to Reproduce (for bugs)
Your Environment
icinga2 feature list:
icinga2 daemon -C:
zones.conf file (or icinga2 object list --type Endpoint and icinga2 object list --type Zone) from all affected nodes.