canonical / charm-openstack-service-checks

Collection of Nagios checks and other utilities that can be used to verify the operation of an OpenStack cluster

Monitor LDAP server(s) availability when using keystone-ldap #130

Open sudeephb opened 6 months ago

sudeephb commented 6 months ago

When using LDAP via keystone-ldap, if the servers configured in "ldap_server" become unresponsive or cannot be reached consistently, the keystone/apache workers remain blocked until they time out (logs below).

This means that the API/CLI becomes unresponsive. This situation can be hard to troubleshoot depending on the cloud and whether the LDAP server(s) are fully unavailable or just unresponsive.

It would help to add a dedicated check that raises a separate alert when this condition occurs.

2021-07-13 21:52:01.508342 raise exc_value
2021-07-13 21:52:01.508346 File "/usr/lib/python3/dist-packages/ldap/ldapobject.py", line 313, in _ldap_call
2021-07-13 21:52:01.508348 result = func(*args,**kwargs)
2021-07-13 21:52:01.508372 ldap.TIMEOUT
2021-07-13 21:59:45.513100 Timeout when reading response headers from daemon process 'keystone-public': /usr/bin/keystone-wsgi-public
2021-07-13 22:02:06.019309 Timeout when reading response headers from daemon process 'keystone-public': /usr/bin/keystone-wsgi-public
2021-07-13 22:02:12.364508 Timeout when reading response headers from daemon process 'keystone-admin': /usr/bin/keystone-wsgi-admin
(...)
2021-07-13 23:19:13.846645 mod_wsgi (pid=1525295): Unable to connect to WSGI daemon process 'keystone-admin' on '/var/run/apache2/wsgi.1299452.6.1.sock' after multiple attempts as listener backlog limit was exceeded.
2021-07-13 23:19:15.374640 mod_wsgi (pid=1349158): Unable to connect to WSGI daemon process 'keystone-admin' on '/var/run/apache2/wsgi.1299452.6.1.sock' after multiple attempts as listener backlog limit was exceeded.
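
As a rough illustration of the kind of check requested here, the sketch below probes each LDAP server with a timed TCP connection and maps the result to the standard Nagios exit codes. It is not taken from this charm: the function names (`probe`, `check_servers`), the `host:port` argument format, and the 1 s warning / 5 s timeout thresholds are all assumptions made for the example. It also only tests reachability; a server that accepts connections but hangs on LDAP operations would need a real bind or search with a timeout (for example via python-ldap), which is not shown.

```python
#!/usr/bin/env python3
"""Hypothetical Nagios-style check: does each LDAP server accept a TCP connection in time?"""
import socket
import sys
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def probe(host, port, timeout):
    """Return the seconds taken to open a TCP connection, or None on failure or timeout."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:  # covers refused connections, DNS failures and timeouts
        return None


def check_servers(servers, warn=1.0, timeout=5.0):
    """Probe (host, port) pairs and return a Nagios (status, message) tuple."""
    if not servers:
        return UNKNOWN, "UNKNOWN: no LDAP servers given"
    status, parts = OK, []
    for host, port in servers:
        elapsed = probe(host, port, timeout)
        if elapsed is None:
            status = CRITICAL
            parts.append(f"{host}:{port} unreachable")
        elif elapsed > warn:
            status = max(status, WARNING)
            parts.append(f"{host}:{port} slow ({elapsed:.2f}s)")
        else:
            parts.append(f"{host}:{port} ok ({elapsed:.2f}s)")
    label = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}[status]
    return status, f"{label}: " + ", ".join(parts)


# Assumed CLI form: check_ldap_tcp.py ldap1.example.com:389 ldap2.example.com:636
if __name__ == "__main__" and len(sys.argv) > 1:
    targets = []
    for arg in sys.argv[1:]:
        host, _, port = arg.rpartition(":")
        targets.append((host, int(port)))
    code, message = check_servers(targets)
    print(message)
    sys.exit(code)
```

Running such a check from the keystone units themselves, rather than from the monitoring host, would exercise the same network path the keystone workers use, which matters if the root cause turns out to be network-related rather than the LDAP servers.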


Imported from Launchpad using lp2gh.

fabioabreureis commented 5 months ago

Hello!

I had the same issue last week, and in my case it wasn't about the middleware or LDAP. I discovered a network issue that caused the same symptom in my infrastructure.

I strongly recommend checking 2 things:

Have a nice day!