collective / pas.plugins.ldap

Zope (and Plone) PAS Plugin providing users and groups from LDAP directory
http://pypi.python.org/pypi/pas.plugins.ldap

Resource Exhaustion (too many open files) #106

Open nutjob4life opened 3 years ago

nutjob4life commented 3 years ago

Server logs occasionally show errno 24, "Too many open files", under moderate load with pas.plugins.ldap on:

End users say they "got kicked out of Plone" and "can't log back in for some time". The symptoms are that lsof -p PID (where PID is the Zope instance process ID) shows a steadily increasing number of TCP connections¹ to the LDAP server². The Zope instance log shows:
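A quick way to watch the descriptor count climb is to sample it periodically. This is a diagnostic sketch, not from the report: it reads /proc (so it assumes a Linux host) instead of lsof, and `$$` stands in for the real Zope instance PID.

```shell
# Count the open file descriptors of a process via /proc (Linux only);
# this avoids depending on lsof being installed.
count_fds() {
  ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Example: sample the current shell's own FD count; substitute the
# Zope instance PID when diagnosing the real problem.
count_fds $$
```

Running this in a loop (e.g. every few seconds) while exercising the site makes a steady leak obvious, the same pattern `lsof -p PID` shows here.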

SERVER_DOWN: {u'info': 'Too many open files', 'errno': 24, 'desc': u"Can't contact LDAP server"}

There is a memcached running alongside Plone; telnetting to its port and running stats shows it is indeed populated with cached info. However, given the steadily rising number of LDAP client connections, the plugin does not appear to be using that cache.

The problem occurs less frequently on:

The number of LDAP connections in this configuration continue to rise up to a point but they will suddenly plummet and seem to be reclaimed. Users don't report being "kicked out of Plone" as much.

The problem also appears on unmodified Plone sites with no custom add-ons, tested by running 3 or 4 concurrent curl --cookie __ac="…" http://localhost…/folder_contents requests in loops.
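The reproduction described above can be sketched roughly like this. The URL, the cookie value, and the request count are placeholders, not values from the report:

```shell
# hammer URL COUNT: issue COUNT sequential requests, discarding the body.
# The __ac cookie value is a placeholder for a real Plone session cookie.
hammer() {
  url="$1"; count="$2"; i=0
  while [ "$i" -lt "$count" ]; do
    curl -s -o /dev/null --cookie '__ac=PLACEHOLDER' "$url" || return 1
    i=$((i + 1))
  done
}

# 3 or 4 of these loops run concurrently against folder_contents, e.g.:
# for w in 1 2 3 4; do
#   hammer 'http://localhost:8080/Plone/folder_contents' 100 &
# done
# wait
```

Each authenticated request exercises the PAS plugin, so a handful of concurrent loops is enough to make a per-request connection leak visible in the FD count.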

This report is summarized from this thread on the Plone community. See the thread for additional details.

¹The problem appears with Unix local socket connections too.
²Appears with OpenLDAP slapd 2.4.50 and Apache Directory Server 2.0.0.AM24; Micro$oft AD not available for testing.

fredvd commented 3 years ago

Which OS does this issue occur on, and what are the current soft/hard limits for the maximum number of open files per process? A large number of FDs is not necessarily a problem in itself; the real question is whether the process still depletes them over time even after raising the hard limit to 8192 or higher.

On Linux/Unix-based OSes these limits are visible and configurable with the ulimit command.
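For example (bash syntax; exact flag support can differ slightly between shells):

```shell
# Show the soft limit (what the process actually gets) and the hard
# limit (the ceiling a non-root process may raise the soft limit to):
ulimit -Sn
ulimit -Hn

# Raise the soft limit for this shell and its children, up to the hard limit:
# ulimit -Sn 8192
```

Note that this only affects the current shell session and whatever it starts; it does not change the limits of an already-running Zope process.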

Open file limits can be tricky: the default settings are sometimes too low (1024-2048) for user processes, and the limit can also end up shared with subprocesses.

So if you start a process manager which starts Zope, memcached, haproxy, and Varnish as subprocesses, sometimes all the open files/sockets/FDs from those processes together count against the single 2048 limit. (Older) HAProxy, for example, needs thousands of them.

Another caveat is setting a higher limit permanently: the limit is sometimes different for a shell session than for a process manager started from systemd or a user crontab's @reboot stanza.

So you restart the process manager from an ssh session and everything is smooth, until the weekly server restart, when Plone hangs itself on the 1024 max open files limit.
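One common way to pin the limit for a systemd-managed service so it survives reboots is a drop-in override. The unit name and path here are hypothetical; adjust them to however the process manager is actually run:

```ini
# /etc/systemd/system/plone.service.d/limits.conf (hypothetical unit name)
# systemd ignores shell/PAM limits; LimitNOFILE sets the open-file limit
# for every process this unit starts.
[Service]
LimitNOFILE=8192
```

After adding the drop-in, `systemctl daemon-reload` and a service restart apply the new limit.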

I have even noticed differences in the past when using sudo -u plone bash, which messed up the open file limits.