flamencist / ldap4net

OpenLdap port for DotNet Core (Linux\OSX\Windows)
MIT License
213 stars 38 forks source link

Fix memory leaks and random crashes #178

Open BalassaMarton opened 1 year ago

BalassaMarton commented 1 year ago

Creating a draft for discussion.

Background

I started investigating #136 because it is a massive blocker in our Linux environments. My test project (not included as it connects to corporate infra) does some large queries as well as a few single-entry fetches. After some iterations it consistently fails with different errors (Can't contact LDAP server. Result: -1; or -2; or assertion failures from the C wrapped libraries). I also measured memory usage with dotMemory, and found that it increases indefinitely when using a single LdapConnection.

Memory leaks

I've discovered a few possible causes for the memory leaks, mostly around incorrectly released buffers. My process was to look at every allocation made through the native APIs and check the docs to see if everything is not released accordingly. Note that in some cases allocated buffers are freed differently on Windows and Linux, see https://linux.die.net/man/3/ber_scanf and https://learn.microsoft.com/en-us/windows/win32/api/winber/nf-winber-ber_scanf. To cover these differences, I added a new method BerScanfFree to LdapNative. The OSX version is just a copy of the Linux version, hopefully it will work (never tested).

After making these changes, memory usage is much better, with only minimal increase after each iteration. Later I might do some more digging and maybe submit a new PR if I can further improve it.

Threading

None of the fixes around buffers solved the randomly occurring errors, and after some digging I've found that libldap before 2.4 versions is not multi-threaded and there's a threaded version of the library called libldap_r. Changing my symlinks to point to that .so made the random errors disappear completely. Starting with version 2.5, the threaded library is the default, so this problem only affects clients using 2.4 and older versions of OpenLDAP.

vossjannik commented 3 months ago

Hi, what is the current state of this PR? @BalassaMarton Are you using these fixes successfully in production? @flamencist Do you have time to review this and maybe even fix the build error? #103 seems related.

I recently observed this read access violation in a test environment: (Windows 10 client connecting via SSL to an OpenLDAP server running on Linux) image

Unhandled exception thrown: read access violation.
this->m_pUMThunkMarshInfo was nullptr.
BalassaMarton commented 3 months ago

Hi, what is the current state of this PR? @BalassaMarton Are you using these fixes successfully in production?

I don't work on that project anymore, but iirc using the multi-threaded LDAP binaries on Linux had solved our problems back then. I don't think your current issue on Windows is related.