[Open] i-ivantsov opened this issue 1 year ago
In my own use of 389ds I have also observed a massive memory leak on versions 2.2, 2.3, and 2.4 with create, modify, and delete operations. After approximately 1,000,000 operations over 6 hours, the 389ds process allocated 32 GB (all the memory on the machine) and crashed (out of memory).
I have the same problem with 389ds. During modification operations memory usage grows rapidly and is never freed afterwards.
In Fedora/RHEL we ship jemalloc and LD_PRELOAD it in systemd unit for dirsrv. In a container this is not happening:
- The image from Docker Hub ships a different rpm that doesn't bundle jemalloc; there is a separate jemalloc rpm, but it is not installed in that image.
- To use it you also have to explicitly pass the LD_PRELOAD environment variable (which I think we should do by default, as we do in Fedora/RHEL).
@ATokma and @OGSmv, could you please confirm that you see this in a container too? Or what distro/package do you use?
@i-ivantsov, could you please build a new image based on docker.io/389ds/dirsrv that also includes the jemalloc package, and pass the
LD_PRELOAD=/usr/lib64/libjemalloc.so.2
variable to the container? Or try an image from quay.io/389ds/dirsrv and pass the
LD_PRELOAD=/usr/lib64/dirsrv/lib/libjemalloc.so.2
variable? Thanks.
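One way to confirm that the preload actually took effect is to check whether the library shows up in the server process' memory map. This is a small illustrative helper, not part of any 389ds tooling; on Linux you would run it inside the container against the ns-slapd pid:

```python
from pathlib import Path

def lib_is_mapped(pid: int, needle: str) -> bool:
    """True if `needle` occurs anywhere in /proc/<pid>/maps (Linux only)."""
    return needle in Path(f"/proc/{pid}/maps").read_text()

# e.g., inside the container: lib_is_mapped(<ns-slapd pid>, "libjemalloc")
```

If LD_PRELOAD worked, the maps file will contain a line for libjemalloc.so.2; if not, only the glibc allocator will be mapped.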
Yes, I see this using the container from Docker Hub. Thank you for the help!
@vashirov I am using Docker containers from https://hub.docker.com/r/389ds/dirsrv/tags. I'll try your advice. Thank you very much!
Victor, thank you for the reply.
I built the container with libjemalloc using this Dockerfile:
FROM 389ds/dirsrv:2.4
ENV DS_DM_PASSWORD="****"
RUN zypper --non-interactive install uuid openldap2-client
RUN zypper --non-interactive install jemalloc
RUN zypper clean
RUN mkdir -p /xxx.init
COPY ./ldap_init.sh /xxx.init
COPY ./struct.ldif /xxx.init
COPY ./schema.ldif /xxx.init
COPY ./start_ds.sh /xxx.init
CMD [ "/xxx.init/start_ds.sh" ]
And I start it with this docker-compose.yaml:
version: "3"
volumes:
  389ds:
    name: 389ds
services:
  xxx_ldap:
    container_name: xxx_ldap
    image: docker_xxx_ldap
    restart: always
    tty: true
    stdin_open: true
    volumes:
      - 389ds:/data
    ports:
      - 3389:3389
      - 3636:3636
    environment:
      - TZ=Europe/Moscow
      - RDOMAIN_NAME=xxx
      - RDOMAIN_EXT=net
      - LD_PRELOAD=/usr/lib64/libjemalloc.so.2
      #- DS_REINDEX=true
    networks:
      - xxx_sd_net
networks:
  xxx_sd_net:
    name: xxx_sd_net
Unfortunately, the memory leak is still present:
Hi all,
I also see no change after installing jemalloc into the docker.io container.
The container was started with the command
docker run -v 389ds_2.4m:/data -p 3389:3389 -p 636:3636 -d --restart unless-stopped --name my389ds -e LD_PRELOAD="/usr/lib64/libjemalloc.so.2" my389ds_img:1.0
The library exists at that path.
Memory grows during update operations across several simultaneous connections.
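For reference, a minimal sketch of the kind of churn described in this thread. The DN layout and attribute names here are assumptions, not taken from the reporters' actual scripts; the generator just yields (dn, attrs) pairs that any LDAP client library could feed into parallel add/modify/delete loops:

```python
def make_entries(n: int, base: str = "dc=test,dc=net"):
    """Yield (dn, attrs) pairs for n synthetic person entries.
    The suffix and attribute set are illustrative assumptions."""
    for i in range(n):
        uid = f"user{i:07d}"
        dn = f"uid={uid},ou=people,{base}"
        yield dn, {
            "objectClass": ["top", "person", "inetOrgPerson"],
            "uid": [uid],
            "cn": [f"Test User {i}"],
            "sn": ["User"],
        }
```

Running many such adds, repeated modifies, and deletes from 20-plus concurrent connections matches the workloads reported above.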
@i-ivantsov, @OGSmv, thank you for the results. Could you please provide the following details about your environment: the output of
dsconf localhost monitor dbmon
from inside the container at various points (before/after adding, updating, and deleting entries)?
If you have any scripts to reproduce the problem, please share them as well. Thanks.
@vashirov, I need some time to gather the diagnostic information. I'll try to provide it next week.
Thanks.
Thanks! In the meantime I tried to add and delete 1.5M entries. RSS grew to 21.2 GB after adding the entries and did not shrink after deleting them. This is with the glibc memory allocator, not jemalloc. I will rerun this under jemalloc and ASAN/LSAN to see if it reports anything interesting.
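As a rough sanity check (assuming the 21.2 figure is GiB), the growth works out to roughly 15 KB of retained RSS per added entry:

```python
# Back-of-the-envelope: RSS growth divided by the number of added entries.
rss_growth = 21.2 * 2**30   # 21.2 GiB, from the experiment above
entries = 1_500_000
per_entry = rss_growth / entries   # roughly 15 KB per entry
```

That per-entry footprint is far larger than the 32-byte leak found later, which is consistent with most of it being cache growth rather than a single leak site.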
ASAN reports a 32-byte leak per entry during the delete operation, in entryrdn_delete_key:
Direct leak of 319712 byte(s) in 9991 object(s) allocated from:
#0 0x7fceb6d172ff in malloc (/lib64/libasan.so.8+0xd92ff) (BuildId: 9501248886f79bf1482f3e153f794be742818172)
#1 0x7fceb5a09e29 in GrowStuff (/lib64/libnspr4.so+0x12e29) (BuildId: 64a3eb7023749c27ae28319f7a3f58f51a39a225)
#2 0x7fceb5a09b74 in fill_n (/lib64/libnspr4.so+0x12b74) (BuildId: 64a3eb7023749c27ae28319f7a3f58f51a39a225)
#3 0x7fceb5a0a918 in dosprintf (/lib64/libnspr4.so+0x13918) (BuildId: 64a3eb7023749c27ae28319f7a3f58f51a39a225)
#4 0x7fceb5a0b084 in PR_vsmprintf (/lib64/libnspr4.so+0x14084) (BuildId: 64a3eb7023749c27ae28319f7a3f58f51a39a225)
#5 0x7fceb650904c in slapi_ch_smprintf (/usr/lib64/dirsrv/libslapd.so.0+0x28b04c) (BuildId: 9f2aab38b0cf887c225fd5b7bfdfdb21c96f5c70)
#6 0x7fceaf5aa3f3 in entryrdn_delete_key ldap/servers/slapd/back-ldbm/ldbm_entryrdn.c:2700
#7 0x7fceaf5ab39e in entryrdn_index_entry ldap/servers/slapd/back-ldbm/ldbm_entryrdn.c:252
#8 0x7fceaf575e22 in index_addordel_entry ldap/servers/slapd/back-ldbm/index.c:487
#9 0x7fceaf5ac469 in ldbm_back_delete ldap/servers/slapd/back-ldbm/ldbm_delete.c:811
#10 0x7fceb653af92 in op_shared_delete ldap/servers/slapd/delete.c:338
#11 0x7fceb653b702 in do_delete (/usr/lib64/dirsrv/libslapd.so.0+0x2bd702) (BuildId: 9f2aab38b0cf887c225fd5b7bfdfdb21c96f5c70)
#12 0x564bf360b620 in connection_dispatch_operation ldap/servers/slapd/connection.c:643
#13 0x564bf360b620 in connection_threadmain ldap/servers/slapd/connection.c:1805
#14 0x7fceb5a24758 in _pt_root (/lib64/libnspr4.so+0x2d758) (BuildId: 64a3eb7023749c27ae28319f7a3f58f51a39a225)
#15 0x7fceb610e906 in start_thread (/lib64/libc.so.6+0x8c906) (BuildId: 245240a31888ad5c11bbc55b18e02d87388f59a9)
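The per-entry figure is consistent with the totals in the report header (a quick check, nothing more):

```python
# 319712 leaked bytes over 9991 objects, from the ASAN report above.
leaked_bytes, objects = 319712, 9991
bytes_per_object = leaked_bytes / objects   # exactly 32.0 bytes per deleted entry
```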
To clarify: this is on 389-ds-base-2.3.2-202305111611git7c36748c0.fc38.x86_64, not the latest 2.4; I had an old checkout by accident. Retrying on the latest main branch now...
Right, in 2.4 I don't see this, since it was fixed in 6033380aaa8701b0ab7e270f407913f4a3600153. And version 389-ds-2.4.0~git0.87efeb2-163.2.x86_64 includes this fix, so it should not leak memory on delete.
@vashirov, sorry for the delay in responding.
Below is the information you asked for earlier. With the filter
!(objectClass=nsTombstone)
the memory leak is observed too, but not as intensively.
before inserts:
DB Monitor Report: 2023-05-29 11:09:34
--------------------------------------------------------
Database Cache:
- Cache Hit Ratio: 99%
- Free Space: 1002.66 MB
- Free Percentage: 100.0%
- RO Page Drops: 0
- Pages In: 26
- Pages Out: 53508
Normalized DN Cache:
- Cache Hit Ratio: 94%
- Free Space: 0 B
- Free Percentage: 0.0%
- DN Count: 124830
- Evictions: 19322
Backends:
- dc=test,dc=net (test_net):
- Entry Cache Hit Ratio: 45%
- Entry Cache Count: 109
- Entry Cache Free Space: 3.88 KB
- Entry Cache Free Percentage: 0.8%
- Entry Cache Average Size: 4.55 KB
- DN Cache Hit Ratio: 95%
- DN Cache Count: 24001
- DN Cache Free Space: 13.33 MB
- DN Cache Free Percentage: 83.3%
- DN Cache Average Size: 116.0 B
before read and update:
DB Monitor Report: 2023-05-29 11:11:20
--------------------------------------------------------
Database Cache:
- Cache Hit Ratio: 99%
- Free Space: 1002.66 MB
- Free Percentage: 100.0%
- RO Page Drops: 0
- Pages In: 26
- Pages Out: 56326
Normalized DN Cache:
- Cache Hit Ratio: 94%
- Free Space: 0 B
- Free Percentage: 0.0%
- DN Count: 124830
- Evictions: 47034
Backends:
- dc=test,dc=net (test_net):
- Entry Cache Hit Ratio: 44%
- Entry Cache Count: 160
- Entry Cache Free Space: 548.0 B
- Entry Cache Free Percentage: 0.1%
- Entry Cache Average Size: 3.12 KB
- DN Cache Hit Ratio: 95%
- DN Cache Count: 48001
- DN Cache Free Space: 11.74 MB
- DN Cache Free Percentage: 73.4%
- DN Cache Average Size: 93.0 B
before delete:
DB Monitor Report: 2023-05-29 11:13:14
--------------------------------------------------------
Database Cache:
- Cache Hit Ratio: 99%
- Free Space: 1002.66 MB
- Free Percentage: 100.0%
- RO Page Drops: 0
- Pages In: 26
- Pages Out: 61130
Normalized DN Cache:
- Cache Hit Ratio: 94%
- Free Space: 0 B
- Free Percentage: 0.0%
- DN Count: 124830
- Evictions: 47034
Backends:
- dc=test,dc=net (test_net):
- Entry Cache Hit Ratio: 43%
- Entry Cache Count: 135
- Entry Cache Free Space: 20.0 B
- Entry Cache Free Percentage: 0.0%
- Entry Cache Average Size: 3.7 KB
- DN Cache Hit Ratio: 96%
- DN Cache Count: 48001
- DN Cache Free Space: 11.74 MB
- DN Cache Free Percentage: 73.4%
- DN Cache Average Size: 93.0 B
after delete:
DB Monitor Report: 2023-05-29 11:23:01
--------------------------------------------------------
Database Cache:
- Cache Hit Ratio: 99%
- Free Space: 1002.66 MB
- Free Percentage: 100.0%
- RO Page Drops: 0
- Pages In: 26
- Pages Out: 100912
Normalized DN Cache:
- Cache Hit Ratio: 95%
- Free Space: 0 B
- Free Percentage: 0.0%
- DN Count: 124830
- Evictions: 129203
Backends:
- dc=test,dc=net (test_net):
- Entry Cache Hit Ratio: 45%
- Entry Cache Count: 109
- Entry Cache Free Space: 3.74 KB
- Entry Cache Free Percentage: 0.7%
- Entry Cache Average Size: 4.55 KB
- DN Cache Hit Ratio: 95%
- DN Cache Count: 48001
- DN Cache Free Space: 10.66 MB
- DN Cache Free Percentage: 66.6%
- DN Cache Average Size: 116.0 B
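To compare snapshots like the ones above, a small throwaway parser (not part of the 389ds CLI) for the plain integer counters in the dbmon output can help; for example, between the first and last report the normalized DN cache Evictions grew from 19322 to 129203 while DN Count stayed flat at 124830:

```python
import re

# Matches lines like "- Evictions: 47034"; percentages and sizes
# such as "11.74 MB" are deliberately skipped.
COUNTER = re.compile(r"\s*-\s*([A-Za-z ]+):\s*(\d+)\s*$")

def parse_dbmon(report: str) -> dict:
    """Extract integer-valued fields from a dbmon report snippet."""
    stats = {}
    for line in report.splitlines():
        if m := COUNTER.match(line):
            stats[m.group(1).strip()] = int(m.group(2))
    return stats
```

Diffing two parsed snapshots then gives the per-phase growth of counters such as Evictions or Pages Out directly.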
Thank you for the reproducer, it was really helpful! On 389ds from 87efeb2 I see 2 major leaks during the search operation:
Direct leak of 8521148896 byte(s) in 49999 object(s) allocated from:
#0 0x7fb300a1e1f5 (/lib64/libasan.so.8+0xd81f5) (BuildId: bac59ca9f1e357781008d7f6982314d30ca62672)
#1 0x7fb300213daf in slapi_ch_realloc (/usr/lib64/dirsrv/libslapd.so.0+0x299daf) (BuildId: b4c1b5c8f5385a01fc0aecba12adb29054dd0718)
#2 0x7fb2f925daab in idl_append_extend ldap/servers/slapd/back-ldbm/idl_common.c:141
#3 0x7fb2f92642a6 in idl_new_fetch ldap/servers/slapd/back-ldbm/idl_new.c:232
#4 0x7fb2f926a375 in index_read_ext_allids ldap/servers/slapd/back-ldbm/index.c:1071
#5 0x7fb2f9252383 in keys2idl ldap/servers/slapd/back-ldbm/filterindex.c:1076
#6 0x7fb2f93e19a4 in ava_candidates.isra.0 ldap/servers/slapd/back-ldbm/filterindex.c:315
#7 0x7fb2f9255d4f in list_candidates ldap/servers/slapd/back-ldbm/filterindex.c:848
#8 0x7fb2f925317b in filter_candidates_ext ldap/servers/slapd/back-ldbm/filterindex.c:132
#9 0x7fb2f92bf503 in subtree_candidates ldap/servers/slapd/back-ldbm/ldbm_search.c:1263
#10 0x7fb2f92d20f9 in build_candidate_list ldap/servers/slapd/back-ldbm/ldbm_search.c:1057
#11 0x7fb2f92d20f9 in ldbm_back_search ldap/servers/slapd/back-ldbm/ldbm_search.c:633
#12 0x7fb3002d6847 in op_shared_search (/usr/lib64/dirsrv/libslapd.so.0+0x35c847) (BuildId: b4c1b5c8f5385a01fc0aecba12adb29054dd0718)
#13 0x55ddec23d79f in do_search ldap/servers/slapd/search.c:410
#14 0x55ddec2139b8 in connection_dispatch_operation ldap/servers/slapd/connection.c:677
#15 0x55ddec2139b8 in connection_threadmain ldap/servers/slapd/connection.c:1805
#16 0x7fb2ff720758 in _pt_root ../../../../nspr/pr/src/pthreads/ptthread.c:201
#17 0x7fb2ffe0a906 in start_thread /usr/src/debug/glibc-2.37-4.fc38.x86_64/nptl/pthread_create.c:444
Direct leak of 360000 byte(s) in 10000 object(s) allocated from:
#0 0x7fb300a1ecd7 in calloc (/lib64/libasan.so.8+0xd8cd7) (BuildId: bac59ca9f1e357781008d7f6982314d30ca62672)
#1 0x7fb300214265 in slapi_ch_calloc (/usr/lib64/dirsrv/libslapd.so.0+0x29a265) (BuildId: b4c1b5c8f5385a01fc0aecba12adb29054dd0718)
#2 0x7fb2f925d0fc in idl_alloc ldap/servers/slapd/back-ldbm/idl_common.c:60
#3 0x7fb2f926a680 in index_read_ext_allids ldap/servers/slapd/back-ldbm/index.c:1085
#4 0x7fb2f9252383 in keys2idl ldap/servers/slapd/back-ldbm/filterindex.c:1076
#5 0x7fb2f93e19a4 in ava_candidates.isra.0 ldap/servers/slapd/back-ldbm/filterindex.c:315
#6 0x7fb2f9255d4f in list_candidates ldap/servers/slapd/back-ldbm/filterindex.c:848
#7 0x7fb2f925317b in filter_candidates_ext ldap/servers/slapd/back-ldbm/filterindex.c:132
#8 0x7fb2f92bf503 in subtree_candidates ldap/servers/slapd/back-ldbm/ldbm_search.c:1263
#9 0x7fb2f92d20f9 in build_candidate_list ldap/servers/slapd/back-ldbm/ldbm_search.c:1057
#10 0x7fb2f92d20f9 in ldbm_back_search ldap/servers/slapd/back-ldbm/ldbm_search.c:633
#11 0x7fb3002d6847 in op_shared_search (/usr/lib64/dirsrv/libslapd.so.0+0x35c847) (BuildId: b4c1b5c8f5385a01fc0aecba12adb29054dd0718)
#12 0x55ddec23d79f in do_search ldap/servers/slapd/search.c:410
#13 0x55ddec2139b8 in connection_dispatch_operation ldap/servers/slapd/connection.c:677
#14 0x55ddec2139b8 in connection_threadmain ldap/servers/slapd/connection.c:1805
#15 0x7fb2ff720758 in _pt_root ../../../../nspr/pr/src/pthreads/ptthread.c:201
#16 0x7fb2ffe0a906 in start_thread /usr/src/debug/glibc-2.37-4.fc38.x86_64/nptl/pthread_create.c:444
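When several leaks show up at once, ranking them by size is useful. A small ad-hoc script (not 389ds tooling) that sums the "Direct leak" headers from an ASAN report:

```python
import re

ASAN_LEAK = re.compile(r"Direct leak of (\d+) byte\(s\) in (\d+) object\(s\)")

def summarize_leaks(report: str):
    """Return (total_bytes, objects, bytes_per_object) per leak, largest first."""
    leaks = []
    for total, count in ASAN_LEAK.findall(report):
        total, count = int(total), int(count)
        leaks.append((total, count, total / count))
    return sorted(leaks, reverse=True)
```

Applied to the two reports above, this ranks the ~8.5 GB idl_append_extend leak first and the 36-bytes-per-object idl_alloc leak second.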
I will try to isolate them to standalone test cases.
With the following test case I can trigger the leak: https://github.com/vashirov/389-ds-base/blob/afa2945a0419cb07ce20d0716678792b2e8096b5/dirsrvtests/tests/suites/memory_leaks/allids_search_test.py
Direct leak of 36 byte(s) in 1 object(s) allocated from:
#0 0x7f5ae7ed8cd7 in calloc (/lib64/libasan.so.8+0xd8cd7) (BuildId: bac59ca9f1e357781008d7f6982314d30ca62672)
#1 0x7f5ae769a265 in slapi_ch_calloc (/usr/lib64/dirsrv/libslapd.so.0+0x29a265) (BuildId: b4c1b5c8f5385a01fc0aecba12adb29054dd0718)
#2 0x7f5ae00bd0fc in idl_alloc (/usr/lib64/dirsrv/plugins/libback-ldbm.so+0xbd0fc) (BuildId: 3b5c656aae1898ebd9f9b1f52dcbc1dc7495015e)
#3 0x7f5ae00ca680 in index_read_ext_allids (/usr/lib64/dirsrv/plugins/libback-ldbm.so+0xca680) (BuildId: 3b5c656aae1898ebd9f9b1f52dcbc1dc7495015e)
#4 0x7f5ae00b2383 in keys2idl.lto_priv.0 (/usr/lib64/dirsrv/plugins/libback-ldbm.so+0xb2383) (BuildId: 3b5c656aae1898ebd9f9b1f52dcbc1dc7495015e)
#5 0x7f5ae02419a4 in ava_candidates.isra.0 (/usr/lib64/dirsrv/plugins/libback-ldbm.so+0x2419a4) (BuildId: 3b5c656aae1898ebd9f9b1f52dcbc1dc7495015e)
#6 0x7f5ae00b5d4f in list_candidates (/usr/lib64/dirsrv/plugins/libback-ldbm.so+0xb5d4f) (BuildId: 3b5c656aae1898ebd9f9b1f52dcbc1dc7495015e)
#7 0x7f5ae00b317b in filter_candidates_ext (/usr/lib64/dirsrv/plugins/libback-ldbm.so+0xb317b) (BuildId: 3b5c656aae1898ebd9f9b1f52dcbc1dc7495015e)
#8 0x7f5ae011f503 in subtree_candidates (/usr/lib64/dirsrv/plugins/libback-ldbm.so+0x11f503) (BuildId: 3b5c656aae1898ebd9f9b1f52dcbc1dc7495015e)
#9 0x7f5ae01320f9 in ldbm_back_search (/usr/lib64/dirsrv/plugins/libback-ldbm.so+0x1320f9) (BuildId: 3b5c656aae1898ebd9f9b1f52dcbc1dc7495015e)
#10 0x7f5ae775c847 in op_shared_search (/usr/lib64/dirsrv/libslapd.so.0+0x35c847) (BuildId: b4c1b5c8f5385a01fc0aecba12adb29054dd0718)
#11 0x56553563b79f in do_search (/usr/sbin/ns-slapd+0x8879f) (BuildId: 35fbd32a64bd9a751433be1ed37a6665d214fc65)
#12 0x5655356119b8 in connection_threadmain (/usr/sbin/ns-slapd+0x5e9b8) (BuildId: 35fbd32a64bd9a751433be1ed37a6665d214fc65)
#13 0x7f5ae7005758 in _pt_root (/lib64/libnspr4.so+0x2d758) (BuildId: 64a3eb7023749c27ae28319f7a3f58f51a39a225)
#14 0x7f5ae72adc56 in start_thread (/lib64/libc.so.6+0x8cc56) (BuildId: 6107835fa7d4725691b2b7f6aaee7abe09f493b2)
#15 0x7f5ae7333a6f in __clone3 (/lib64/libc.so.6+0x112a6f) (BuildId: 6107835fa7d4725691b2b7f6aaee7abe09f493b2)
ab9dfcbdb..cde047740  389-ds-base-2.3 -> 389-ds-base-2.3
380b15bd7..88cafdfe2  389-ds-base-2.2 -> 389-ds-base-2.2
f5112d9c4..1ce14b7eb  389-ds-base-2.1 -> 389-ds-base-2.1
9072a18a6..ac8b2a784  389-ds-base-2.0 -> 389-ds-base-2.0
Looks like the last commit introduced a few failures in the basic test suite:
=========================== short test summary info ============================
FAILED dirsrvtests/tests/suites/basic/basic_test.py::test_basic_searches - as...
FAILED dirsrvtests/tests/suites/basic/basic_test.py::test_basic_search_lookthroughlimit[200-PASS]
FAILED dirsrvtests/tests/suites/basic/basic_test.py::test_basic_search_lookthroughlimit[50-ADMINLIMIT_EXCEEDED]
FAILED dirsrvtests/tests/suites/basic/basic_test.py::test_basic_referrals - a...
ERROR dirsrvtests/tests/suites/basic/basic_test.py::test_search_req_attrs[attrs0-cn-False]
ERROR dirsrvtests/tests/suites/basic/basic_test.py::test_search_req_attrs[attrs1-cn-True]
ERROR dirsrvtests/tests/suites/basic/basic_test.py::test_search_req_attrs[attrs2-nsUniqueId-True]
ERROR dirsrvtests/tests/suites/basic/basic_test.py::test_search_req_attrs[attrs3-cn-True]
ERROR dirsrvtests/tests/suites/basic/basic_test.py::test_search_req_attrs[attrs4-cn-True]
======= 4 failed, 47 passed, 161 warnings, 5 errors in 604.48s (0:10:04) =======
I've tested on a separate machine with and without the commit: the suite passes without the fix and fails with it.
f1d407e1e..b2784617a  389-ds-base-2.0 -> 389-ds-base-2.0
b4bd0720a..a5d708d9b  389-ds-base-2.1 -> 389-ds-base-2.1
fdce42d97..504fb72cf  389-ds-base-2.2 -> 389-ds-base-2.2
a9f687e81..8d2517930  389-ds-base-2.3 -> 389-ds-base-2.3
Massive memory leaks are seen on update operations.
Package Version and Platform:
Platform: Debian GNU/Linux 10 (buster). Package and version: 389-ds-2.4.0~git0.87efeb2-163.2.x86_64 in a Docker container (389-Directory/2.4.0 B2023.116.0000).
Steps to reproduce the behavior:
1) Run parallel update operations in 25 connections (read, create, and delete operations are performed too).
2) The 389ds server uses up all available memory.
A diagram of memory usage, collected with nmon and nmonchart: