389ds / 389-ds-base

The enterprise-class Open Source LDAP server for Linux
https://www.port389.org/

SIGSEGV with sync_repl #4711

Closed tbordaz closed 2 years ago

tbordaz commented 3 years ago

Issue Description: Running a sync_repl client crashes the server on current master.

=================================================================
==2268384==ERROR: AddressSanitizer: SEGV on unknown address 0x000041b58abb (pc 0x7f2c75c6c68d bp 0x7f2be7fc3d60 sp 0x7f2be7fc3d40 T12)
==2268384==The signal is caused by a READ memory access.
    #0 0x7f2c75c6c68c in slapi_value_get_string ldap/servers/slapd/value.c:371
    #1 0x7f2c71534436 in sync_create_state_control ldap/servers/plugins/sync/sync_util.c:194
    #2 0x7f2c7153a89f in sync_srch_refresh_pre_entry ldap/servers/plugins/sync/sync_refresh.c:275
    #3 0x7f2c75bb7365 in plugin_call_func ldap/servers/slapd/plugin.c:2002
    #4 0x7f2c75bb6fb6 in plugin_call_list ldap/servers/slapd/plugin.c:1945
    #5 0x7f2c75baf71c in plugin_call_plugins ldap/servers/slapd/plugin.c:414
    #6 0x7f2c75bf9a6a in send_ldap_search_entry_ext ldap/servers/slapd/result.c:1488
    #7 0x7f2c75bf75e6 in send_ldap_search_entry ldap/servers/slapd/result.c:1050
    #8 0x7f2c75b82664 in send_entry ldap/servers/slapd/opshared.c:1140
    #9 0x7f2c75b83687 in iterate ldap/servers/slapd/opshared.c:1326
    #10 0x7f2c75b8493b in send_results_ext ldap/servers/slapd/opshared.c:1543
    #11 0x7f2c75b7fffe in op_shared_search ldap/servers/slapd/opshared.c:882
    #12 0x4700b1 in do_search ldap/servers/slapd/search.c:388
    #13 0x4243c7 in connection_dispatch_operation ldap/servers/slapd/connection.c:659
    #14 0x42a5f5 in connection_threadmain ldap/servers/slapd/connection.c:1777
    #15 0x7f2c75549b33  (/lib64/libnspr4.so+0x2bb33)
    #16 0x7f2c754de4e1 in start_thread (/lib64/libpthread.so.0+0x94e1)
    #17 0x7f2c753966a2 in clone (/lib64/libc.so.6+0x1016a2)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV ldap/servers/slapd/value.c:371 in slapi_value_get_string
Thread T12 created by T0 here:
    #0 0x7f2c75de9955 in pthread_create (/lib64/libasan.so.5+0x3a955)
    #1 0x7f2c7554981a  (/lib64/libnspr4.so+0x2b81a)

Package Version and Platform:

Expected results: The server should not crash.

tbordaz commented 3 years ago

0a399a2b4..d7eef2fcf master
91c1c4d08..58dbf084a 389-ds-base-1.4.4
942cc16d2..2e5b52601 389-ds-base-1.4.3

mreynolds389 commented 3 years ago

39891cd7a..0bdc258b9 389-ds-base-1.4.2 -> 389-ds-base-1.4.2

johnkeates commented 3 years ago

I ran into this issue today, any way to bypass this, short of replacing the entire replica?

mreynolds389 commented 3 years ago

@johnkeates Really, you just need to upgrade to the latest version that has this fix. Otherwise, disable the Sync Repl plugin, but then you lose that functionality (probably not an option).

What version and platform are you on?

johnkeates commented 3 years ago

I'm beholden to whatever version is in the CentOS 8 Stream DL1 module repo, so I suppose it's a matter of waiting for what Red Hat wants to do. The RPM version is 389-ds-base-1.4.3.16-8.module_el8.4.0+644+ed25d39e.x86_64 and the startup line in the logs is 389-Directory/1.4.3.16 B2021.020.1847. (I accidentally created a duplicate BZ at https://bugzilla.redhat.com/show_bug.cgi?id=1973337 )

There are plenty of other nodes that can serve requests so it's not a big deal right now.

Firstyear commented 3 years ago

@johnkeates Quite a few members of this team work for RH, and they care a lot about making sure these fixes get to these platforms :) While the process may take a little while, it will happen.

johnkeates commented 3 years ago

I figured as much :) Since the system is still serving requests just fine with the remaining nodes, I will leave it be for now and drop the DNS records so that clients skip the non-functional node faster.

johnkeates commented 3 years ago

Looks like it's in the Stream 8 repo and works well now!

vashirov commented 2 years ago

This is still broken and crashes in sync_cookie_isvalid:

Thread 14 "ns-slapd" received signal SIGSEGV, Segmentation fault.
0x00007f7802ba38d6 in __strcmp_evex () from target:/lib64/libc.so.6
(gdb) bt
#0  0x00007f7802ba38d6 in __strcmp_evex () at target:/lib64/libc.so.6
#1  0x00007f77fe926e9f in sync_cookie_isvalid (refcookie=0x7f77febfaba0, testcookie=0x7f77febfab80)
    at ldap/servers/plugins/sync/sync_util.c:796
#2  sync_cookie_isvalid (testcookie=0x7f77febfab80, refcookie=0x7f77febfaba0) at ldap/servers/plugins/sync/sync_util.c:789
#3  0x00007f77fe92aa7d in sync_srch_refresh_pre_search (pb=0x7f77feb9fd00) at ldap/servers/plugins/sync/sync_refresh.c:135
#4  0x00007f7802e297d9 in plugin_call_func
    (list=0x7f77fe9ed800, operation=operation@entry=403, pb=pb@entry=0x7f77feb9fd00, call_one=call_one@entry=0)
    at ldap/servers/slapd/plugin.c:2001
#5  0x00007f7802e299e6 in plugin_call_list (pb=0x7f77feb9fd00, operation=403, list=<optimized out>) at ldap/servers/slapd/plugin.c:1944
#6  plugin_call_plugins (pb=0x7f77feb9fd00, whichfunction=403) at ldap/servers/slapd/plugin.c:414
#7  0x00007f7802e222a9 in op_shared_search (pb=pb@entry=0x7f77feb9fd00, send_result=send_result@entry=1) at ldap/servers/slapd/opshared.c:586
#8  0x0000556eb3f0db14 in do_search (pb=<optimized out>) at ldap/servers/slapd/search.c:388
#9  0x0000556eb3efcb7f in connection_dispatch_operation (pb=0x7f77feb9fd00, op=<optimized out>, conn=<optimized out>)
    at ldap/servers/slapd/connection.c:659
#10 connection_threadmain () at ldap/servers/slapd/connection.c:1785
#11 0x00007f780290ec34 in _pt_root () at target:/lib64/libnspr4.so
#12 0x00007f7802b75802 in start_thread () at target:/lib64/libc.so.6
#13 0x00007f7802b15450 in clone3 () at target:/lib64/libc.so.6

Steps to reproduce are the same, except that the cookie should be malformed: -E sync=rp/foo

We have a reproducer in our test suite: https://github.com/389ds/389-ds-base/blob/main/dirsrvtests/tests/tickets/ticket48013_test.py
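
For readers following the backtrace above: a crash inside strcmp usually means one of its arguments was NULL or invalid. The sketch below illustrates a NULL-safe cookie comparison, on the assumption that a malformed cookie such as rp/foo leaves some parsed fields unset. The struct, its field names, and both functions are hypothetical stand-ins, not the plugin's real definitions or the actual fix.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a parsed sync cookie (NOT the plugin's real struct). */
typedef struct example_cookie {
    const char *server_signature; /* NULL when the cookie could not be parsed */
    const char *client_signature; /* NULL when the cookie could not be parsed */
} example_cookie;

/* NULL-safe string equality: a NULL on either side never matches. */
static int
example_str_equal(const char *a, const char *b)
{
    return a != NULL && b != NULL && strcmp(a, b) == 0;
}

/* Returns 1 if testcookie matches refcookie, 0 otherwise.
 * Every pointer is checked before it reaches strcmp(). */
int
example_cookie_isvalid(const example_cookie *testcookie, const example_cookie *refcookie)
{
    if (testcookie == NULL || refcookie == NULL) {
        return 0;
    }
    return example_str_equal(testcookie->server_signature, refcookie->server_signature) &&
           example_str_equal(testcookie->client_signature, refcookie->client_signature);
}
```

With a guard like this, a cookie whose fields were never filled in is rejected as invalid rather than dereferenced.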

tbordaz commented 2 years ago

To track the crash reported in https://github.com/389ds/389-ds-base/issues/4711#issuecomment-1205100979, I opened https://github.com/389ds/389-ds-base/issues/5418.

They are different issues. The root cause of #4711 is the handling of entries that have no nsuniqueid. The root cause of #5418 is the handling of a specific cookie.
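
As an illustration of the #4711 root cause, here is a minimal sketch of the kind of guard that avoids reading the unique id of an entry that has none. The entry struct and the function are hypothetical and only model the idea; they are not the server's Slapi API and not the actual patch.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a directory entry (NOT the server's Slapi_Entry). */
typedef struct example_entry {
    const char *dn;
    const char *nsuniqueid; /* NULL when the entry has no nsuniqueid attribute */
} example_entry;

/* Copies the entry's unique id into buf.
 * Returns 0 on success, -1 if the entry has no unique id or buf is too small.
 * The caller is expected to skip the state control when -1 is returned. */
int
example_get_uniqueid(const example_entry *e, char *buf, size_t buflen)
{
    int n;

    if (e == NULL || e->nsuniqueid == NULL || buf == NULL || buflen == 0) {
        return -1;
    }
    n = snprintf(buf, buflen, "%s", e->nsuniqueid);
    if (n < 0 || (size_t)n >= buflen) {
        return -1; /* encoding error or truncated output */
    }
    return 0;
}
```

The point is only that the missing attribute is detected before any value is dereferenced, so the caller can skip or reject the entry instead of crashing.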

Closing the ticket