389ds / 389-ds-base

The enterprise-class Open Source LDAP server for Linux
https://www.port389.org/
Other
211 stars 91 forks source link

Issues with high numbers of entry state information values on objects during mod_delete #1953

Closed 389-ds-bot closed 4 years ago

389-ds-bot commented 4 years ago

Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/48894


In a group with a high level of churn, IE 30,000 members added then deleted, 5000 members remain. This group performs poorly for any mod delete of a single member after this.

It was reported that purge_entry_state_information() leaves a_deleted_values unsorted which means that during the delete operation it is re-sorted while the database lock is held.

This issue is avoided in the ADD case by a different algorithm.

The impact is on an object such as a group which has man entry state information attributes. Hangs of up to 30+ seconds have been observed.

We should create a test case to simulate this, followed up by improving the performance of attribute delete.

The set of values is sorted with valueset_array_to_sorted(), which is only called by slapi_valueset_add_attr_valuearray_ext(). slapi_valueset_add_attr_valuearray_ext() is used in a number of locations, mainly in this case entrywsi.c.

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2016-11-15 11:52:49

attachment 0001-Ticket-48894-Issue-with-high-number-of-entry-state-o.patch

389-ds-bot commented 4 years ago

Comment from lkrispen (@elkris) at 2016-11-15 14:22:36

I think this patch tries to solve the problem in the wrong place. As stated in the ticket description it does not occur for add operations as a different algorithm is used. In the implementation of the sorted valueset the assumption was that as soon as the numebr of values passes the threshold it would be sorted - and for arrays of ~10 elements quicksort seemed to be some overhead.
Could we first investigate why the behaviour in ŧhe del is different and if it could be made"sort friendly" as the add

389-ds-bot commented 4 years ago

Comment from nhosoi (@nhosoi) at 2016-11-15 23:47:36

Sorry, it was me to ask William to replace the current insertion sort with the quick sort instead of adding on top of it as in the original proposal...

Ludwig, do you think we should go back to the original proposal, which is ... if the value count is 10 through 100, use the insertion sort and if it is larger than 100, use quicksort?

According to the benchmark, it seems the quicksort is beneficial for the large group entries.

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2016-11-16 05:01:05

The patch was provided by someone else, I just cleaned it up and improved it. It does fix their issue apparently, and I'm not 100% familiar with this part of the code to comment largely. I believe some of their issue is when they are adding large number of attributes to new objects.

I still agree that we should look at other parts of the code too, and I think perhaps this is part of our code that could benefit from some examination and maybe a better way to store values?

389-ds-bot commented 4 years ago

Comment from lkrispen (@elkris) at 2016-11-17 21:14:08

it is ok for me to replace the sorting algorithm if it gives immediate relief, but I still would like to understand the real cause of the problem. so we should not just close this ticket or create an other one for the core issue

the indentation in valueset_replace_valuearray_ext() looks now a bit zigzagged, at least in the patch view

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2016-11-18 05:54:54

Right, I'll leave this issue open then.

I have fixed the whitespace issue, but it was there before I started. Given it's just white space, the other content of the patch is the same, I'll assume your ack extends to that.

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2016-11-18 06:06:59

commit 88b9b9eb0469beb788ad90c2e289855e54381a5c Writing objects: 100% (8/8), 2.44 KiB | 0 bytes/s, done. Total 8 (delta 6), reused 0 (delta 0) To ssh://git.fedorahosted.org/git/389/ds.git 3c537b0..88b9b9e master -> master

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2016-11-25 06:34:09

attachment 0001-Ticket-48894-Issues-with-delete-of-entrywsi-with-lar.patch

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2016-11-25 06:35:01

This patch should finish correcting the issue. The ultimate issue was the removal of vs->sorted during valueset_purge. This fix prevents that occuring. I have tested it with the basic, dynamic and replication suites, and it is passing.

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2016-11-29 10:12:30

Indentation is off:

valueset_array_purge()
...
    753         if (vs->va[j]->v_csnset == NULL) { 
    754            slapi_value_free(&vs->va[j]); 
    755            vs->va[j] = NULL; 
    756         } 

Otherwise, ack.

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2016-11-29 10:46:24

Fixed the white space issue!

commit d92394c23f5c12116530834f2519ddfc410e6b8a Writing objects: 100% (77/77), 10.81 KiB | 0 bytes/s, done. Total 77 (delta 65), reused 0 (delta 0) To ssh://git.fedorahosted.org/git/389/ds.git 86bffc8..b1f434e master -> master

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2016-11-30 18:18:55

These patches are crashing the directory server in the dynamic plugins test suite(the standalone instance is the one that is crashing)

0  0x00007f65f0645ddf in valueset_value_cmp (a=0x7f6500098900, v1=0x0, v2=0x7f6500000078)
    at ../ds/ldap/servers/slapd/valueset.c:931
1  0x00007f65f0646140 in valueset_array_to_sorted_quick (a=0x7f6500098900, vs=0x7f6500098938, low=0, high=11)
    at ../ds/ldap/servers/slapd/valueset.c:1017
2  0x00007f65f0645fe7 in valueset_array_to_sorted (a=0x7f6500098900, vs=0x7f6500098938) at ../ds/ldap/servers/slapd/valueset.c:990
3  0x00007f65f0646656 in slapi_valueset_add_attr_valuearray_ext (a=0x7f6500098900, vs=0x7f6500098938, addvals=0x7f65000d42b0, 
    naddvals=1, flags=1, dup_index=0x0) at ../ds/ldap/servers/slapd/valueset.c:1137
4  0x00007f65f05b259a in entry_delete_present_values_wsi_multi_valued (e=0x7f6500098240, type=0x7f65000938b0 "uniqueMember", 
    vals=0x7f6500029e80, csn=0x7f6539ff9a60, urp=0, mod_op=129, replacevals=0x0) at ../ds/ldap/servers/slapd/entrywsi.c:877
5  0x00007f65f05b1e0b in entry_delete_present_values_wsi (e=0x7f6500098240, type=0x7f65000938b0 "uniqueMember", 
    vals=0x7f6500029e80, csn=0x7f6539ff9a60, urp=0, mod_op=129, replacevals=0x0) at ../ds/ldap/servers/slapd/entrywsi.c:643
6  0x00007f65f05b2790 in entry_apply_mod_wsi (e=0x7f6500098240, mod=0x7f650005efd0, csn=0x7f6539ff9a60, urp=0)
    at ../ds/ldap/servers/slapd/entrywsi.c:948
7  0x00007f65f05b2a48 in entry_apply_mods_wsi (e=0x7f6500098240, smods=0x7f6539ff9ae0, csn=0x7f650003e770, urp=0)
    at ../ds/ldap/servers/slapd/entrywsi.c:1024
8  0x00007f65e33c1a06 in modify_apply_check_expand (pb=0x7f650005abc0, operation=0x7f650003a0c0, mods=0x7f6500048500, 
    e=0x7f64f4081750, ec=0x7f65000981d0, postentry=0x7f6539ff9bd8, ldap_result_code=0x7f6539ff9b5c, 
    ldap_result_message=0x7f6539ff9bf8) at ../ds/ldap/servers/slapd/back-ldbm/ldbm_modify.c:247
9  0x00007f65e33c28d5 in ldbm_back_modify (pb=0x7f650005abc0) at ../ds/ldap/servers/slapd/back-ldbm/ldbm_modify.c:639
10 0x00007f65f05e4152 in op_shared_modify (pb=0x7f650005abc0, pw_change=0, old_pw=0x0) at ../ds/ldap/servers/slapd/modify.c:1055
11 0x00007f65f05e2d53 in modify_internal_pb (pb=0x7f650005abc0) at ../ds/ldap/servers/slapd/modify.c:599
12 0x00007f65f05e28cf in slapi_modify_internal_pb (pb=0x7f650005abc0) at ../ds/ldap/servers/slapd/modify.c:454
13 0x00007f65e4e70633 in _do_modify (mod_pb=0x7f650005abc0, entrySDN=0x7f650003b880, mods=0x7f6539ffa1f0)
    at ../ds/ldap/servers/plugins/referint/referint.c:752
14 0x00007f65e4e70752 in _update_one_per_mod (entrySDN=0x7f650003b880, attr=0x7f65000318e0, 
    attrName=0x7f65000239d0 "uniqueMember", origDN=0x7f6500034400, newRDN=0x0, newsuperior=0x0, mod_pb=0x7f650005abc0)
    at ../ds/ldap/servers/plugins/referint/referint.c:793
15 0x00007f65e4e71642 in update_integrity (origSDN=0x7f6500034400, newrDN=0x0, newsuperior=0x0, logChanges=0)
    at ../ds/ldap/servers/plugins/referint/referint.c:1194
16 0x00007f65e4e702ec in referint_postop_del (pb=0x7f6539ffaa80) at ../ds/ldap/servers/plugins/referint/referint.c:641
17 0x00007f65f05fb6dc in plugin_call_func (list=0x1a5d2b0, operation=563, pb=0x7f6539ffaa80, call_one=0)
    at ../ds/ldap/servers/slapd/plugin.c:2065
18 0x00007f65f05fb545 in plugin_call_list (list=0x1d8e510, operation=563, pb=0x7f6539ffaa80)
    at ../ds/ldap/servers/slapd/plugin.c:2007
19 0x00007f65f05f7885 in plugin_call_plugins (pb=0x7f6539ffaa80, whichfunction=563) at ../ds/ldap/servers/slapd/plugin.c:446
20 0x00007f65e33b31c6 in ldbm_back_delete (pb=0x7f6539ffaa80) at ../ds/ldap/servers/slapd/back-ldbm/ldbm_delete.c:1214
21 0x00007f65f059afc3 in op_shared_delete (pb=0x7f6539ffaa80) at ../ds/ldap/servers/slapd/delete.c:331
22 0x00007f65f059a7a5 in do_delete (pb=0x7f6539ffaa80) at ../ds/ldap/servers/slapd/delete.c:97
23 0x0000000000417283 in connection_dispatch_operation (conn=0x1a07b78, op=0x7f65340078c0, pb=0x7f6539ffaa80)
    at ../ds/ldap/servers/slapd/connection.c:608
24 0x0000000000419322 in connection_threadmain () at ../ds/ldap/servers/slapd/connection.c:1757
25 0x00007f65edf375cc in _pt_root () from /lib64/libnspr4.so
26 0x00007f65ed8d65ca in start_thread () from /lib64/libpthread.so.0
27 0x00007f65ed6100ed in clone () from /lib64/libc.so.6

Please investigate.

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2016-12-02 09:55:46

This corrects the server crash. 0001-Ticket-48894-improve-entrywsi-delete.patch

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2016-12-07 11:53:17

commit a3e270e483b52f212c638b1d9de8bf862d2a3f49 Writing objects: 100% (25/25), 21.42 KiB | 0 bytes/s, done. Total 25 (delta 19), reused 0 (delta 0) To ​ssh://git.fedorahosted.org/git/389/ds.git

eb358a5..848bd6b master -> master
389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2017-02-11 23:03:49

Metadata Update from @Firstyear:

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2017-10-25 15:30:42

We are getting crashes reported by the community related to this change - specifically valueset_array_to_sorted_quick():

Thread 50 "ns-slapd" received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7f95860a4700 (LWP 1274)] 0x00007f9702024f91 in valueset_array_to_sorted_quick (a=0xa7c93897a0, vs=vs@entry=0xa7c93897d8, low=low@entry=0, high=111) at ldap/servers/slapd/valueset.c:993 993 } while (valueset_value_cmp(a, vs->va[vs->sorted[i]], vs->va[pivot]) < 0);

Gathering more information, but unfortunately there is no reproducer at this time.

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2017-10-25 15:30:43

Metadata Update from @mreynolds389:

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2017-10-25 20:27:25

@nweiderm - fyi there is an issue with this fix

Here is the full stacktrace:

Thread 1 (Thread 0x7f4961c43700 (LWP 1512)):
0  0x00007f4adcbc1f91 in valueset_array_to_sorted_quick (a=0xae44b5ab40, vs=vs@entry=0xae44b5ab78, low=low@entry=0, high=759) at ldap/servers/slapd/valueset.c:993
        pivot = 0
        i = <optimized out>
        j = 760
1  0x00007f4adcbc2082 in valueset_array_to_sorted (a=a@entry=0xae44b5ab40, vs=vs@entry=0xae44b5ab78) at ldap/servers/slapd/valueset.c:970
        i = <optimized out>
2  0x00007f4adcbc21f9 in valueset_array_purge (a=a@entry=0xae44b5ab40, vs=vs@entry=0xae44b5ab78, csn=csn@entry=0xae4c3b4fd0) at ldap/servers/slapd/valueset.c:771
        i = <optimized out>
        j = <optimized out>
        nextValue = <optimized out>
        numValues = <optimized out>
3  0x00007f4adcbc2288 in valueset_purge (a=a@entry=0xae44b5ab40, vs=vs@entry=0xae44b5ab78, csn=csn@entry=0xae4c3b4fd0) at ldap/servers/slapd/valueset.c:797
        r = 0
4  0x00007f4adcb2e2d2 in attr_purge_state_information (entry=entry@entry=0xae4c4dfd40, attr=attr@entry=0xae44b5ab40, csnUpTo=csnUpTo@entry=0xae4c3b4fd0) at ldap/servers/slapd/attr.c:739
5  0x00007f4adcb4c216 in entry_purge_state_information (e=0xae4c4dfd40, csnUpTo=0xae4c3b4fd0) at ldap/servers/slapd/entrywsi.c:284
        a = 0xae44b5ab40
6  0x00007f4ad05d2b74 in purge_entry_state_information (pb=pb@entry=0xae4cc725a0) at ldap/servers/plugins/replication/repl5_plugins.c:568
        e = 0xae4c4dfd40
        optype = 8
        purge_csn = 0xae4c3b4fd0
        repl_obj = 0xae44bb73e0
        replica = <optimized out>
7  0x00007f4ad05d46c2 in multimaster_bepreop_modify (pb=0xae4cc725a0) at ldap/servers/plugins/replication/repl5_plugins.c:704
        rc = 0
        op = 0xae450bee00
        is_replicated_operation = <optimized out>
        is_fixup_operation = <optimized out>
8  0x00007f4adcb8bfad in plugin_call_mmr_plugin_preop (pb=pb@entry=0xae4cc725a0, e=e@entry=0x0, flags=flags@entry=451) at ldap/servers/slapd/plugin_mmr.c:39
        p = 0xae44aaa840
        rc = 50
        operation = 0xae450bee00
9  0x00007f4ad0ec53f5 in ldbm_back_modify (pb=<optimized out>) at ldap/servers/slapd/back-ldbm/ldbm_modify.c:626
        cache_rc = 0
        new_mod_count = 0
        be = 0xae449932b0
        inst = 0xae449344e0
        li = 0xae448b7180
        e = 0xae4cc737a0
        ec = 0xae4cc73740
        original_entry = 0x0
        tmpentry = 0x0
        postentry = 0x0
        mods = 0xae4c9e4690
        mods_original = 0x0
        smods = {mods = 0x0, num_elements = 0, num_mods = 0, iterator = 0, free_mods = 0}
        txn = {back_txn_txn = 0xae4cc48b00}
        parent_txn = 0x0
        ruv_c = {old_entry = 0x0, new_entry = 0x0, smods = 0x0, attr_encrypt = 0}
        ruv_c_init = 0
        retval = 0
        msg = <optimized out>
        errbuf = 0x0
        retry_count = 0
        disk_full = 0
        ldap_result_code = 0
        ldap_result_message = 0x0
        rc = 0
        operation = 0xae450bee00
        addr = 0xae450beef0
        is_fixup_operation = 0
        is_ruv = 0
        opcsn = <optimized out>
        repl_op = 8
        opreturn = 0
        mod_count = <optimized out>
        not_an_error = 0
        is_noop = 0
        fixup_tombstone = 0
        ec_locked = 0
        result_sent = 0
10 0x00007f4adcb74add in op_shared_modify (pb=pb@entry=0xae4cc725a0, pw_change=pw_change@entry=0, old_pw=0x0) at ldap/servers/slapd/modify.c:1013
        rc = 0
        be = 0xae449932b0
        pse = 0x1
        referral = 0x0
        e = 0x0
        dn = 0xae44a66780 "cn=refreshstatus,ou=people,dc=eds,dc=arizona,dc=edu"
        normdn = <optimized out>
        sdn = 0xae4c9e5140
        passin_sdn = 0
        mods = 0xae4c9e4690
        pw_mod = <optimized out>
        tmpmods = 0xae4c9e4690
        smods = {mods = 0x0, num_elements = 0, num_mods = 0, iterator = 0, free_mods = 0}
        repl_op = 8
        internal_op = 0
        lastmod = 1
        skip_modified_attrs = 0
        unhashed_pw_attr = 0x0
        operation = 0xae450bee00
        pb_conn = 0xae4cdf9c00
        errorbuf = "\000C\321\333J\177\000\000\200\020\006D\256\000\000\000\340C\321\333J\177\000\000\b\000\000\000\000\000\000\000@\020\006D\256\000\000\000\267\205\226\333J\177\000\000P4:L\256\000\000\000\240\064:L\256\000\000\000\240\064:L\256\000\000\000\000\063\275\200\313v)v\000'\304aI\177\000\000\000\063\275\200\313v)v\020\000\000\000\000\000\000\000\200\020\006D\256\000\000\000\020\000\000\000\000\000\000\000 \000\000\000\000\000\000\000@\020\006D\256", '\000' <repeats 11 times>, "Xg\246D\256\000\000\000\332\263\226\333J\177\000\000\300O\261L\256\000\000\000\320O\261L\256\000\000\000\260K\261L\256\000\000\000\000\063\275\200\313v)v@\020\006D\256\000\000\000"...
        err = <optimized out>
        lc_mod = <optimized out>
        p = <optimized out>
        proxydn = 0x0
        proxy_err = <optimized out>
        errtext = 0x0
11 0x00007f4adcb75ee2 in do_modify (pb=pb@entry=0xae4cc725a0) at ldap/servers/slapd/modify.c:383
        operation = 0xae450bee00
        pb_conn = 0xae4cdf9c00
        smods = {mods = 0x0, num_elements = 0, num_mods = 0, iterator = 0, free_mods = 0}
        ber = <optimized out>
        tag = <optimized out>
        len = 18446744073709551615
        normalized_mods = 0xae4c9e4690
        mod = 0x0
        mods = 0xae44a66740
        last = 0xae4c94826f "\240\177\060\033\004\027\062.16.840.1.113730.3.4.2\004"
        type = 0x0
        old_pw = 0x0
        rawdn = 0xae44a66780 "cn=refreshstatus,ou=people,dc=eds,dc=arizona,dc=edu"
        minssf_exclude_rootdse = <optimized out>
        ignored_some_mods = <optimized out>
        has_password_mod = <optimized out>
        pw_change = <optimized out>
        err = <optimized out>
12 0x000000ae42dda886 in connection_dispatch_operation (pb=0xae4cc725a0, op=0xae450bee00, conn=0xae4cdf9c00) at ldap/servers/slapd/connection.c:624
        minssf = 0
        minssf_exclude_rootdse = <optimized out>
        enable_nagle = <optimized out>
        pop_cork = 0
        is_timedout = 0
        curtime = <optimized out>
        pb = 0xae4cc725a0
        conn = 0xae4cdf9c00
        op = 0xae450bee00
        tag = 102
        need_wakeup = 0
        thread_turbo_flag = <optimized out>
        ret = <optimized out>
        more_data = 0
        replication_connection = <optimized out>
        doshutdown = 0
        maxthreads = 5
        bypasspollcnt = <optimized out>
        pb_conn = 0xae4cdf9c00
        pb_op = 0xae450bee00
13 0x000000ae42dda886 in connection_threadmain () at ldap/servers/slapd/connection.c:1761
        is_timedout = 0
        curtime = <optimized out>
        pb = 0xae4cc725a0
        conn = 0xae4cdf9c00
        op = 0xae450bee00
        tag = 102
        need_wakeup = 0
        thread_turbo_flag = <optimized out>
        ret = <optimized out>
        more_data = 0
        replication_connection = <optimized out>
        doshutdown = 0
        maxthreads = 5
        bypasspollcnt = <optimized out>
        pb_conn = 0xae4cdf9c00
        pb_op = 0xae450bee00
14 0x00007f4ada74208b in _pt_root () at /lib64/libnspr4.so
15 0x00007f4ada0de36d in start_thread () at /lib64/libpthread.so.0
16 0x00007f4ad9bcdbbf in clone () at /lib64/libc.so.6

This is where we crash;

0x00007f9702024f91 in valueset_array_to_sorted_quick (a=0xa7c93897a0, vs=vs @entry=0xa7c93897d8, low=low @entry=0, high=111) at ldap/servers/slapd/valueset.c:993
993 } while (valueset_value_cmp(a, vs->va[vs->sorted[i]], vs->va[pivot]) < 0);

(gdb) p pivot
$2 = 0
(gdb) p i
$1 = <optimized out
(gdb) p *vs
$6 = {num = 760, max = 1024, sorted = 0xae4d2da000, va = 0xae4d2d8000}
(gdb) p vs->sorted[i]
value has been optimized out
(gdb) p vs->va[vs->sorted[i]]
value has been optimized out
(gdb) p *vs->va
$7 = (struct slapi_value *) 0x0
(gdb) p vs->va[pivot]
$4 = (struct slapi_value *) 0x0
(gdb) p *vs->va[pivot]
Cannot access memory at address 0x0

Looks like we we crash because we dereference a NULL pointer: vs->va[vs->sorted[i]],

Currently there is no reproducer but I'm going to try and work on one.

389-ds-bot commented 4 years ago

Comment from nweiderm at 2017-10-25 21:18:15

@mreynolds389 so, when I was studying the purging algo in 1.3 (in the hope of backporting it to our running version) I caught a bug (and it might be the bug that you're hitting here), that's why I decided to work on that new patch instead.

The bug is in the compact algorithm used in valueset_array_purge function in this commit: https://pagure.io/389-ds-base/c/a3e270e483b52f212c638b1d9de8bf862d2a3f49.

j inside the while loop should start iterating from nv+1 not 0, the bug will trigger when you have the following va:

[a, b, NULL, d, NULL...]

On first iteration nextValue and j will be on index 0, nothing replaced, nextValue increased by 1

[a, b, NULL, d, NULL...]

On second iteration nextValue is on index 1 which is (b) not null, but j is still on index 0, nothing replaced, nextValue increased by 1

[a, b, NULL, d, NULL...]

third iteration nextValue is on index 2 which is null, and j is still on index 0, in this iteration the NULL in index 2 will be replaced with a and a will be nullified, nextValue will be increased by 1 and j will remain on index 0.

[NULL, b, a, d, NULL...]

now nextValue is on index 3 which is not null, nothing replaced, nextValue increased by 1

[NULL, b, a, d, NULL...]

nextValue is on index 4 which is null, j will be increased by 1 to be on index 1, NULL on index 4 will be replaced with b, and b will be nullifed

[NULL, NULL, a, d, b...]

So, in that specific case, we will end up with an invalid va array containing invalid and NULL values inside of it.

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2017-10-25 21:58:29

Community member is going to test your patch, but they got a better stack trace, we can see that the 1024 limit is hit, not sure it's related or not:

Thread 1 (Thread 0x7f1e7fa12700 (LWP 1837)):
0  0x00007f1ff9a231c6 in valueset_array_to_sorted_quick (a=0x796c8b25a0, vs=0x796c8b25d8, low=0, high=759) at ldap/servers/slapd/valueset.c:993
        pivot = 0
        i = 1024
        j = 760
1  0x00007f1ff9a2312b in valueset_array_to_sorted (a=0x796c8b25a0, vs=0x796c8b25d8) at ldap/servers/slapd/valueset.c:970
        i = 1024
2  0x00007f1ff9a22b5b in valueset_array_purge (a=0x796c8b25a0, vs=0x796c8b25d8, csn=0x796c8de420) at ldap/servers/slapd/valueset.c:771
        i = 761
        j = 1024
        nextValue = 760
        numValues = 760
3  0x00007f1ff9a22ba6 in valueset_purge (a=0x796c8b25a0, vs=0x796c8b25d8, csn=0x796c8de420) at ldap/servers/slapd/valueset.c:797
        r = 0
389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2017-10-25 22:14:10

FYI, the 1024 limit I am referring to is the valueset max, so I was wondering if we were running past the max size?

(gdb) p i $5 = 1024

(gdb) p *vs $6 = {num = 760, max = 1024, sorted = 0xae4d2da000, va = 0xae4d2d8000}

(gdb) p vs->va[vs->sorted[i]] Cannot access memory at address 0x44519e10020

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2017-10-25 22:19:48

In fact I wonder if we need this change:

diff --git a/ldap/servers/slapd/valueset.c b/ldap/servers/slapd/valueset.c
index dc0360738..5a16b03dc 100644
--- a/ldap/servers/slapd/valueset.c
+++ b/ldap/servers/slapd/valueset.c
@@ -1019,7 +1019,7 @@ valueset_array_to_sorted_quick(const Slapi_Attr *a, Slapi_ValueSet *vs, size_t l
     while (1) {
         do {
             i++;
-        } while (valueset_value_cmp(a, vs->va[vs->sorted[i]], vs->va[pivot]) < 0);
+        } while (i < vs->max && valueset_value_cmp(a, vs->va[vs->sorted[i]], vs->va[pivot]) < 0);

         do {
             j--;
389-ds-bot commented 4 years ago

Comment from nweiderm at 2017-10-25 23:42:53

In fact I wonder if we need this change: diff --git a/ldap/servers/slapd/valueset.c b/ldap/servers/slapd/valueset.c index dc0360738..5a16b03dc 100644 --- a/ldap/servers/slapd/valueset.c +++ b/ldap/servers/slapd/valueset.c @@ -1019,7 +1019,7 @@ valueset_array_to_sorted_quick(const Slapi_Attr a, Slapi_ValueSet vs, size_t l while (1) { do { i++;

  • } while (valueset_value_cmp(a, vs->va[vs->sorted[i]], vs->va[pivot]) < 0);
  • } while (i < vs->max && valueset_value_cmp(a, vs->va[vs->sorted[i]], vs->va[pivot]) < 0);

do { j--;

Seems about right, i and j can go out of bound here, maybe you want to add another check for j > 0

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2017-10-26 02:53:32

Customer reported that your second patch seems to have resolved the crash, but lets give it a day or two. If it still crashes we can look into this code change.

389-ds-bot commented 4 years ago

Comment from nweiderm at 2017-10-26 14:16:06

Well, the latest patch resolved his issue because it preserves vs->sorted, and reduces the calls to valueset_array_to_sorted during the purge, so it's less likely to hit that crash case during the purge now, but based on the stack traces and the gdb data, it's a different bug that was causing the crash for him (it's not the one that I described above), your assumption of hitting the 1024 limit seems plausible to me, so under a similar edge case if valueset_array_to_sorted were called by different code path, i can still potentially hit the boundaries and j can potentially underflow, so I would suggest adding i < vs->num and j > 0 checks anyways

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2017-10-26 14:33:06

Well, the latest patch resolved his issue because it preserves vs->sorted, and reduces the calls to valueset_array_to_sorted during the purge, so it's less likely to hit that crash case during the purge now, but based on the stack traces and the gdb data, it's a different bug that was causing the crash for him (it's not the one that I described above), your assumption of hitting the 1024 limit seems plausible to me, so under a similar edge case if valueset_array_to_sorted were called by different code path, i can still potentially hit the boundaries and j can potentially underflow, so I would suggest adding i < vs->num and j > 0 checks anyways

Funny I thought more about this as well I was a planning on adding them. Well we are in agreement then :-)

Can you review this patch:

0001-Ticket-48894-harden-valueset_array_to_sorted_quick-v.patch

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2017-10-26 14:33:08

Metadata Update from @mreynolds389:

389-ds-bot commented 4 years ago

Comment from nweiderm at 2017-10-26 15:02:09

@mreynolds389: yep, we're in agreement. Thinking about the check for j again, I think you're right the check should be j >= 0, but we will need another check in do's block like:

~~do { if (j > 0) j--; } while ( j >= 0 && valueset_value_cmp(a, vs->va[vs->sorted[j]], vs->va[pivot]) > 0 )~~

with only j >= 0 check, when j = 0 the condition will evaluate to true and j will be decremented to -1 ( assuming size_t = unsigned something) so it will underflow and the loop won't break because j will have a value larger than 0 again, right?

scratch that, the loop won't break with that solution too. maybe we just have to break if j == 0 inside the do block, the other option is to have j = high + 2 and then we use while ( j > 0 && valueset_value_cmp(a, vs->va[vs->sorted[j-1]], vs->va[pivot]) > 0 )

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2017-10-26 15:47:00

scratch that, the loop won't break with that solution too. maybe we just have to break if j == 0 inside the do block,

But don't we want to check vs->sorted[0] ?

the other option is to have j = high + 2 and then we use while ( j > 0 && valueset_value_cmp(a, vs->va[vs->sorted[j-1]], vs->va[pivot]) > 0 )

But further down we break if i >= j - this logic would change.

Perhaps in the do block we should do:

j--;
if (j < 0) {
    j = 0;
    break;
}
389-ds-bot commented 4 years ago

Comment from nweiderm at 2017-10-26 15:54:57

when j = 0 inside the do block and before the reduction, then vs->sorted[0] has already been checked the last iteration, in do while the reduction/decrement happens before the condition is checked, right? so in the first option when j = 0 and is about to be decremented to -1 in do block, this means the previous condition was for j = 0, for the second option, when j = 1 the condition will check vs->sorted[j-1] (ie: vs->sorted[0]), and then in the next iteration j will be decremented to 0 and the condition will evaluate to false and break, am I misunderstanding anything?

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2017-10-26 16:05:33

This patch should be correct now. I just moved the j > 0 after we do the compare inside the while.

0001-Ticket-48894-harden-valueset_array_to_sorted_quick-v.patch

389-ds-bot commented 4 years ago

Comment from nweiderm at 2017-10-26 16:13:48

Yep, that looks good to me :)

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2017-10-26 16:29:19

52ba2aba4..2086d052e master -> master

4ac6ade8a..a85e24724 389-ds-base-1.3.7 -> 389-ds-base-1.3.7

21112768b..4248346f1 389-ds-base-1.3.6 -> 389-ds-base-1.3.6

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2017-10-26 16:29:25

Metadata Update from @mreynolds389: