389ds / 389-ds-base

The enterprise-class Open Source LDAP server for Linux
https://www.port389.org/
Other
211 stars 91 forks source link

deadlock with memberof plugin and retrocl #1262

Closed 389-ds-bot closed 4 years ago

389-ds-bot commented 4 years ago

Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/47931


In IPA upgrade this deadlock was encountered:

Thread 4 (Thread 0x7f27f97e2700 (LWP 11679)):
0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
1  0x00007f282fa1a1af in _L_lock_1026 () from /lib64/libpthread.so.0
2  0x00007f282fa1a151 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x7f2833f46d18) at ../nptl/pthread_mutex_lock.c:134
3  0x00007f28300722c9 in PR_Lock (lock=lock@entry=0x7f2833f46d18) at ../../../nspr/pr/src/pthreads/ptsynch.c:174
4  0x00007f2830072883 in PR_EnterMonitor (mon=0x7f2833f46d10) at ../../../nspr/pr/src/pthreads/ptsynch.c:529
5  0x00007f282612bff6 in dblayer_txn_begin () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
6  0x00007f282614bba3 in ldbm_back_add () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
7  0x00007f2831c4ec1a in op_shared_add () from /usr/lib64/dirsrv/libslapd.so.0
8  0x00007f2831c4f4a3 in add_internal_pb () from /usr/lib64/dirsrv/libslapd.so.0
9  0x00007f28249daf0d in retrocl_postob () from /usr/lib64/dirsrv/plugins/libretrocl-plugin.so
10 0x00007f2831ca7cb0 in plugin_call_func () from /usr/lib64/dirsrv/libslapd.so.0
11 0x00007f2831ca7f08 in plugin_call_plugins () from /usr/lib64/dirsrv/libslapd.so.0
12 0x00007f2826166b91 in ldbm_back_modify () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
13 0x00007f2831c94fa1 in op_shared_modify () from /usr/lib64/dirsrv/libslapd.so.0
14 0x00007f2831c95a54 in modify_internal_pb () from /usr/lib64/dirsrv/libslapd.so.0
15 0x00007f28258446a1 in memberof_fix_memberof_callback () from /usr/lib64/dirsrv/plugins/libmemberof-plugin.so
16 0x00007f2831cbc3cd in send_ldap_search_entry_ext () from /usr/lib64/dirsrv/libslapd.so.0
17 0x00007f2831cbcc0c in send_ldap_search_entry () from /usr/lib64/dirsrv/libslapd.so.0
18 0x00007f2831c9a593 in iterate.isra.0.constprop.3 () from /usr/lib64/dirsrv/libslapd.so.0
19 0x00007f2831c9a73a in send_results_ext.constprop.2 () from /usr/lib64/dirsrv/libslapd.so.0
20 0x00007f2831c9be21 in op_shared_search () from /usr/lib64/dirsrv/libslapd.so.0
21 0x00007f2831cabc8e in search_internal_callback_pb () from /usr/lib64/dirsrv/libslapd.so.0
22 0x00007f2825845ef7 in memberof_fixup_task_thread () from /usr/lib64/dirsrv/plugins/libmemberof-plugin.so
23 0x00007f2830077c10 in _pt_root (arg=0x7f28140485e0) at ../../../nspr/pr/src/pthreads/ptthread.c:204
24 0x00007f282fa17f33 in start_thread (arg=0x7f27f97e2700) at pthread_create.c:309
25 0x00007f282f745ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 36 (Thread 0x7f283204f700 (LWP 11643)):
0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
1  0x00007f282fa1a1af in _L_lock_1026 () from /lib64/libpthread.so.0
2  0x00007f282fa1a151 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x7f2833f9fd38) at ../nptl/pthread_mutex_lock.c:134
3  0x00007f28300722c9 in PR_Lock (lock=lock@entry=0x7f2833f9fd38) at ../../../nspr/pr/src/pthreads/ptsynch.c:174
4  0x00007f2830072883 in PR_EnterMonitor (mon=0x7f2833f9fd30) at ../../../nspr/pr/src/pthreads/ptsynch.c:529
5  0x00007f2825846eca in memberof_postop_del () from /usr/lib64/dirsrv/plugins/libmemberof-plugin.so
6  0x00007f2831ca7cb0 in plugin_call_func () from /usr/lib64/dirsrv/libslapd.so.0
7  0x00007f2831ca7f08 in plugin_call_plugins () from /usr/lib64/dirsrv/libslapd.so.0
8  0x00007f28261594de in ldbm_back_delete () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
9  0x00007f2831c5beb0 in op_shared_delete () from /usr/lib64/dirsrv/libslapd.so.0
10 0x00007f2831c5c062 in delete_internal_pb () from /usr/lib64/dirsrv/libslapd.so.0
11 0x00007f28249db6de in delete_changerecord () from /usr/lib64/dirsrv/plugins/libretrocl-plugin.so
12 0x00007f28249db9b1 in changelog_trim_thread_fn () from /usr/lib64/dirsrv/plugins/libretrocl-plugin.so
13 0x00007f2830077c10 in _pt_root (arg=0x7f28100008e0) at ../../../nspr/pr/src/pthreads/ptthread.c:204
14 0x00007f282fa17f33 in start_thread (arg=0x7f283204f700) at pthread_create.c:309
15 0x00007f282f745ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

There is a scope for memberof defined:

memberofentryscope: dc=ipa,dc=dev,dc=lan

which should exclude cn=changelog, but the check is only done after acquiring the memberof lock.

389-ds-bot commented 4 years ago

Comment from lkrispen (@elkris) at 2014-10-20 20:51:30

I think for postop_add and postop_del this check coould be done when memberof_oktodo() is called

389-ds-bot commented 4 years ago

Comment from nhosoi (@nhosoi) at 2014-11-11 04:20:43

Ludwig, setting the next 1.3.2 to the milestone. Please feel free to adjust it to your plan. Thanks!

389-ds-bot commented 4 years ago

Comment from nhosoi (@nhosoi) at 2015-03-03 03:10:28

Is this still an issue in the IPA upgrade (from which version to which)?

If so, could you put the 389-ds-base version and the stacktraces with the matched debuginfo (there are some functions which options and lines are missing...)?

389-ds-bot commented 4 years ago

Comment from nhosoi (@nhosoi) at 2015-03-07 00:17:55

Setting to FUTURE.

Comment by Ludwig: global lock and lmdb redesign should handle this

389-ds-bot commented 4 years ago

Comment from nhosoi (@nhosoi) at 2015-07-24 02:51:51

Version: 389-ds-base-1.3.3.1-15.el7_1.x86_64

389-ds-bot commented 4 years ago

Comment from nhosoi (@nhosoi) at 2015-07-27 23:20:52

Replying to [comment:10 mreynolds389]: Thanks to Mark for picking up this ticket. Setting Milestone to 1.3.4.3...

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2015-07-31 23:09:23

attachment 0001-Ticket-47931-memberOf-retrocl-deadlocks.patch

389-ds-bot commented 4 years ago

Comment from nhosoi (@nhosoi) at 2015-08-01 00:22:18

Hi Mark!

I'm reviewing your fix. First, I'm trying to understand the "fix description"... The deadlock among plug-ins is always hard to solve, isn't it? ;)

Bug Description:  When concurrently updating multiple backends the
                  memberOf and retrocl plugins can deadlock on each
                  other.  This is caused by the required retrocl lock,
                  and the db lock on the changenumber index in the
                  retrocl db.

Fix Description:  Added scoping to the retrocl that allows subtrees/suffixes
                  to be included or excluded.  Also moved the existing
                  memberOf scoping outside of its global lock.

So, the deadlock was caused by the memberof scoping inside the global lock and retrocl lock (retrocl_internal_lock or retrocl_cn_lock?)? And basically just moving the memberof scoping out solves the deadlock?

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2015-08-02 18:57:07

Replying to [comment:14 nhosoi]:

Hi Mark!

I'm reviewing your fix. First, I'm trying to understand the "fix description"... The deadlock among plug-ins is always hard to solve, isn't it? ;)

It is, and this fix is almost a workaround. The retrocl design makes it very difficult to avoid deadlocks when dealing with multiple backends.

Bug Description:  When concurrently updating multiple backends the
                  memberOf and retrocl plugins can deadlock on each
                  other.  This is caused by the required retrocl lock,
                  and the db lock on the changenumber index in the
                  retrocl db.

Fix Description:  Added scoping to the retrocl that allows subtrees/suffixes
                  to be included or excluded.  Also moved the existing
                  memberOf scoping outside of its global lock.

So, the deadlock was caused by the memberof scoping inside the global lock and retrocl lock (retrocl_internal_lock or retrocl_cn_lock?)? And basically just moving the memberof scoping out solves the deadlock?

There are two different deadlocks this patch tries to address. The first one is very easy to reproduce, the other I can not reproduce. Here is the stack from the first deadloack that is easy to reproduce:

Thread 14

2  0x00007fc6b38420b9 in PR_Lock () from /lib64/libnspr4.so
3  0x00007fc6a7ec99b0 in retrocl_postob () from /usr/lib64/dirsrv/plugins/libretrocl-plugin.so
4  0x00007fc6b546156f in plugin_call_func () from /usr/lib64/dirsrv/libslapd.so.0
5  0x00007fc6b5461893 in plugin_call_plugins () from /usr/lib64/dirsrv/libslapd.so.0
6  0x00007fc6a965d231 in ldbm_back_modify () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
7  0x00007fc6b544e905 in op_shared_modify () from /usr/lib64/dirsrv/libslapd.so.0
8  0x00007fc6b544f387 in modify_internal_pb () from /usr/lib64/dirsrv/libslapd.so.0
9  0x00007fc6a8d354a2 in memberof_fix_memberof_callback () from /usr/lib64/dirsrv/plugins/libmemberof-plugin.so
10 0x00007fc6a8d35f65 in memberof_modop_one_replace_r () from /usr/lib64/dirsrv/plugins/libmemberof-plugin.so
11 0x00007fc6a8d362a6 in memberof_mod_smod_list () from /usr/lib64/dirsrv/plugins/libmemberof-plugin.so
12 0x00007fc6a8d3758d in memberof_postop_modify () from /usr/lib64/dirsrv/plugins/libmemberof-plugin.so
13 0x00007fc6b546156f in plugin_call_func () from /usr/lib64/dirsrv/libslapd.so.0
14 0x00007fc6b5461893 in plugin_call_plugins () from /usr/lib64/dirsrv/libslapd.so.0
15 0x00007fc6a965d231 in ldbm_back_modify () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
16 0x00007fc6b544e905 in op_shared_modify () from /usr/lib64/dirsrv/libslapd.so.0
17 0x00007fc6b544fbe7 in do_modify () from /usr/lib64/dirsrv/libslapd.so.0

Thread 13 (Thread 0x7fc678ff1700 (LWP 1931)):

0  0x00007fc6b31ed590 inpthread_cond_wait@@GLIBC_2.3.2  () from /lib64/libpthread.so.0
1  0x00007fc6adc811fb in __db_hybrid_mutex_suspend () from /lib64/libdb-5.3.so
2  0x00007fc6adc8059b in __db_tas_mutex_lock () from /lib64/libdb-5.3.so
3  0x00007fc6add2aa91 in __lock_get_internal () from /lib64/libdb-5.3.so
4  0x00007fc6add2b657 in __lock_get () from /lib64/libdb-5.3.so
5  0x00007fc6add576af in __db_lget () from /lib64/libdb-5.3.so
6  0x00007fc6adc9e4f0 in __bam_search () from /lib64/libdb-5.3.so
7  0x00007fc6adc89144 in __bamc_search () from /lib64/libdb-5.3.so
8  0x00007fc6adc8b074 in __bamc_get () from /lib64/libdb-5.3.so
9  0x00007fc6add43ba3 in __dbc_iget () from /lib64/libdb-5.3.so
10 0x00007fc6add53092 in __dbc_get_pp () from /lib64/libdb-5.3.so
11 0x00007fc6a9672b70 in ldbm_back_seq () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
12 0x00007fc6b54650f9 in seq_internal_callback_pb () from /usr/lib64/dirsrv/libslapd.so.0
13 0x00007fc6b54652a9 in slapi_seq_callback () from /usr/lib64/dirsrv/libslapd.so.0
14 0x00007fc6a7ec8919 in retrocl_update_lastchangenumber () from /usr/lib64/dirsrv/plugins/libretrocl-plugin.so
15 0x00007fc6a7ec89bb in retrocl_assign_changenumber () from /usr/lib64/dirsrv/plugins/libretrocl-plugin.so
16 0x00007fc6a7ec99b5 in retrocl_postob () from /usr/lib64/dirsrv/plugins/libretrocl-plugin.so
17 0x00007fc6b546156f in plugin_call_func () from /usr/lib64/dirsrv/libslapd.so.0
18 0x00007fc6b5461893 in plugin_call_plugins () from /usr/lib64/dirsrv/libslapd.so.0
19 0x00007fc6a965d231 in ldbm_back_modify () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
20 0x00007fc6b544e905 in op_shared_modify () from /usr/lib64/dirsrv/libslapd.so.0
21 0x00007fc6b544fbe7 in do_modify () from /usr/lib64/dirsrv/libslapd.so.0

Thread 14 is updating many members in one mod operation, so the retrocl is being called multiple times in the modify operation. It holds a lock on the changennumber index(via retrocl plugin). Then there are updates occuring on a different backend (thread 13) that grab the retrocl lock, but then gets stuck waiting on the changenumber index which thread 13 holds. But then thread 14, which has the changenumber lock, tries to grab the retrocl lock again - and we deadlock between the two threads.

So by adding scoping to retrocl, where we exclude the other backends, we avoid this deadlock.

The second deadlock, the stack is listed in this ticket, I can not reproduce. But... By moving the scoping check outside of the memberOf lock, and combined with the new retrocl scoping, should address this deadlock as well.

Really, the problem is the retro changelog design. The retrocl plugin almost needs to be either rewritten/redesigned entirely, or simply removed and the replication changelog extended to fulfill the needs that the retrocl was providing.

So for now, this patch addresses the issues that current customers are running into with IPA, and it also adds some new functionality/flexibility to the retrocl plugin.

389-ds-bot commented 4 years ago

Comment from nhosoi (@nhosoi) at 2015-08-04 01:17:38

Thank you for your explanation, Mark! Now I understood the design of your fix.

2 minor issues...

  1. Is 'i' supposed to get incremented?
    645 /* Check if the the entry include scope is a child of the sdn */ 
    646 static Slapi_DN* 
    647 memberof_scope_is_child_of_dn(MemberOfConfig *config, Slapi_DN *sdn) 
    648 { 
    649         int i = 0; 
    650  
    651         while(config->entryScopes && config->entryScopes[i]){ 
    652                 if(slapi_sdn_issuffix(config->entryScopes[i], sdn)){ 
    653                         return config->entryScopes[i]; 
    654                 } 
    655         } 
    656         return NULL; 
    657 }
  2. Is retro changelog applicable to the dynamic plug-in? If yes, retrocl_includes|excludes could leak?
    48  static Slapi_DN **retrocl_includes = NULL; 
    49  static Slapi_DN **retrocl_excludes = NULL; 
    391         retrocl_excludes = (Slapi_DN **)slapi_ch_calloc(sizeof(Slapi_DN *),num_vals+1); 
    411         retrocl_includes = (Slapi_DN **)slapi_ch_calloc(sizeof(Slapi_DN *),num_vals+1);
389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2015-08-04 02:47:29

Replying to [comment:16 nhosoi]:

Thank you for your explanation, Mark! Now I understood the design of your fix.

2 minor issues...

  1. Is 'i' supposed to get incremented?
    645 /* Check if the the entry include scope is a child of the sdn */ 
    646 static Slapi_DN* 
    647 memberof_scope_is_child_of_dn(MemberOfConfig *config, Slapi_DN *sdn) 
    648 { 
    649         int i = 0; 
    650  
    651         while(config->entryScopes && config->entryScopes[i]){ 
    652                 if(slapi_sdn_issuffix(config->entryScopes[i], sdn)){ 
    653                         return config->entryScopes[i]; 
    654                 } 
    655         } 
    656         return NULL; 
    657 }

Good catch

  1. Is retro changelog applicable to the dynamic plug-in? If yes, retrocl_includes|excludes could leak?
    48  static Slapi_DN **retrocl_includes = NULL; 
    49  static Slapi_DN **retrocl_excludes = NULL; 
    391         retrocl_excludes = (Slapi_DN **)slapi_ch_calloc(sizeof(Slapi_DN *),num_vals+1); 
    411         retrocl_includes = (Slapi_DN **)slapi_ch_calloc(sizeof(Slapi_DN *),num_vals+1);

This is freed in the plugin stop function, and that's how dynamic plugins work. Any change to plugin's config triggers the plugin to be restarted. So this is not leaked.

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2015-08-04 22:19:49

revision 0001-Ticket-47931-memberOf-retrocl-deadlocks.2.patch

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2015-08-04 22:20:11

New patch attached...

389-ds-bot commented 4 years ago

Comment from nhosoi (@nhosoi) at 2015-08-06 22:45:08

Hi Mark, thanks for waiting. Please push it!

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2015-08-07 18:37:43

6b138a2..fd959ac master -> master commit fd959ac864d6d86d24928bc2c6f097d1a6031ecd Author: Mark Reynolds mreynolds389@redhat.com Date: Tue Aug 4 12:19:31 2015 -0400

e62b481..d810847 389-ds-base-1.3.4 -> 389-ds-base-1.3.4 commit d8108476d3bedbcc03f6c61bfb3d50e921faaf42

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2015-08-10 20:46:37

Coverity issues found, reopening...

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2015-08-10 20:47:26

Fix coverity issues 0002-Ticket-47931-Fix-coverity-issues.patch

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2015-08-10 21:18:57

fd959ac..5daea97 master -> master commit 5daea973e4526584ee41d7b9f4b1b4993b4de6f1 Author: Mark Reynolds mreynolds389@redhat.com Date: Mon Aug 10 10:42:40 2015 -0400

d810847..9a0047e 389-ds-base-1.3.4 -> 389-ds-base-1.3.4 commit 9a0047ef75f6dbeb1980ac77fab5d62865c77e6a

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2015-08-11 01:51:42

08eabd5..f86dab3 389-ds-base-1.3.3 -> 389-ds-base-1.3.3 commit f86dab3bc94482df4f6850ea6a9a90371c1cfc03

f86dab3..e6343e6 389-ds-base-1.3.3 -> 389-ds-base-1.3.3 commit e6343e66aa2e4537388b386c055b7ca8044f8f34

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2015-08-19 20:10:44

Debug logging was accidentally left in the source - it has now been removed:

29c09a5..db7153f master -> master commit db7153f89bf3dda935e6ef4f175697bda32fe720 Author: Mark Reynolds mreynolds389@redhat.com Date: Wed Aug 19 10:03:50 2015 -0400

2c5e0d5..1781280 389-ds-base-1.3.4 -> 389-ds-base-1.3.4 commit 1781280f133c4877f83949400294641a558f5406

895dc4f..5c48f28 389-ds-base-1.3.3 -> 389-ds-base-1.3.3 commit 5c48f28d2d6c46aa3cd2883207c18002d01e4994

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2015-08-29 01:03:33

lib389 test script 0001-Ticket-47931-lib389-test-memberof-and-retrocl-deadlo.patch

389-ds-bot commented 4 years ago

Comment from nhosoi (@nhosoi) at 2015-08-29 01:23:16

Replying to [comment:27 mreynolds389]: Probably, a silly question... :p Could you tell me where test_ticket47931 is called?

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2015-08-29 01:36:55

Replying to [comment:28 nhosoi]:

Replying to [comment:27 mreynolds389]: Probably, a silly question... :p Could you tell me where test_ticket47931 is called?

Simon modified create-ticket.py to make new ticket tests more compliant with the pytest way. So now in the main function we call

pytest.main("-s %s" % CURRENT_FILE)

Which executes all the functions that start with "test".

389-ds-bot commented 4 years ago

Comment from nhosoi (@nhosoi) at 2015-08-29 01:38:00

I see. Thanks!

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2015-08-29 04:56:50

Thanks Noriko!

4277f2b..48470e7 master -> master commit 48470e7a6b0a350149ced0a1377e625b836dba64 Author: Mark Reynolds mreynolds389@redhat.com Date: Fri Aug 28 15:01:58 2015 -0400

8c9c2fc..22e4de7 389-ds-base-1.3.4 -> 389-ds-base-1.3.4 commit 22e4de79107e49af74e0ff7615552b61db240b71

f10e2e6..e7d3b2f 389-ds-base-1.3.3 -> 389-ds-base-1.3.3 commit e7d3b2f120e0a3e9ce642733b803686fbcc49222

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2017-02-11 22:52:46

Metadata Update from @mreynolds389: