Closed 389-ds-bot closed 4 years ago
Comment from lkrispen (@elkris) at 2014-10-20 20:51:30
I think for postop_add and postop_del this check coould be done when memberof_oktodo() is called
Comment from nhosoi (@nhosoi) at 2014-11-11 04:20:43
Ludwig, setting the next 1.3.2 to the milestone. Please feel free to adjust it to your plan. Thanks!
Comment from nhosoi (@nhosoi) at 2015-03-03 03:10:28
Is this still an issue in the IPA upgrade (from which version to which)?
If so, could you put the 389-ds-base version and the stacktraces with the matched debuginfo (there are some functions which options and lines are missing...)?
Comment from nhosoi (@nhosoi) at 2015-03-07 00:17:55
Setting to FUTURE.
Comment by Ludwig: global lock and lmdb redesign should handle this
Comment from nhosoi (@nhosoi) at 2015-07-24 02:51:51
Version: 389-ds-base-1.3.3.1-15.el7_1.x86_64
Comment from nhosoi (@nhosoi) at 2015-07-27 23:20:52
Replying to [comment:10 mreynolds389]: Thanks to Mark for picking up this ticket. Setting Milestone to 1.3.4.3...
Comment from mreynolds (@mreynolds389) at 2015-07-31 23:09:23
attachment 0001-Ticket-47931-memberOf-retrocl-deadlocks.patch
Comment from nhosoi (@nhosoi) at 2015-08-01 00:22:18
Hi Mark!
I'm reviewing your fix. First, I'm trying to understand the "fix description"... The deadlock among plug-ins is always hard to solve, isn't it? ;)
Bug Description: When concurrently updating multiple backends the
memberOf and retrocl plugins can deadlock on each
other. This is caused by the required retrocl lock,
and the db lock on the changenumber index in the
retrocl db.
Fix Description: Added scoping to the retrocl that allows subtrees/suffixes
to be included or excluded. Also moved the existing
memberOf scoping outside of its global lock.
So, the deadlock was caused by the memberof scoping inside the global lock and retrocl lock (retrocl_internal_lock or retrocl_cn_lock?)? And basically just moving the memberof scoping out solves the deadlock?
Comment from mreynolds (@mreynolds389) at 2015-08-02 18:57:07
Replying to [comment:14 nhosoi]:
Hi Mark!
I'm reviewing your fix. First, I'm trying to understand the "fix description"... The deadlock among plug-ins is always hard to solve, isn't it? ;)
It is, and this fix is almost a workaround. The retrocl design makes it very difficult to avoid deadlocks when dealing with multiple backends.
Bug Description: When concurrently updating multiple backends the memberOf and retrocl plugins can deadlock on each other. This is caused by the required retrocl lock, and the db lock on the changenumber index in the retrocl db. Fix Description: Added scoping to the retrocl that allows subtrees/suffixes to be included or excluded. Also moved the existing memberOf scoping outside of its global lock.
So, the deadlock was caused by the memberof scoping inside the global lock and retrocl lock (retrocl_internal_lock or retrocl_cn_lock?)? And basically just moving the memberof scoping out solves the deadlock?
There are two different deadlocks this patch tries to address. The first one is very easy to reproduce, the other I can not reproduce. Here is the stack from the first deadloack that is easy to reproduce:
Thread 14
2 0x00007fc6b38420b9 in PR_Lock () from /lib64/libnspr4.so
3 0x00007fc6a7ec99b0 in retrocl_postob () from /usr/lib64/dirsrv/plugins/libretrocl-plugin.so
4 0x00007fc6b546156f in plugin_call_func () from /usr/lib64/dirsrv/libslapd.so.0
5 0x00007fc6b5461893 in plugin_call_plugins () from /usr/lib64/dirsrv/libslapd.so.0
6 0x00007fc6a965d231 in ldbm_back_modify () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
7 0x00007fc6b544e905 in op_shared_modify () from /usr/lib64/dirsrv/libslapd.so.0
8 0x00007fc6b544f387 in modify_internal_pb () from /usr/lib64/dirsrv/libslapd.so.0
9 0x00007fc6a8d354a2 in memberof_fix_memberof_callback () from /usr/lib64/dirsrv/plugins/libmemberof-plugin.so
10 0x00007fc6a8d35f65 in memberof_modop_one_replace_r () from /usr/lib64/dirsrv/plugins/libmemberof-plugin.so
11 0x00007fc6a8d362a6 in memberof_mod_smod_list () from /usr/lib64/dirsrv/plugins/libmemberof-plugin.so
12 0x00007fc6a8d3758d in memberof_postop_modify () from /usr/lib64/dirsrv/plugins/libmemberof-plugin.so
13 0x00007fc6b546156f in plugin_call_func () from /usr/lib64/dirsrv/libslapd.so.0
14 0x00007fc6b5461893 in plugin_call_plugins () from /usr/lib64/dirsrv/libslapd.so.0
15 0x00007fc6a965d231 in ldbm_back_modify () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
16 0x00007fc6b544e905 in op_shared_modify () from /usr/lib64/dirsrv/libslapd.so.0
17 0x00007fc6b544fbe7 in do_modify () from /usr/lib64/dirsrv/libslapd.so.0
Thread 13 (Thread 0x7fc678ff1700 (LWP 1931)):
0 0x00007fc6b31ed590 inpthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
1 0x00007fc6adc811fb in __db_hybrid_mutex_suspend () from /lib64/libdb-5.3.so
2 0x00007fc6adc8059b in __db_tas_mutex_lock () from /lib64/libdb-5.3.so
3 0x00007fc6add2aa91 in __lock_get_internal () from /lib64/libdb-5.3.so
4 0x00007fc6add2b657 in __lock_get () from /lib64/libdb-5.3.so
5 0x00007fc6add576af in __db_lget () from /lib64/libdb-5.3.so
6 0x00007fc6adc9e4f0 in __bam_search () from /lib64/libdb-5.3.so
7 0x00007fc6adc89144 in __bamc_search () from /lib64/libdb-5.3.so
8 0x00007fc6adc8b074 in __bamc_get () from /lib64/libdb-5.3.so
9 0x00007fc6add43ba3 in __dbc_iget () from /lib64/libdb-5.3.so
10 0x00007fc6add53092 in __dbc_get_pp () from /lib64/libdb-5.3.so
11 0x00007fc6a9672b70 in ldbm_back_seq () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
12 0x00007fc6b54650f9 in seq_internal_callback_pb () from /usr/lib64/dirsrv/libslapd.so.0
13 0x00007fc6b54652a9 in slapi_seq_callback () from /usr/lib64/dirsrv/libslapd.so.0
14 0x00007fc6a7ec8919 in retrocl_update_lastchangenumber () from /usr/lib64/dirsrv/plugins/libretrocl-plugin.so
15 0x00007fc6a7ec89bb in retrocl_assign_changenumber () from /usr/lib64/dirsrv/plugins/libretrocl-plugin.so
16 0x00007fc6a7ec99b5 in retrocl_postob () from /usr/lib64/dirsrv/plugins/libretrocl-plugin.so
17 0x00007fc6b546156f in plugin_call_func () from /usr/lib64/dirsrv/libslapd.so.0
18 0x00007fc6b5461893 in plugin_call_plugins () from /usr/lib64/dirsrv/libslapd.so.0
19 0x00007fc6a965d231 in ldbm_back_modify () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
20 0x00007fc6b544e905 in op_shared_modify () from /usr/lib64/dirsrv/libslapd.so.0
21 0x00007fc6b544fbe7 in do_modify () from /usr/lib64/dirsrv/libslapd.so.0
Thread 14 is updating many members in one mod operation, so the retrocl is being called multiple times in the modify operation. It holds a lock on the changennumber index(via retrocl plugin). Then there are updates occuring on a different backend (thread 13) that grab the retrocl lock, but then gets stuck waiting on the changenumber index which thread 13 holds. But then thread 14, which has the changenumber lock, tries to grab the retrocl lock again - and we deadlock between the two threads.
So by adding scoping to retrocl, where we exclude the other backends, we avoid this deadlock.
The second deadlock, the stack is listed in this ticket, I can not reproduce. But... By moving the scoping check outside of the memberOf lock, and combined with the new retrocl scoping, should address this deadlock as well.
Really, the problem is the retro changelog design. The retrocl plugin almost needs to be either rewritten/redesigned entirely, or simply removed and the replication changelog extended to fulfill the needs that the retrocl was providing.
So for now, this patch addresses the issues that current customers are running into with IPA, and it also adds some new functionality/flexibility to the retrocl plugin.
Comment from nhosoi (@nhosoi) at 2015-08-04 01:17:38
Thank you for your explanation, Mark! Now I understood the design of your fix.
2 minor issues...
645 /* Check if the the entry include scope is a child of the sdn */
646 static Slapi_DN*
647 memberof_scope_is_child_of_dn(MemberOfConfig *config, Slapi_DN *sdn)
648 {
649 int i = 0;
650
651 while(config->entryScopes && config->entryScopes[i]){
652 if(slapi_sdn_issuffix(config->entryScopes[i], sdn)){
653 return config->entryScopes[i];
654 }
655 }
656 return NULL;
657 }
48 static Slapi_DN **retrocl_includes = NULL;
49 static Slapi_DN **retrocl_excludes = NULL;
391 retrocl_excludes = (Slapi_DN **)slapi_ch_calloc(sizeof(Slapi_DN *),num_vals+1);
411 retrocl_includes = (Slapi_DN **)slapi_ch_calloc(sizeof(Slapi_DN *),num_vals+1);
Comment from mreynolds (@mreynolds389) at 2015-08-04 02:47:29
Replying to [comment:16 nhosoi]:
Thank you for your explanation, Mark! Now I understood the design of your fix.
2 minor issues...
- Is 'i' supposed to get incremented?
645 /* Check if the the entry include scope is a child of the sdn */ 646 static Slapi_DN* 647 memberof_scope_is_child_of_dn(MemberOfConfig *config, Slapi_DN *sdn) 648 { 649 int i = 0; 650 651 while(config->entryScopes && config->entryScopes[i]){ 652 if(slapi_sdn_issuffix(config->entryScopes[i], sdn)){ 653 return config->entryScopes[i]; 654 } 655 } 656 return NULL; 657 }
Good catch
- Is retro changelog applicable to the dynamic plug-in? If yes, retrocl_includes|excludes could leak?
48 static Slapi_DN **retrocl_includes = NULL; 49 static Slapi_DN **retrocl_excludes = NULL; 391 retrocl_excludes = (Slapi_DN **)slapi_ch_calloc(sizeof(Slapi_DN *),num_vals+1); 411 retrocl_includes = (Slapi_DN **)slapi_ch_calloc(sizeof(Slapi_DN *),num_vals+1);
This is freed in the plugin stop function, and that's how dynamic plugins work. Any change to plugin's config triggers the plugin to be restarted. So this is not leaked.
Comment from mreynolds (@mreynolds389) at 2015-08-04 22:19:49
revision 0001-Ticket-47931-memberOf-retrocl-deadlocks.2.patch
Comment from mreynolds (@mreynolds389) at 2015-08-04 22:20:11
New patch attached...
Comment from nhosoi (@nhosoi) at 2015-08-06 22:45:08
Hi Mark, thanks for waiting. Please push it!
Comment from mreynolds (@mreynolds389) at 2015-08-07 18:37:43
6b138a2..fd959ac master -> master commit fd959ac864d6d86d24928bc2c6f097d1a6031ecd Author: Mark Reynolds mreynolds389@redhat.com Date: Tue Aug 4 12:19:31 2015 -0400
e62b481..d810847 389-ds-base-1.3.4 -> 389-ds-base-1.3.4 commit d8108476d3bedbcc03f6c61bfb3d50e921faaf42
Comment from mreynolds (@mreynolds389) at 2015-08-10 20:46:37
Coverity issues found, reopening...
Comment from mreynolds (@mreynolds389) at 2015-08-10 20:47:26
Fix coverity issues 0002-Ticket-47931-Fix-coverity-issues.patch
Comment from mreynolds (@mreynolds389) at 2015-08-10 21:18:57
fd959ac..5daea97 master -> master commit 5daea973e4526584ee41d7b9f4b1b4993b4de6f1 Author: Mark Reynolds mreynolds389@redhat.com Date: Mon Aug 10 10:42:40 2015 -0400
d810847..9a0047e 389-ds-base-1.3.4 -> 389-ds-base-1.3.4 commit 9a0047ef75f6dbeb1980ac77fab5d62865c77e6a
Comment from mreynolds (@mreynolds389) at 2015-08-11 01:51:42
08eabd5..f86dab3 389-ds-base-1.3.3 -> 389-ds-base-1.3.3 commit f86dab3bc94482df4f6850ea6a9a90371c1cfc03
f86dab3..e6343e6 389-ds-base-1.3.3 -> 389-ds-base-1.3.3 commit e6343e66aa2e4537388b386c055b7ca8044f8f34
Comment from mreynolds (@mreynolds389) at 2015-08-19 20:10:44
Debug logging was accidentally left in the source - it has now been removed:
29c09a5..db7153f master -> master commit db7153f89bf3dda935e6ef4f175697bda32fe720 Author: Mark Reynolds mreynolds389@redhat.com Date: Wed Aug 19 10:03:50 2015 -0400
2c5e0d5..1781280 389-ds-base-1.3.4 -> 389-ds-base-1.3.4 commit 1781280f133c4877f83949400294641a558f5406
895dc4f..5c48f28 389-ds-base-1.3.3 -> 389-ds-base-1.3.3 commit 5c48f28d2d6c46aa3cd2883207c18002d01e4994
Comment from mreynolds (@mreynolds389) at 2015-08-29 01:03:33
lib389 test script 0001-Ticket-47931-lib389-test-memberof-and-retrocl-deadlo.patch
Comment from nhosoi (@nhosoi) at 2015-08-29 01:23:16
Replying to [comment:27 mreynolds389]: Probably, a silly question... :p Could you tell me where test_ticket47931 is called?
Comment from mreynolds (@mreynolds389) at 2015-08-29 01:36:55
Replying to [comment:28 nhosoi]:
Replying to [comment:27 mreynolds389]: Probably, a silly question... :p Could you tell me where test_ticket47931 is called?
Simon modified create-ticket.py to make new ticket tests more compliant with the pytest way. So now in the main function we call
pytest.main("-s %s" % CURRENT_FILE)
Which executes all the functions that start with "test".
Comment from nhosoi (@nhosoi) at 2015-08-29 01:38:00
I see. Thanks!
Comment from mreynolds (@mreynolds389) at 2015-08-29 04:56:50
Thanks Noriko!
4277f2b..48470e7 master -> master commit 48470e7a6b0a350149ced0a1377e625b836dba64 Author: Mark Reynolds mreynolds389@redhat.com Date: Fri Aug 28 15:01:58 2015 -0400
8c9c2fc..22e4de7 389-ds-base-1.3.4 -> 389-ds-base-1.3.4 commit 22e4de79107e49af74e0ff7615552b61db240b71
f10e2e6..e7d3b2f 389-ds-base-1.3.3 -> 389-ds-base-1.3.3 commit e7d3b2f120e0a3e9ce642733b803686fbcc49222
Comment from mreynolds (@mreynolds389) at 2017-02-11 22:52:46
Metadata Update from @mreynolds389:
Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/47931
In IPA upgrade this deadlock was encountered:
There is a scope for memberof defined:
which should exclude cn=changelog, but the check is only done after acquiring the memberof lock.