Closed 389-ds-bot closed 4 years ago
Comment from nkinder (@nkinder) at 2013-11-16 05:24:11
Ticket has been cloned to Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1031227
Comment from rmeggins (@richm) at 2013-11-18 23:45:24
I thought PRUint64 operations were not atomic? I think it would be better to use the slapi_counter API - that way the slapi_counter API will take care of locking
Comment from mreynolds (@mreynolds389) at 2013-11-19 00:00:22
Replying to [comment:6 richm]:
I thought PRUint64 operations were not atomic?
I was under the impression of the exact opposite, and this is exactly how slapi_counters work as well(please take a look). I'd prefer to keep what I have, as it's much simpler than bringing in all the slapi_counter stuff. No? Thoughts?
Comment from rmeggins (@richm) at 2013-11-19 00:22:28
Replying to [comment:7 mreynolds389]:
Replying to [comment:6 richm]:
I thought PRUint64 operations were not atomic?
I was under the impression of the exact opposite, and this is exactly how slapi_counters work as well(please take a look).
/* * slapi_counter_add() * * Atomically add a value to a Slapi_Counter. */ PRUint64 slapi_counter_add(Slapi_Counter *counter, PRUint64 addvalue) { PRUint64 newvalue = 0; ... #ifdef LINUX newvalue = __sync_add_and_fetch(&(counter->value), addvalue); ... return newvalue;
So no, PRUint64 newvalue++ is not atomic.
I'd prefer to keep what I have, as it's much simpler than bringing in all the slapi_counter stuff. No? Thoughts?
Comment from mreynolds (@mreynolds389) at 2013-11-19 00:52:31
Replying to [comment:8 richm]:
Replying to [comment:7 mreynolds389]:
Replying to [comment:6 richm]:
I thought PRUint64 operations were not atomic?
I was under the impression of the exact opposite, and this is exactly how slapi_counters work as well(please take a look).
/* * slapi_counter_add() * * Atomically add a value to a Slapi_Counter. */ PRUint64 slapi_counter_add(Slapi_Counter *counter, PRUint64 addvalue) { PRUint64 newvalue = 0; ... #ifdef LINUX newvalue = __sync_add_and_fetch(&(counter->value), addvalue); ... return newvalue;
So no, PRUint64 newvalue++ is not atomic.
Yeah, you're absolutely right. I really thought a PRUint64 was atomic, it comes up all the time when there's talk about lock-free access(hazard pointers, etc). I obviously misunderstood some of it. Anyway I'm reworking the patch.
I'd prefer to keep what I have, as it's much simpler than bringing in all the slapi_counter stuff. No? Thoughts?
Comment from mreynolds (@mreynolds389) at 2013-11-19 01:44:50
Replying to [comment:9 mreynolds389]:
Replying to [comment:8 richm]:
Replying to [comment:7 mreynolds389]:
Replying to [comment:6 richm]:
I thought PRUint64 operations were not atomic?
I was under the impression of the exact opposite, and this is exactly how slapi_counters work as well(please take a look).
/* * slapi_counter_add() * * Atomically add a value to a Slapi_Counter. */ PRUint64 slapi_counter_add(Slapi_Counter *counter, PRUint64 addvalue) { PRUint64 newvalue = 0; ... #ifdef LINUX newvalue = __sync_add_and_fetch(&(counter->value), addvalue); ... return newvalue;
So no, PRUint64 newvalue++ is not atomic.
Yeah, you're absolutely right. I really thought a PRUint64 was atomic, it comes up all the time when there's talk about lock-free access(hazard pointers, etc). I obviously misunderstood some of it. Anyway I'm reworking the patch.
I'd prefer to keep what I have, as it's much simpler than bringing in all the slapi_counter stuff. No? Thoughts?
New patch is attached.
Thanks, Mark
Comment from rmeggins (@richm) at 2013-11-19 23:00:17
With slapi_counter, operations on single counters are atomic. However, operations on multiple counters are not.
void retrocl_commit_changenumber(void)
{
if ( slapi_counter_get_value(retrocl_first_cn) == 0) {
slapi_counter_set_value(retrocl_first_cn, slapi_counter_get_value(retrocl_internal_cn));
}
}
This is not atomic, because the value of retrocl_first_cn could change after testing it for 0.
Same with this:
if(slapi_counter_get_value(retrocl_internal_cn) <= slapi_counter_get_value(retrocl_first_cn)){
The value of retrocl_internal_cn could change before doing slapi_counter_get_value(retrocl_first_cn) and the <= test.
Comment from mreynolds (@mreynolds389) at 2013-11-20 02:15:16
Replying to [comment:11 richm]:
With slapi_counter, operations on single counters are atomic. However, operations on multiple counters are not.
void retrocl_commit_changenumber(void) { if ( slapi_counter_get_value(retrocl_first_cn) == 0) { slapi_counter_set_value(retrocl_first_cn, slapi_counter_get_value(retrocl_internal_cn)); } }
This is not atomic, because the value of retrocl_first_cn could change after testing it for 0.
Same with this:
if(slapi_counter_get_value(retrocl_internal_cn) <= slapi_counter_get_value(retrocl_first_cn)){
The value of retrocl_internal_cn could change before doing slapi_counter_get_value(retrocl_first_cn) and the <= test.
Hmm, I see. Looks like we need to use a RW lock instead of using the slapi counter approach. I'm not sure how else to do it.
Comment from rmeggins (@richm) at 2013-11-20 06:30:38
This adds additional locking to retrocl_assign_changenumber(). The only place where retrocl_assign_changenumber is called is from write_replog_db(), inside of the PR_Lock(retrocl_internal_lock). I'm not sure if this will solve the ipa deadlock, and not introduce any additional deadlocks.
Comment from tbordaz (@tbordaz) at 2013-11-20 15:31:33
Hi,
If this ticket is for reducing a critical section but not the hang, please skip the following.
Just a dummy remark regarding the hang (https://fedorahosted.org/freeipa/attachment/ticket/3967/ds.backtrace). It looks to me that the thread 28 (ssl handshake) is waiting for a lock aquired by thread 15 (applying a IPA set password in the retrocl). Thread 15 is waiting for a db lock (used by a cursor). I did not see any others threads in db. One possibility of the hang could be an aborted task (export/import/index..).
Comment from mreynolds (@mreynolds389) at 2013-11-20 20:18:27
Replying to [comment:14 tbordaz]:
Hi,
If this ticket is for reducing a critical section but not the hang, please skip the following.
Just a dummy remark regarding the hang (https://fedorahosted.org/freeipa/attachment/ticket/3967/ds.backtrace). It looks to me that the thread 28 (ssl handshake) is waiting for a lock aquired by thread 15 (applying a IPA set password in the retrocl). Thread 15 is waiting for a db lock (used by a cursor). I did not see any others threads in db. One possibility of the hang could be an aborted task (export/import/index..).
This is a definitely a possibility, but we don't have any DS logs to see if any tasks were issued/aborted. So for now this patch is just to improve the locking. Once all of the retrocl patches are done, another round of testing will be done, and we'll go from there.
Comment from mreynolds (@mreynolds389) at 2013-11-20 20:18:53
RW locks - part 2 0001-Ticket-47599-Reduce-lock-scope-in-retro-changelog-pl.patch
Comment from mreynolds (@mreynolds389) at 2013-11-20 20:26:47
Replying to [comment:13 richm]:
This adds additional locking to retrocl_assign_changenumber(). The only place where retrocl_assign_changenumber is called is from write_replog_db(), inside of the PR_Lock(retrocl_internal_lock). I'm not sure if this will solve the ipa deadlock, and not introduce any additional deadlocks.
Reworked the RW locking around retrocl_assign_changenumber() && retrocl_update_lastchangenumber(). Now I release the write lock before calling ldbm_back_seq(), and lock it after returning. It is safe to release the lock without fear of the CN changing, because we are now inside the the big internal lock, so everything(CN updates) are serialized at this stage.
So now the RW lock is only held while making an update to the changenumber, nothing else. There should not be any risk of deadlock now.
New patch attached.
Comment from mreynolds (@mreynolds389) at 2013-11-20 22:21:58
git merge retro Updating 2971ca1..e2c42bc Fast-forward ldap/servers/plugins/retrocl/retrocl.c | 3 +- ldap/servers/plugins/retrocl/retrocl.h | 1 + ldap/servers/plugins/retrocl/retrocl_cn.c | 42 ++++++++++++++++++++--------- ldap/servers/plugins/retrocl/retrocl_po.c | 2 +- 4 files changed, 33 insertions(+), 15 deletions(-)
git push origin master Counting objects: 19, done. Delta compression using up to 4 threads. Compressing objects: 100% (10/10), done. Writing objects: 100% (10/10), 1.37 KiB, done. Total 10 (delta 8), reused 0 (delta 0) To ssh://git.fedorahosted.org/git/389/ds.git 2971ca1..e2c42bc master -> master
commit e2c42bced86bac235ac56ae98eed303f61ebd15e Author: Mark Reynolds mreynolds389@redhat.com Date: Wed Nov 20 09:08:50 2013 -0500
1.3.2 d197eca..a87c3ee 389-ds-base-1.3.2 -> 389-ds-base-1.3.2
1.3.1 bb658b8..03f6347 389-ds-base-1.3.1 -> 389-ds-base-1.3.1
Comment from rmeggins (@richm) at 2013-11-23 08:21:18
89ae932..b19239f 389-ds-base-1.3.1 -> 389-ds-base-1.3.1 commit b19239fdca5c00865471acfd5ffc8502c66b914a Author: Rich Megginson richm@redhat.com Date: Fri Nov 22 16:51:55 2013 -0700 d1d90fd..f4d5900 389-ds-base-1.3.2 -> 389-ds-base-1.3.2 commit f4d5900579c773e5cf4b69eaeba6104078512ab0 Author: Rich Megginson richm@redhat.com Date: Fri Nov 22 16:51:55 2013 -0700 111b807..b330876 master -> master commit b330876a1bccd93a8e906ac56a10c002c981ecfc Author: Rich Megginson richm@redhat.com Date: Fri Nov 22 16:51:55 2013 -0700
Comment from mreynolds (@mreynolds389) at 2013-11-25 20:37:34
Fix memory leak 0001-Ticket-47599-fix-memory-leak.patch
Comment from rmeggins (@richm) at 2013-11-25 20:41:18
ack
Comment from mreynolds (@mreynolds389) at 2013-11-25 20:48:29
git merge ticket47599 Updating 51b1b83..a16bf1b Fast-forward ldap/servers/slapd/back-ldbm/seq.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-)
git push origin master Counting objects: 13, done. Delta compression using up to 4 threads. Compressing objects: 100% (7/7), done. Writing objects: 100% (7/7), 678 bytes, done. Total 7 (delta 5), reused 0 (delta 0) To ssh://git.fedorahosted.org/git/389/ds.git 51b1b83..a16bf1b master -> master
commit a16bf1b3c4ff0412c2481baace9b427750c11f8c Author: Mark Reynolds mreynolds389@redhat.com Date: Mon Nov 25 09:36:25 2013 -0500
git push origin 389-ds-base-1.3.2 Counting objects: 13, done. Delta compression using up to 4 threads. Compressing objects: 100% (7/7), done. Writing objects: 100% (7/7), 727 bytes, done. Total 7 (delta 5), reused 0 (delta 0) To ssh://git.fedorahosted.org/git/389/ds.git cbac612..c7e7c68 389-ds-base-1.3.2 -> 389-ds-base-1.3.2
Comment from nhosoi (@nhosoi) at 2013-12-11 01:25:08
0001-Ticket-47599-fix-memory-leak.patch: Pushed to 389-ds-base-1.3.1: 5c649dd..08dc37d 389-ds-base-1.3.1 -> 389-ds-base-1.3.1 commit 08dc37dc832e1ce78d27012a60b1691dba2f6501
Comment from mreynolds (@mreynolds389) at 2017-02-11 22:48:22
Metadata Update from @mreynolds389:
Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/47599
Looks like the locking in the retro changelog code is too big - write_replog_db - the retrocl_internal_lock lock scope is very large. Or perhaps even get rid of locking and use atomics.
We are seeing deadlock issues related to this as described in the following FreeIPA ticket: