389ds / 389-ds-base

The enterprise-class Open Source LDAP server for Linux
https://www.port389.org/
Other
211 stars 92 forks source link

Reduce lock scope in retro changelog plug-in #936

Closed 389-ds-bot closed 4 years ago

389-ds-bot commented 4 years ago

Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/47599


Looks like the locking in the retro changelog code is too big - write_replog_db - the retrocl_internal_lock lock scope is very large. Or perhaps even get rid of locking and use atomics.

We are seeing deadlock issues related to this as described in the following FreeIPA ticket:

https://fedorahosted.org/freeipa/ticket/3967
389-ds-bot commented 4 years ago

Comment from nkinder (@nkinder) at 2013-11-16 05:24:11

Ticket has been cloned to Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1031227

389-ds-bot commented 4 years ago

Comment from rmeggins (@richm) at 2013-11-18 23:45:24

I thought PRUint64 operations were not atomic? I think it would be better to use the slapi_counter API - that way the slapi_counter API will take care of locking

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2013-11-19 00:00:22

Replying to [comment:6 richm]:

I thought PRUint64 operations were not atomic?

I was under the impression of the exact opposite, and this is exactly how slapi_counters work as well(please take a look). I'd prefer to keep what I have, as it's much simpler than bringing in all the slapi_counter stuff. No? Thoughts?

389-ds-bot commented 4 years ago

Comment from rmeggins (@richm) at 2013-11-19 00:22:28

Replying to [comment:7 mreynolds389]:

Replying to [comment:6 richm]:

I thought PRUint64 operations were not atomic?

I was under the impression of the exact opposite, and this is exactly how slapi_counters work as well(please take a look).

/*
* slapi_counter_add()
*
* Atomically add a value to a Slapi_Counter.
*/
PRUint64 slapi_counter_add(Slapi_Counter *counter, PRUint64 addvalue)
{
PRUint64 newvalue = 0;
...
#ifdef LINUX
newvalue = __sync_add_and_fetch(&(counter->value), addvalue);
...
return newvalue;

So no, PRUint64 newvalue++ is not atomic.

I'd prefer to keep what I have, as it's much simpler than bringing in all the slapi_counter stuff. No? Thoughts?

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2013-11-19 00:52:31

Replying to [comment:8 richm]:

Replying to [comment:7 mreynolds389]:

Replying to [comment:6 richm]:

I thought PRUint64 operations were not atomic?

I was under the impression of the exact opposite, and this is exactly how slapi_counters work as well(please take a look).

/*
* slapi_counter_add()
*
* Atomically add a value to a Slapi_Counter.
*/
PRUint64 slapi_counter_add(Slapi_Counter *counter, PRUint64 addvalue)
{
PRUint64 newvalue = 0;
...
#ifdef LINUX
newvalue = __sync_add_and_fetch(&(counter->value), addvalue);
...
return newvalue;

So no, PRUint64 newvalue++ is not atomic.

Yeah, you're absolutely right. I really thought a PRUint64 was atomic, it comes up all the time when there's talk about lock-free access(hazard pointers, etc). I obviously misunderstood some of it. Anyway I'm reworking the patch.

I'd prefer to keep what I have, as it's much simpler than bringing in all the slapi_counter stuff. No? Thoughts?

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2013-11-19 01:44:50

Replying to [comment:9 mreynolds389]:

Replying to [comment:8 richm]:

Replying to [comment:7 mreynolds389]:

Replying to [comment:6 richm]:

I thought PRUint64 operations were not atomic?

I was under the impression of the exact opposite, and this is exactly how slapi_counters work as well(please take a look).

/*
* slapi_counter_add()
*
* Atomically add a value to a Slapi_Counter.
*/
PRUint64 slapi_counter_add(Slapi_Counter *counter, PRUint64 addvalue)
{
PRUint64 newvalue = 0;
...
#ifdef LINUX
newvalue = __sync_add_and_fetch(&(counter->value), addvalue);
...
return newvalue;

So no, PRUint64 newvalue++ is not atomic.

Yeah, you're absolutely right. I really thought a PRUint64 was atomic, it comes up all the time when there's talk about lock-free access(hazard pointers, etc). I obviously misunderstood some of it. Anyway I'm reworking the patch.

I'd prefer to keep what I have, as it's much simpler than bringing in all the slapi_counter stuff. No? Thoughts?

New patch is attached.

Thanks, Mark

389-ds-bot commented 4 years ago

Comment from rmeggins (@richm) at 2013-11-19 23:00:17

With slapi_counter, operations on single counters are atomic. However, operations on multiple counters are not.

void retrocl_commit_changenumber(void) 
{ 
    if ( slapi_counter_get_value(retrocl_first_cn) == 0) {
        slapi_counter_set_value(retrocl_first_cn, slapi_counter_get_value(retrocl_internal_cn));
    }
}

This is not atomic, because the value of retrocl_first_cn could change after testing it for 0.

Same with this:

    if(slapi_counter_get_value(retrocl_internal_cn) <= slapi_counter_get_value(retrocl_first_cn)){

The value of retrocl_internal_cn could change before doing slapi_counter_get_value(retrocl_first_cn) and the <= test.

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2013-11-20 02:15:16

Replying to [comment:11 richm]:

With slapi_counter, operations on single counters are atomic. However, operations on multiple counters are not.

void retrocl_commit_changenumber(void) 
{ 
    if ( slapi_counter_get_value(retrocl_first_cn) == 0) {
        slapi_counter_set_value(retrocl_first_cn, slapi_counter_get_value(retrocl_internal_cn));
    }
}

This is not atomic, because the value of retrocl_first_cn could change after testing it for 0.

Same with this:

    if(slapi_counter_get_value(retrocl_internal_cn) <= slapi_counter_get_value(retrocl_first_cn)){

The value of retrocl_internal_cn could change before doing slapi_counter_get_value(retrocl_first_cn) and the <= test.

Hmm, I see. Looks like we need to use a RW lock instead of using the slapi counter approach. I'm not sure how else to do it.

389-ds-bot commented 4 years ago

Comment from rmeggins (@richm) at 2013-11-20 06:30:38

This adds additional locking to retrocl_assign_changenumber(). The only place where retrocl_assign_changenumber is called is from write_replog_db(), inside of the PR_Lock(retrocl_internal_lock). I'm not sure if this will solve the ipa deadlock, and not introduce any additional deadlocks.

389-ds-bot commented 4 years ago

Comment from tbordaz (@tbordaz) at 2013-11-20 15:31:33

Hi,

If this ticket is for reducing a critical section but not the hang, please skip the following.

Just a dummy remark regarding the hang (https://fedorahosted.org/freeipa/attachment/ticket/3967/ds.backtrace). It looks to me that the thread 28 (ssl handshake) is waiting for a lock aquired by thread 15 (applying a IPA set password in the retrocl). Thread 15 is waiting for a db lock (used by a cursor). I did not see any others threads in db. One possibility of the hang could be an aborted task (export/import/index..).

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2013-11-20 20:18:27

Replying to [comment:14 tbordaz]:

Hi,

If this ticket is for reducing a critical section but not the hang, please skip the following.

Just a dummy remark regarding the hang (https://fedorahosted.org/freeipa/attachment/ticket/3967/ds.backtrace). It looks to me that the thread 28 (ssl handshake) is waiting for a lock aquired by thread 15 (applying a IPA set password in the retrocl). Thread 15 is waiting for a db lock (used by a cursor). I did not see any others threads in db. One possibility of the hang could be an aborted task (export/import/index..).

This is a definitely a possibility, but we don't have any DS logs to see if any tasks were issued/aborted. So for now this patch is just to improve the locking. Once all of the retrocl patches are done, another round of testing will be done, and we'll go from there.

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2013-11-20 20:18:53

RW locks - part 2 0001-Ticket-47599-Reduce-lock-scope-in-retro-changelog-pl.patch

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2013-11-20 20:26:47

Replying to [comment:13 richm]:

This adds additional locking to retrocl_assign_changenumber(). The only place where retrocl_assign_changenumber is called is from write_replog_db(), inside of the PR_Lock(retrocl_internal_lock). I'm not sure if this will solve the ipa deadlock, and not introduce any additional deadlocks.

Reworked the RW locking around retrocl_assign_changenumber() && retrocl_update_lastchangenumber(). Now I release the write lock before calling ldbm_back_seq(), and lock it after returning. It is safe to release the lock without fear of the CN changing, because we are now inside the the big internal lock, so everything(CN updates) are serialized at this stage.

So now the RW lock is only held while making an update to the changenumber, nothing else. There should not be any risk of deadlock now.

New patch attached.

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2013-11-20 22:21:58

git merge retro Updating 2971ca1..e2c42bc Fast-forward ldap/servers/plugins/retrocl/retrocl.c | 3 +- ldap/servers/plugins/retrocl/retrocl.h | 1 + ldap/servers/plugins/retrocl/retrocl_cn.c | 42 ++++++++++++++++++++--------- ldap/servers/plugins/retrocl/retrocl_po.c | 2 +- 4 files changed, 33 insertions(+), 15 deletions(-)

git push origin master Counting objects: 19, done. Delta compression using up to 4 threads. Compressing objects: 100% (10/10), done. Writing objects: 100% (10/10), 1.37 KiB, done. Total 10 (delta 8), reused 0 (delta 0) To ssh://git.fedorahosted.org/git/389/ds.git 2971ca1..e2c42bc master -> master

commit e2c42bced86bac235ac56ae98eed303f61ebd15e Author: Mark Reynolds mreynolds389@redhat.com Date: Wed Nov 20 09:08:50 2013 -0500

1.3.2 d197eca..a87c3ee 389-ds-base-1.3.2 -> 389-ds-base-1.3.2

1.3.1 bb658b8..03f6347 389-ds-base-1.3.1 -> 389-ds-base-1.3.1

389-ds-bot commented 4 years ago

Comment from rmeggins (@richm) at 2013-11-23 08:21:18

89ae932..b19239f 389-ds-base-1.3.1 -> 389-ds-base-1.3.1 commit b19239fdca5c00865471acfd5ffc8502c66b914a Author: Rich Megginson richm@redhat.com Date: Fri Nov 22 16:51:55 2013 -0700 d1d90fd..f4d5900 389-ds-base-1.3.2 -> 389-ds-base-1.3.2 commit f4d5900579c773e5cf4b69eaeba6104078512ab0 Author: Rich Megginson richm@redhat.com Date: Fri Nov 22 16:51:55 2013 -0700 111b807..b330876 master -> master commit b330876a1bccd93a8e906ac56a10c002c981ecfc Author: Rich Megginson richm@redhat.com Date: Fri Nov 22 16:51:55 2013 -0700

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2013-11-25 20:37:34

Fix memory leak 0001-Ticket-47599-fix-memory-leak.patch

389-ds-bot commented 4 years ago

Comment from rmeggins (@richm) at 2013-11-25 20:41:18

ack

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2013-11-25 20:48:29

git merge ticket47599 Updating 51b1b83..a16bf1b Fast-forward ldap/servers/slapd/back-ldbm/seq.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-)

git push origin master Counting objects: 13, done. Delta compression using up to 4 threads. Compressing objects: 100% (7/7), done. Writing objects: 100% (7/7), 678 bytes, done. Total 7 (delta 5), reused 0 (delta 0) To ssh://git.fedorahosted.org/git/389/ds.git 51b1b83..a16bf1b master -> master

commit a16bf1b3c4ff0412c2481baace9b427750c11f8c Author: Mark Reynolds mreynolds389@redhat.com Date: Mon Nov 25 09:36:25 2013 -0500

git push origin 389-ds-base-1.3.2 Counting objects: 13, done. Delta compression using up to 4 threads. Compressing objects: 100% (7/7), done. Writing objects: 100% (7/7), 727 bytes, done. Total 7 (delta 5), reused 0 (delta 0) To ssh://git.fedorahosted.org/git/389/ds.git cbac612..c7e7c68 389-ds-base-1.3.2 -> 389-ds-base-1.3.2

389-ds-bot commented 4 years ago

Comment from nhosoi (@nhosoi) at 2013-12-11 01:25:08

0001-Ticket-47599-fix-memory-leak.patch: Pushed to 389-ds-base-1.3.1: 5c649dd..08dc37d 389-ds-base-1.3.1 -> 389-ds-base-1.3.1 commit 08dc37dc832e1ce78d27012a60b1691dba2f6501

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2017-02-11 22:48:22

Metadata Update from @mreynolds389: