389ds / 389-ds-base

The enterprise-class Open Source LDAP server for Linux
https://www.port389.org/

Create a global lock to serialize write operations over several backends #1267

Closed 389-ds-bot closed 4 years ago

389-ds-bot commented 4 years ago

Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/47936


The origin of this ticket is https://fedorahosted.org/freeipa/ticket/4635

This IPA ticket is a deadlock where two threads acquire locks in different orders. One cause of the deadlock is that a plugin accesses pages on different backends for which it does not hold the backend lock.

This ticket is to evaluate whether a global lock would prevent such a deadlock.

This global lock needs to cover the 'ldbm database' backend but also other backends like 'cn=config' or 'cn=schema'...

389-ds-bot commented 4 years ago

Comment from tbordaz (@tbordaz) at 2014-10-23 14:37:53

This occurs in the following conditions:

  • If a betxn_post plugin defines its own lock (memberof, schema compat, automember...) to protect its internal structure
  • The betxn_post plugin is doing internal ops (read or write) while holding its own lock
  • The betxn_post plugin is doing internal ops (read or write) on backends for which it does not hold the backend lock

The betxn_post plugin is called during a transaction, so it is also holding some database pages (in write).

The deadlock occurs if a thread T1 holds a database page lock L1 and then acquires the plugin's own lock L2, while a thread T2 has acquired L2 and is doing an internal op on the same database as T1; to do so it acquires the database page lock L1.

This scenario should not occur during pre_op, be_preop or betxn_preop because database pages have not yet been acquired. It should not occur during post_op or be_postop because the transaction was already committed and the database page lock released.

389-ds-bot commented 4 years ago

Comment from nhosoi (@nhosoi) at 2014-10-23 22:35:49

Replying to [comment:1 tbordaz]:

This occurs in the following conditions:

  • If a betxn_post plugin defines its own lock (memberof, schema compat, automember...) to protect its internal structure
  • The betxn_post plugin is doing internal ops (read or write) while holding its own lock
  • The betxn_post plugin is doing internal ops (read or write) on backends for which it does not hold the backend lock

The betxn_post plugin is called during a transaction, so it is also holding some database pages (in write).

The deadlock occurs if a thread T1 holds a database page lock L1 and then acquires the plugin's own lock L2, while a thread T2 has acquired L2 and is doing an internal op on the same database as T1; to do so it acquires the database page lock L1.

This scenario should not occur during pre_op, be_preop or betxn_preop because database pages have not yet been acquired. It should not occur during post_op or be_postop because the transaction was already committed and the database page lock released.

Thierry, if the deadlock is caused by the backend serial lock and a global lock in an individual plugin, I also ran into this issue multiple times and fixed it by starting the transaction before taking the plugin's global lock. The backend transaction is re-entrant, so you can call slapi_back_transaction_begin multiple times in one thread.

Could you search for slapi_back_transaction in the plugin directory and see if it works as a workaround or not? Here's an example:

    automember/automember.c: if(slapi_back_transaction_begin(fixup_pb) != LDAP_SUCCESS){
    automember/automember.c: slapi_back_transaction_abort(fixup_pb);
    automember/automember.c: slapi_back_transaction_commit(fixup_pb);

Thanks! --noriko

389-ds-bot commented 4 years ago

Comment from rmeggins (@richm) at 2014-10-24 01:15:54

Replying to [comment:1 tbordaz]:

This occurs in the following conditions:

  • If a betxn_post plugin defines its own lock (memberof, schema compat, automember...) to protect its internal structure
  • The betxn_post plugin is doing internal ops (read or write) while holding its own lock
  • The betxn_post plugin is doing internal ops (read or write) on backends for which it does not hold the backend lock

What does "backends for which it does not hold the backend lock" mean?

The betxn_post plugin is called during a transaction, so it is also holding some database pages (in write).

The deadlock occurs if a thread T1 holds a database page lock L1 and then acquires the plugin's own lock L2, while a thread T2 has acquired L2 and is doing an internal op on the same database as T1; to do so it acquires the database page lock L1.

This scenario should not occur during pre_op, be_preop or betxn_preop because database pages have not yet been acquired. It should not occur during post_op or be_postop because the transaction was already committed and the database page lock released.

Ok. So this is a new issue caused by moving all of the plugins to be pre/post betxn plugins. The locking order should be:

1) backend lock - that is, a new transaction
2) plugin lock

In this case, plugins such as schema reload, fixup memberof, etc. should do as Noriko suggested and call slapi_back_transaction_begin first, e.g.

plugin()
{
   ...
   slapi_back_transaction_begin
   ...
   plugin lock
   ...
   do internal operations
   ...
   plugin unlock
   ...
   slapi_back_transaction_abort/commit
}
389-ds-bot commented 4 years ago

Comment from tbordaz (@tbordaz) at 2014-10-24 16:22:59

Replying to [comment:4 richm]:

Replying to [comment:1 tbordaz]:

This occurs in the following conditions:

  • If a betxn_post plugin defines its own lock (memberof, schema compat, automember...) to protect its internal structure
  • The betxn_post plugin is doing internal ops (read or write) while holding its own lock
  • The betxn_post plugin is doing internal ops (read or write) on backends for which it does not hold the backend lock

What does "backends for which it does not hold the backend lock" mean?

The betxn_post plugin is called during a transaction, so it is also holding some database pages (in write).

The deadlock occurs if a thread T1 holds a database page lock L1 and then acquires the plugin's own lock L2, while a thread T2 has acquired L2 and is doing an internal op on the same database as T1; to do so it acquires the database page lock L1.

This scenario should not occur during pre_op, be_preop or betxn_preop because database pages have not yet been acquired. It should not occur during post_op or be_postop because the transaction was already committed and the database page lock released.

Ok. So this is a new issue caused by moving all of the plugins to be pre/post betxn plugins. The locking order should be:

1) backend lock - that is, a new transaction
2) plugin lock

In this case, plugins such as schema reload, fixup memberof, etc. should do as Noriko suggested and call slapi_back_transaction_begin first, e.g.

plugin()
{
   ...
   slapi_back_transaction_begin
   ...
   plugin lock
   ...
   do internal operations
   ...
   plugin unlock
   ...
   slapi_back_transaction_abort/commit
}

Noriko, Rich,

Thank you so much for having looked into this ticket. In fact you are right: the schema compat plugin is not using slapi_back_transaction_begin/slapi_back_transaction_abort/commit to surround its plugin lock. I will implement your suggestion.

A question about how slapi_back_transaction would prevent the deadlock: does it rely on the deadlock detection done by the DB?

thanks thierry

389-ds-bot commented 4 years ago

Comment from rmeggins (@richm) at 2014-10-24 20:39:25

Starting a transaction acquires a lock over the entire db; no other write operations can proceed. A search operation may proceed, and a deadlock is still possible: the write operation attempts to acquire a write lock on a page that a search thread has a read lock on, while the search thread cannot proceed because the writer thread holds some other lock (e.g. a plugin lock) that the search thread is waiting on. However, if this happens frequently, changing the db deadlock policy (Ticket 47409) may help.

389-ds-bot commented 4 years ago

Comment from tbordaz (@tbordaz) at 2014-10-26 22:41:09

389-ds-bot commented 4 years ago

Comment from nhosoi (@nhosoi) at 2014-12-06 05:13:59

Check if this is still needed with Thierry. 11/18 - It is still needed, but it's not urgent. 1.3.4

Per ticket triage, setting the milestone to 1.3.4.

389-ds-bot commented 4 years ago

Comment from mkosek at 2015-02-24 19:11:34

I would like to utilize this lock at least for https://fedorahosted.org/freeipa/ticket/4925.

389-ds-bot commented 4 years ago

Comment from tbordaz (@tbordaz) at 2015-02-27 01:51:35

attachment 0001-Ticket-47936-Create-a-global-lock-to-serialize-write.patch

389-ds-bot commented 4 years ago

Comment from rmeggins (@richm) at 2015-02-27 02:34:22

does the "cn=config" backend include "cn=schema"?

389-ds-bot commented 4 years ago

Comment from tbordaz (@tbordaz) at 2015-02-27 03:50:16

Replying to [comment:12 richm]:

does the "cn=config" backend include "cn=schema"?

Thanks Rich for looking at it. I need to verify this tomorrow but I think it should. My understanding is that:

'backend-type: DSE'               ->  global lock applies to both "cn=config" and "cn=schema".
'backend-name: frontent-internal' ->  global lock applies to cn=config
'backend-name: schema-internal'   ->  global lock applies to cn=schema
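[Editor's note] For illustration, the dse.ldif entry created by create_global_lock_entry might look like the sketch below. The DN, objectClass values, and overall layout here are assumptions; only the backend-type/backend-name attribute names and their values are taken from the excerpts in this thread.

```ldif
# Hypothetical entry layout -- DN and objectClasses are illustrative.
dn: cn=global backend lock,cn=config
objectClass: top
objectClass: extensibleObject
cn: global backend lock
# Applies the global lock to both cn=config and cn=schema:
backend-type: DSE
```

Per the mapping above, 'backend-name: frontent-internal' or 'backend-name: schema-internal' would instead scope the lock to cn=config or cn=schema individually.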
389-ds-bot commented 4 years ago

Comment from nhosoi (@nhosoi) at 2015-02-27 03:59:45

A minor issue :); it would be better to put the new code (3970 - 3972) after the PR_ASSERT(NULL != be);

…   …   void dblayer_lock_backend(backend *be) 
3967    3967    { 
3968    3968        ldbm_instance *inst; 
3969    3969     
    3970        if (global_backend_lock_requested(be)) { 
    3971            global_backend_lock_lock(); 
    3972        } 
    3973         
3970    3974        PR_ASSERT(NULL != be);

No need to check the return value from create_global_lock_entry? If it fails, the global lock is not acquired? Any other disadvantage?

    291 global_backend_lock_init() 
    292 { 
    293     /* Create the global backend entry in dse.ldif (if it does not already exist) */ 
    294     create_global_lock_entry();

The logging is available only if SLAPI_LOG_CONFIG is set, i.e. it is not too verbose, so you don't need the "#if 1"?

    246 #if 1 
    247                      slapi_log_error(SLAPI_LOG_CONFIG, NULL, "Global backend lock applies on %s: %s\n",  
    248                              msg ? msg : "", value); 
    249 #endif

May not be an issue, but it'd be nice if ... (in dse_modify, dse_add, and dse_delete)

a   b   dse_modify
    1831        Slapi_Backend *be; 
                    ==> Slapi_Backend *be = NULL;
    1842        slapi_pblock_get( pb, SLAPI_BACKEND, &be); 
    1878        /* Possibly acquire the global backend lock */ 
    1879        if (global_backend_lock_requested(be)) {
                    ==> if (be && global_backend_lock_requested(be)) {
    1880            global_backend_lock_lock(); 
    1881            global_lock_owned = PR_TRUE; 
    1882        }

Indentation? :)

…   …   main( int argc, char **argv) 
1057    1057            /* initialize the normalized DN cache */ 
1058    1058            ndn_cache_init(); 
1059    1059     
    1060                global_backend_lock_init(); 
1060    1061            /*

I'm a bit afraid that every time a lock is acquired you have to do strcasecmp in 2 for loops in global_backend_lock_requested... It'd be nice if we could do an integer compare instead (by assigning an ID or something), but that may introduce more complexity?

    303 global_backend_lock_requested(Slapi_Backend *be)

389-ds-bot commented 4 years ago

Comment from tbordaz (@tbordaz) at 2015-02-27 15:49:16

Replying to [comment:12 richm]:

does the "cn=config" backend include "cn=schema"?

Rich,

Yes both backends 'cn=config' and 'cn=schema' are covered by the global lock.

'backend-type: DSE'               ->  global lock applies to both "cn=config" and "cn=schema".
'backend-name: frontent-internal' ->  global lock applies to cn=config
'backend-name: schema-internal'   ->  global lock applies to cn=schema
389-ds-bot commented 4 years ago

Comment from tbordaz (@tbordaz) at 2015-02-27 22:50:22

Performance measurement (throughput):

  • When the global lock is not enabled, performance is identical compared to 1.3.1 (the reference).
  • When the global lock is enabled, it has no performance impact on reads.
  • When the global lock is enabled, the impact on writes is most significant when writes are done in parallel on backends protected by the same global lock; in that case it reduces the throughput by ~1/3rd.
  • When writes go to only one backend (still protected by the global lock), there is no impact on ADD/DEL; on MOD it reduces throughput by ~15%.

Tool: ldclt (Global average rate)

Per suffix
Number of entries : 50000
Number of op thread: 10
Entry cache: 100Mb

Dbcache: 100Mb

        DS version      1.3.1.6                 master + 47936 (global_lock disabled)   master + 47936 (global_lock enabled)

ADD
        one suffix      714.30/s                714.30/s                                714.30/s
        two suffixes    625.01/s                714.30/s                                500.01/s

DEL
        one suffix      714.30/s                714.30/s                                714.30/s
        two suffixes    555.57/s                625.01/s                                384.62/s

MOD
        one suffix      1000.02/s               1000.02/s                               833.35/s
        two suffixes    1000.02/s               1000.02/s                               714.30/s

SRCH
        one suffix      2500.05/s               2500.05/s                               2500.05/s
        two suffixes    2500.05/s               2500.05/s                               2500.05/s
389-ds-bot commented 4 years ago

Comment from tbordaz (@tbordaz) at 2015-02-27 22:56:19

attachment perf_data.tar.gz

389-ds-bot commented 4 years ago

Comment from lkrispen (@elkris) at 2015-03-02 14:22:45

copying my comment from the mailing list to the ticket, and adding another concern

does it make sense to allow subsets of the backends to be locked by the global lock? In our experience with deadlocks, different combinations of backends were involved - so to enable it one would have to know in advance which backends to list. I think if we introduce this global lock, we should make it truly global; that would also make the config check for whether it should be used faster. If you want to keep the configuration of individual backends, I would like to have one "ALL" option

if the global lock is not really global, it can introduce a new deadlock which was not there before: assume we have four backends A,B,C,D

scenario without global lock:

1] thread 1 mods backend A, takes backend lock for A
2] simultaneously thread 2 mods backend C, takes backend lock for C
3] postop plugin in thread 1 wants to modify entry in C, waits for backend lock C
4] postop plugin in thread 2 wants to modify entry in D
   4.1] takes backend lock D
   4.2] modifies entry in D
   4.3] releases backend lock D
5] thread 2 releases backend lock C
6] thread 1 gets backend lock C and continues

now configure the global lock, but only for A,B,D; the same scenario now becomes:

1] thread 1 mods backend A, takes backend lock for A, takes global lock
2] simultaneously thread 2 mods backend C, takes backend lock for C, no global lock
3] postop plugin in thread 1 wants to modify entry in C, waits for backend lock C
4] postop plugin in thread 2 wants to modify entry in D
   4.1] for D the global lock is required, waits for the global lock

no thread can progress:

  • thread 1 holds the global lock, waits for backend lock C
  • thread 2 holds backend lock C, waits for the global lock

389-ds-bot commented 4 years ago

Comment from tbordaz (@tbordaz) at 2015-03-02 16:11:12

attachment 0002-Ticket-47936-Create-a-global-lock-to-serialize-write.patch

389-ds-bot commented 4 years ago

Comment from tbordaz (@tbordaz) at 2015-03-02 16:32:16

Replying to [comment:17 elkris]:

copying my comment from the mailing list to the ticket, and adding another concern

does it make sense to allow subsets of the backends to be locked by the global lock? In our experience with deadlocks, different combinations of backends were involved - so to enable it one would have to know in advance which backends to list. I think if we introduce this global lock, we should make it truly global; that would also make the config check for whether it should be used faster. If you want to keep the configuration of individual backends, I would like to have one "ALL" option

if the global lock is not really global, it can introduce a new deadlock which was not there before: assume we have four backends A,B,C,D

scenario without global lock:

1] thread 1 mods backend A, takes backend lock for A
2] simultaneously thread 2 mods backend C, takes backend lock for C
3] postop plugin in thread 1 wants to modify entry in C, waits for backend lock C
4] postop plugin in thread 2 wants to modify entry in D
   4.1] takes backend lock D
   4.2] modifies entry in D
   4.3] releases backend lock D
5] thread 2 releases backend lock C
6] thread 1 gets backend lock C and continues

now configure the global lock, but only for A,B,D; the same scenario now becomes:

1] thread 1 mods backend A, takes backend lock for A, takes global lock
2] simultaneously thread 2 mods backend C, takes backend lock for C, no global lock
3] postop plugin in thread 1 wants to modify entry in C, waits for backend lock C
4] postop plugin in thread 2 wants to modify entry in D
   4.1] for D the global lock is required, waits for the global lock

no thread can progress:

  • thread 1 holds the global lock, waits for backend lock C
  • thread 2 holds backend lock C, waits for the global lock

Hello Ludwig,

Thanks for your remarks. Your second remark (the deadlock case) makes it even more important to be able to protect all backends in a simple way. I changed the fix to support an 'all' keyword (backend-type or backend-name). I am testing this new fix.

Regarding the deadlock scenario: the deadlock occurs because C was not covered by the global lock. What is frustrating is that this scenario did not exist before the global lock. I think the global lock leads to the same kind of issues as we have with plugin locks.

389-ds-bot commented 4 years ago

Comment from lkrispen (@elkris) at 2015-03-02 17:54:14

Replying to [comment:18 tbordaz]:

Regarding the deadlock scenario: the deadlock occurs because C was not covered by the global lock. What is frustrating is that this scenario did not exist before the global lock. I think the global lock leads to the same kind of issues as we have with plugin locks.

The issue occurs only if you allow the global lock to be configured on a subset of the backends. I still do not see the need for these configuration options. In my opinion "ALL" or "NOTHING" should be the only options for the global lock.

389-ds-bot commented 4 years ago

Comment from rmeggins (@richm) at 2015-03-02 20:43:20

Replying to [comment:19 elkris]:

Replying to [comment:18 tbordaz]:

Regarding the deadlock scenario: the deadlock occurs because C was not covered by the global lock. What is frustrating is that this scenario did not exist before the global lock. I think the global lock leads to the same kind of issues as we have with plugin locks.

The issue occurs only if you allow the global lock to be configured on a subset of the backends. I still do not see the need for these configuration options. In my opinion "ALL" or "NOTHING" should be the only options for the global lock.

+1 Then we can see if we need to have more fine-grained locking.

389-ds-bot commented 4 years ago

Comment from tbordaz (@tbordaz) at 2015-03-02 23:19:53

attachment 0003-Ticket-47936-Create-a-global-lock-to-serialize-write.patch

389-ds-bot commented 4 years ago

Comment from nhosoi (@nhosoi) at 2015-03-02 23:25:29

Replying to [comment:20 richm]:

Replying to [comment:19 elkris]:

Replying to [comment:18 tbordaz]:

Regarding the deadlock scenario: the deadlock occurs because C was not covered by the global lock. What is frustrating is that this scenario did not exist before the global lock. I think the global lock leads to the same kind of issues as we have with plugin locks.

The issue occurs only if you allow the global lock to be configured on a subset of the backends. I still do not see the need for these configuration options. In my opinion "ALL" or "NOTHING" should be the only options for the global lock.

+1 Then we can see if we need to have more fine-grained locking.

+1

And I still don't see the point of dynamically checking each backend against the configured list in global_backend_lock_requested. The configured list is determined in the init phase, and there is no case where a backend is in the list at one moment and removed from it at another (I guess?). And if that were possible, it would be even worse: if a backend acquired the global lock and then got removed from the global lock list, the server would hang there... I know that does not occur in the current implementation, as I mentioned. That said, isn't the dynamic checking unneeded? I also prefer a simple approach to solve this issue...

389-ds-bot commented 4 years ago

Comment from rmeggins (@richm) at 2015-03-03 00:12:26

PRMonitor *global_backend_mutex = NULL;  

This should be static

otherwise, ack

389-ds-bot commented 4 years ago

Comment from tbordaz (@tbordaz) at 2015-03-03 15:54:25

attachment 0004-Ticket-47936-Create-a-global-lock-to-serialize-write.patch

389-ds-bot commented 4 years ago

Comment from tbordaz (@tbordaz) at 2015-03-03 15:55:18

'git merge ticket47936'

    Updating 5c6329e..8583012
    Fast-forward
     ldap/servers/slapd/back-ldbm/dblayer.c |  7 +++++++
     ldap/servers/slapd/backend.c           | 29 +++++++++++++++++++++++++++++
     ldap/servers/slapd/dse.c               | 32 ++++++++++++++++++++++++++++++--
     ldap/servers/slapd/libglobs.c          | 33 ++++++++++++++++++++++++++++++++-
     ldap/servers/slapd/main.c              |  1 +
     ldap/servers/slapd/proto-slap.h        |  6 ++++++
     ldap/servers/slapd/slap.h              |  2 ++
     7 files changed, 107 insertions(+), 3 deletions(-)

'git push origin master'

    Counting objects: 25, done.
    Delta compression using up to 4 threads.
    Compressing objects: 100% (13/13), done.
    Writing objects: 100% (13/13), 2.37 KiB, done.
    Total 13 (delta 11), reused 0 (delta 0)
    To ssh://git.fedorahosted.org/git/389/ds.git
       5c6329e..8583012  master -> master

'git push origin 389-ds-base-1.3.3'

    Counting objects: 25, done.
    Delta compression using up to 4 threads.
    Compressing objects: 100% (13/13), done.
    Writing objects: 100% (13/13), 2.38 KiB, done.
    Total 13 (delta 11), reused 0 (delta 0)
    To ssh://git.fedorahosted.org/git/389/ds.git
       8f6871b..748054f  389-ds-base-1.3.3 -> 389-ds-base-1.3.3

389-ds-bot commented 4 years ago

Comment from tbordaz (@tbordaz) at 2017-02-11 22:48:11

Metadata Update from @tbordaz: