389ds / 389-ds-base

The enterprise-class Open Source LDAP server for Linux
https://www.port389.org/
Other
211 stars 90 forks source link

acl_copyEval_context double free #2839

Closed 389-ds-bot closed 4 years ago

389-ds-bot commented 4 years ago

Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/49780


Issue Description

slapd crashes with the following or similar stack trace:

 0  acl_copyEval_context (aclpb=0x0, src=0x4fc5730, dest=0x7f2d00452bf0, copy_attr_only=0) at ldap/servers/plugins/acl/acl.c:3661
 1  0x00007f2d3533df71 in acl_operation_ext_destructor (ext=0x4fc5650, object=<value optimized out>, parent=<value optimized out>)
at ldap/servers/plugins/acl/acl_ext.c:328
 2  0x000000368a0637da in factory_destroy_extension (type=<value optimized out>, object=0x30fc7f0, parent=0x7f2d24b825d0, extension=0x30fc8a8)
at ldap/servers/slapd/factory.c:405
 3  0x000000368a08c16e in operation_free (op=0x3009480, conn=0x7f2d24b825d0) at ldap/servers/slapd/operation.c:220
 4  0x000000368a095ef8 in pblock_done (pb=0x3009470) at ldap/servers/slapd/pblock.c:114
 5  0x000000368a095f33 in slapi_pblock_destroy (pb=0x3009470) at ldap/servers/slapd/pblock.c:125
 6  0x00000000004140b3 in connection_threadmain () at ldap/servers/slapd/connection.c:2382
 7  0x0000003693c29c53 in _pt_root (arg=0x32c7c90) at ../../../nspr/pr/src/pthreads/ptthread.c:216
 8  0x0000003688007aa1 in start_thread (arg=0x7f2d0cbed700) at pthread_create.c:301
 9  0x0000003687ce8bcd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:122
 10 0x0000000000000000 in ?? ()

Package Version and Platform

crash was reported for 389-ds-base-1.2.11.15-91.el6_9.x86_64

but code is unchanged, so new versions could be affected as well

Steps to reproduce

No reproducer available

Actual results

crash

Expected results

no crash

389-ds-bot commented 4 years ago

Comment from lkrispen (@elkris) at 2018-06-14 15:15:05

checking the two available core files ŕevealed that the last operations included getEffectivRights (GER) searches and in GER we do replace the aclcb in the connection extension

In fact we do:

 _ger_new_gerpb()
        geraclcb = (struct acl_cblock *)acl_conn_ext_constructor(NULL, NULL);
       *aclcb = acl_get_ext(ACL_EXT_CONNECTION, conn);
        acl_set_ext(ACL_EXT_CONNECTION, conn, (void *)geraclcb);

which creates an connection extension and sets it to the connection and later we do:

 _ger_release_gerpb()
       geraclcb = (struct aclcb *)acl_get_ext(ACL_EXT_CONNECTION, conn);
        acl_conn_ext_destructor(geraclcb, NULL, NULL);
        acl_set_ext(ACL_EXT_CONNECTION, conn, *aclcb);

which destroys the aclcb in the connection and then sets the saved aclcb back Now, this is definitely the wrong order, but changing the order of restoring the original and destroying the temporary will not be sufficient, an other thread could still hold a reference to the aclcb which will be destroyed.

I see two options to fix this: 1] Investigate why GER needs its own specific aclcb in the connection and if this can be avoided, optimal solution 2] do not blindly get and set connection extensions, but always use the lock in aclcb during access to it. Then we could lock it, copy its context to a backup aclpb and when done, under the lock copy it back

389-ds-bot commented 4 years ago

Comment from lkrispen (@elkris) at 2018-06-14 15:15:14

Metadata Update from @elkris:

389-ds-bot commented 4 years ago

Comment from lkrispen (@elkris) at 2018-06-14 15:25:06

Metadata Update from @elkris:

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2018-06-14 17:37:40

Metadata Update from @mreynolds389:

389-ds-bot commented 4 years ago

Comment from tbordaz (@tbordaz) at 2018-06-19 15:56:22

To be honest this part of code is so complex that I am not sure I understand it correctly :(

First evaluation context looks to be a key element of right evaluation. Considering that the GER can evaluate the rights of a proxy user to an entry/attri, it looks normal we need an evaluation context different from the one used by the bound user. I do not understand the need of changing the aclcb but rather could it be done via acl_clean_aclEval_context (in _ger_new_gerpb and _ger_release_gerpb).

Now with asynchronous operations, the evaluation context could be cleaned for a GER operation, if an other operation is evaluation rights in parallel, I not sure which prevent the two evaluations to be merged.

389-ds-bot commented 4 years ago

Comment from lkrispen (@elkris) at 2018-06-20 09:29:49

I checked the use of ACL_EXT_CONNECTION. As far as I can see it is only used in acl_init_aclpb() and only if the parameter "copy_from_aclcb" is set.

In _ger_new_gerpb() we call

 acl_init_aclpb(*gerpb, geraclpb, subjectndn, 0);

so it will get the conn exstension, but not use it. I don't see the need of the special gercb

389-ds-bot commented 4 years ago

Comment from tbordaz (@tbordaz) at 2018-06-21 16:30:35

ACL_EXT_CONNECTION is also used during operation desctructor. It looks to me that each time an operation completes, part of the evaluation context in the connection is cleared/reset.

In _ger_new_gerpb, it sets a new gercb (in the connection) but with the bind sdn ('subjectndn') specified in the GER control. So I believe it needs this specific gercb to evaluate rights of the specified DN not of the connection bound DN.

389-ds-bot commented 4 years ago

Comment from lkrispen (@elkris) at 2018-06-21 16:40:12

ACL_EXT_CONNECTION is also used during operation desctructor. It looks to me that each time an operation completes, part of the evaluation context in the connection is cleared/reset.

yes. it copies the context from the operation to the connection, but this would not be affected if we do not set the aclcb, since it should already be reset when the destructor is called

In _ger_new_gerpb, it sets a new gercb (in the connection) but with the bind sdn ('subjectndn') specified in the GER control. So I believe it needs this specific gercb to evaluate rights of the specified DN not of the connection bound DN.

but it is set, to the aclpb in acl_init_aclpb, the aclcb is not used as far as I see

389-ds-bot commented 4 years ago

Comment from lkrispen (@elkris) at 2018-06-28 14:15:59

I finally decided not to take the risk of just not using the eval context in GER, I still believe it is not used, but there are several places where it is touched, copied, ....

So the proposed solution is:

Unfortunately testing this I did run into another ACL issue. The connection extension has a lock, this lock is set when a new connection extension is created (and in GER it will be created frequently). But these locks are taken from a pre alloacted array of locks and this array is used in a circular mode in aclext_get_lock() and so it can happen that two extensions do use the same lock. The size of the lock array is 40.

This is really bad, and just to avoid the call to PR_NewLock() and PR_DestroyLock() for each connection extension. I would like to remove the use of the array and always create a lock wneh allocating the connection exdtension

389-ds-bot commented 4 years ago

Comment from lkrispen (@elkris) at 2018-06-28 16:41:13

here is my suggested patch

0001-Ticket-49780-acl_copyEval_context-double-free.patch

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2018-06-28 21:54:45

Well I tried to review this looking for the mishandling of a lock (as mention in a previous meeting). I don't fully understand all of the GER code, but I wondering if we need to reset the extension after the swap?

@@ -318,14 +354,17 @@ _ger_new_gerpb(
         slapi_pblock_set(*gerpb, SLAPI_CONNECTION, conn);

         /* Can't share the conn->aclcb because of different context */
-        geraclcb = (struct acl_cblock *)acl_conn_ext_constructor(NULL, NULL);
-        if (geraclcb == NULL) {
+        ger_aclcb = (struct acl_cblock *)acl_conn_ext_constructor(NULL, NULL);
+        if (ger_aclcb == NULL) {
             rc = LDAP_NO_MEMORY;
             goto bailout;
         }
-        slapi_sdn_set_ndn_byval(geraclcb->aclcb_sdn, subjectndn);
-        *aclcb = acl_get_ext(ACL_EXT_CONNECTION, conn);
-        acl_set_ext(ACL_EXT_CONNECTION, conn, (void *)geraclcb);
+        slapi_sdn_set_ndn_byval(ger_aclcb->aclcb_sdn, subjectndn);
+        conn_aclcb = acl_get_ext(ACL_EXT_CONNECTION, conn);
+        PR_Lock(conn_aclcb->aclcb_lock);
+        _ger_swap_aclcb(conn_aclcb, ger_aclcb);
+        *save_aclcb = ger_aclcb;
+        PR_Unlock(conn_aclcb->aclcb_lock);
     }

After the unlock should we do?

acl_set_ext(ACL_EXT_CONNECTION, conn, (void *)ger_aclcb);

In the old code it was always reset, but maybe that's not needed with your new design.

389-ds-bot commented 4 years ago

Comment from lkrispen (@elkris) at 2018-07-04 09:59:42

In the old code it was always reset, but maybe that's not needed with your new design.

well, that is what the new code also is doing, at least it is my intent. Just in a different way.

In the old code we had a start of GER

and at the finis of GER

But this had no control if another threadhad a reference to C_GER and could access freed data.

My approach is to not free anything that could be seen outside the actual GER:

at begin of GER

at end of ger

This approach tries to keep the old behaviour and to avoid potential crashes. It doe not address the problem of what concurrent operations should see and set.

389-ds-bot commented 4 years ago

Comment from tbordaz (@tbordaz) at 2018-07-04 10:22:08

That is an excellent fix. This ticket being to prevent the crash, the fix will address the ticket. The problem of concurrent operations sharing inappropriate value from the cblock existed before and with your fix. This problem would need a different ticket.

389-ds-bot commented 4 years ago

Comment from lkrispen (@elkris) at 2018-07-04 11:35:13

I have created ticket 49826 for the general investigation

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2018-07-05 16:51:00

ack from me for current fix

389-ds-bot commented 4 years ago

Comment from lkrispen (@elkris) at 2018-07-10 15:06:14

To ssh://pagure.io/389-ds-base.git 8fa838a4f..5537467ef master -> master

389-ds-bot commented 4 years ago

Comment from lkrispen (@elkris) at 2018-07-10 15:14:23

To ssh://pagure.io/389-ds-base.git 8720370b3..df357f558 389-ds-base-1.3.8 -> 389-ds-base-1.3.8

To ssh://pagure.io/389-ds-base.git 855d78b5a..c907bcaa6 389-ds-base-1.3.7 -> 389-ds-base-1.3.7

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2018-07-10 15:45:39

Can you also cherry-pick this to 1.2.11 (for 6.10)? Thanks!

389-ds-bot commented 4 years ago

Comment from lkrispen (@elkris) at 2018-07-12 15:14:39

backported to 1.3.6

To ssh://pagure.io/389-ds-base.git b1c370752..b45afc2f3 389-ds-base-1.3.6 -> 389-ds-base-1.3.6

389-ds-bot commented 4 years ago

Comment from lkrispen (@elkris) at 2018-07-12 15:16:44

0001-Ticket-49780-backport-1.2.11-acl_copyEval_context-do.patch

389-ds-bot commented 4 years ago

Comment from lkrispen (@elkris) at 2018-07-24 09:29:37

To ssh://pagure.io/389-ds-base.git f3f5d2833..85199121e 389-ds-base-1.2.11 -> 389-ds-base-1.2.11

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2019-08-23 19:50:32

Metadata Update from @mreynolds389: