Closed 389-ds-bot closed 4 years ago
Comment from lkrispen (@elkris) at 2018-06-14 15:15:05
checking the two available core files ŕevealed that the last operations included getEffectivRights (GER) searches and in GER we do replace the aclcb in the connection extension
In fact we do:
_ger_new_gerpb()
geraclcb = (struct acl_cblock *)acl_conn_ext_constructor(NULL, NULL);
*aclcb = acl_get_ext(ACL_EXT_CONNECTION, conn);
acl_set_ext(ACL_EXT_CONNECTION, conn, (void *)geraclcb);
which creates an connection extension and sets it to the connection and later we do:
_ger_release_gerpb()
geraclcb = (struct aclcb *)acl_get_ext(ACL_EXT_CONNECTION, conn);
acl_conn_ext_destructor(geraclcb, NULL, NULL);
acl_set_ext(ACL_EXT_CONNECTION, conn, *aclcb);
which destroys the aclcb in the connection and then sets the saved aclcb back Now, this is definitely the wrong order, but changing the order of restoring the original and destroying the temporary will not be sufficient, an other thread could still hold a reference to the aclcb which will be destroyed.
I see two options to fix this: 1] Investigate why GER needs its own specific aclcb in the connection and if this can be avoided, optimal solution 2] do not blindly get and set connection extensions, but always use the lock in aclcb during access to it. Then we could lock it, copy its context to a backup aclpb and when done, under the lock copy it back
Comment from lkrispen (@elkris) at 2018-06-14 15:15:14
Metadata Update from @elkris:
Comment from lkrispen (@elkris) at 2018-06-14 15:25:06
Metadata Update from @elkris:
Comment from mreynolds (@mreynolds389) at 2018-06-14 17:37:40
Metadata Update from @mreynolds389:
Comment from tbordaz (@tbordaz) at 2018-06-19 15:56:22
To be honest this part of code is so complex that I am not sure I understand it correctly :(
First evaluation context looks to be a key element of right evaluation. Considering that the GER can evaluate the rights of a proxy user to an entry/attri, it looks normal we need an evaluation context different from the one used by the bound user. I do not understand the need of changing the aclcb but rather could it be done via acl_clean_aclEval_context (in _ger_new_gerpb and _ger_release_gerpb).
Now with asynchronous operations, the evaluation context could be cleaned for a GER operation, if an other operation is evaluation rights in parallel, I not sure which prevent the two evaluations to be merged.
Comment from lkrispen (@elkris) at 2018-06-20 09:29:49
I checked the use of ACL_EXT_CONNECTION. As far as I can see it is only used in acl_init_aclpb() and only if the parameter "copy_from_aclcb" is set.
In _ger_new_gerpb() we call
acl_init_aclpb(*gerpb, geraclpb, subjectndn, 0);
so it will get the conn exstension, but not use it. I don't see the need of the special gercb
Comment from tbordaz (@tbordaz) at 2018-06-21 16:30:35
ACL_EXT_CONNECTION is also used during operation desctructor. It looks to me that each time an operation completes, part of the evaluation context in the connection is cleared/reset.
In _ger_new_gerpb, it sets a new gercb (in the connection) but with the bind sdn ('subjectndn') specified in the GER control. So I believe it needs this specific gercb to evaluate rights of the specified DN not of the connection bound DN.
Comment from lkrispen (@elkris) at 2018-06-21 16:40:12
ACL_EXT_CONNECTION is also used during operation desctructor. It looks to me that each time an operation completes, part of the evaluation context in the connection is cleared/reset.
yes. it copies the context from the operation to the connection, but this would not be affected if we do not set the aclcb, since it should already be reset when the destructor is called
In _ger_new_gerpb, it sets a new gercb (in the connection) but with the bind sdn ('subjectndn') specified in the GER control. So I believe it needs this specific gercb to evaluate rights of the specified DN not of the connection bound DN.
but it is set, to the aclpb in acl_init_aclpb, the aclcb is not used as far as I see
Comment from lkrispen (@elkris) at 2018-06-28 14:15:59
I finally decided not to take the risk of just not using the eval context in GER, I still believe it is not used, but there are several places where it is touched, copied, ....
So the proposed solution is:
Unfortunately testing this I did run into another ACL issue. The connection extension has a lock, this lock is set when a new connection extension is created (and in GER it will be created frequently). But these locks are taken from a pre alloacted array of locks and this array is used in a circular mode in aclext_get_lock() and so it can happen that two extensions do use the same lock. The size of the lock array is 40.
This is really bad, and just to avoid the call to PR_NewLock() and PR_DestroyLock() for each connection extension. I would like to remove the use of the array and always create a lock wneh allocating the connection exdtension
Comment from mreynolds (@mreynolds389) at 2018-06-28 21:54:45
Well I tried to review this looking for the mishandling of a lock (as mention in a previous meeting). I don't fully understand all of the GER code, but I wondering if we need to reset the extension after the swap?
@@ -318,14 +354,17 @@ _ger_new_gerpb(
slapi_pblock_set(*gerpb, SLAPI_CONNECTION, conn);
/* Can't share the conn->aclcb because of different context */
- geraclcb = (struct acl_cblock *)acl_conn_ext_constructor(NULL, NULL);
- if (geraclcb == NULL) {
+ ger_aclcb = (struct acl_cblock *)acl_conn_ext_constructor(NULL, NULL);
+ if (ger_aclcb == NULL) {
rc = LDAP_NO_MEMORY;
goto bailout;
}
- slapi_sdn_set_ndn_byval(geraclcb->aclcb_sdn, subjectndn);
- *aclcb = acl_get_ext(ACL_EXT_CONNECTION, conn);
- acl_set_ext(ACL_EXT_CONNECTION, conn, (void *)geraclcb);
+ slapi_sdn_set_ndn_byval(ger_aclcb->aclcb_sdn, subjectndn);
+ conn_aclcb = acl_get_ext(ACL_EXT_CONNECTION, conn);
+ PR_Lock(conn_aclcb->aclcb_lock);
+ _ger_swap_aclcb(conn_aclcb, ger_aclcb);
+ *save_aclcb = ger_aclcb;
+ PR_Unlock(conn_aclcb->aclcb_lock);
}
After the unlock should we do?
acl_set_ext(ACL_EXT_CONNECTION, conn, (void *)ger_aclcb);
In the old code it was always reset, but maybe that's not needed with your new design.
Comment from lkrispen (@elkris) at 2018-07-04 09:59:42
In the old code it was always reset, but maybe that's not needed with your new design.
well, that is what the new code also is doing, at least it is my intent. Just in a different way.
In the old code we had a start of GER
and at the finis of GER
But this had no control if another threadhad a reference to C_GER and could access freed data.
My approach is to not free anything that could be seen outside the actual GER:
at begin of GER
at end of ger
This approach tries to keep the old behaviour and to avoid potential crashes. It doe not address the problem of what concurrent operations should see and set.
Comment from tbordaz (@tbordaz) at 2018-07-04 10:22:08
That is an excellent fix. This ticket being to prevent the crash, the fix will address the ticket. The problem of concurrent operations sharing inappropriate value from the cblock existed before and with your fix. This problem would need a different ticket.
Comment from lkrispen (@elkris) at 2018-07-04 11:35:13
I have created ticket 49826 for the general investigation
Comment from mreynolds (@mreynolds389) at 2018-07-05 16:51:00
ack from me for current fix
Comment from lkrispen (@elkris) at 2018-07-10 15:06:14
To ssh://pagure.io/389-ds-base.git 8fa838a4f..5537467ef master -> master
Comment from lkrispen (@elkris) at 2018-07-10 15:14:23
To ssh://pagure.io/389-ds-base.git 8720370b3..df357f558 389-ds-base-1.3.8 -> 389-ds-base-1.3.8
To ssh://pagure.io/389-ds-base.git 855d78b5a..c907bcaa6 389-ds-base-1.3.7 -> 389-ds-base-1.3.7
Comment from mreynolds (@mreynolds389) at 2018-07-10 15:45:39
Can you also cherry-pick this to 1.2.11 (for 6.10)? Thanks!
Comment from lkrispen (@elkris) at 2018-07-12 15:14:39
backported to 1.3.6
To ssh://pagure.io/389-ds-base.git b1c370752..b45afc2f3 389-ds-base-1.3.6 -> 389-ds-base-1.3.6
Comment from lkrispen (@elkris) at 2018-07-12 15:16:44
Comment from lkrispen (@elkris) at 2018-07-24 09:29:37
To ssh://pagure.io/389-ds-base.git f3f5d2833..85199121e 389-ds-base-1.2.11 -> 389-ds-base-1.2.11
Comment from mreynolds (@mreynolds389) at 2019-08-23 19:50:32
Metadata Update from @mreynolds389:
Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/49780
Issue Description
slapd crashes with the following or similar stack trace:
Package Version and Platform
crash was reported for 389-ds-base-1.2.11.15-91.el6_9.x86_64
but code is unchanged, so new versions could be affected as well
Steps to reproduce
No reproducer available
Actual results
crash
Expected results
no crash