389ds / 389-ds-base

The enterprise-class Open Source LDAP server for Linux
https://www.port389.org/
Other
211 stars 91 forks source link

IDL Range Format #2475

Open 389-ds-bot opened 4 years ago

389-ds-bot commented 4 years ago

Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/49416


Issue Description

Based on discussions and the POC https://github.com/Firstyear/idl_rust/ there are many cases where a change to the IDL format may yield faster searches due to compression and reduced instruction comphrehension. We should support different index formats listed by some flag in the start of the IDL, and we should support this format as one option,

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2017-10-26 18:00:34

Metadata Update from @mreynolds389:

droideck commented 2 years ago

@Firstyear what do you think, is it still relevant?

Firstyear commented 2 years ago

I still think it's relevant, but I think query optimisation would be more important ... though this would certainly help. We use the idl range code in Kanidm, and with the query optimiser it works really well. The different for a search etime is:

389-ds: wtime=0.000412792 optime=0.000393020 etime=0.000802825 kanidm: etime=0.000282999

And that's simply on

LDAPTLS_REQCERT=never ldapsearch -H ldaps://127.0.0.1:636 -x -b 'dc=auth,dc=com' -s base

On more complex queries, it pays off even more. As an example, basicly every query that SSSD emits (and freeipa by proxy) turns out to be really ineffecient, and causes a ton of issues in how we internally handle it in 389 (heavily related to idllistmaxscan), but the same queries on kanidm (which uses the optimiser and idl range) has zero issues.

The classic example is:

'(&(objectClass=person)(uid=foo))'

Imagine we have 100,000 users. In 389, we'll execute the query by loading the objectClass '=person' eq index, we'll scan it up to idllistmaxscan (2000 I think?) and then "throw away" the results, and then we skip to uid=foo, and return that as it's idl is len 1 so we are below filter test max.

On kani, because we have an optimiser, we'll re-arrange the query to '(&(uid=foo)(objectClass=person))', meaning we only load uid=foo, and we shortcut return.

In the case we do something like '(objectClass=person)', on 389, this becomes a full table scan because of idllistmaxscan, meaning we get ALLIDS as the result since no other terms are present. On kani we return an exact idl of the entries because we have these in the compressed range for, and it's smaller to cache these.

Kani also gets a speed up because we have an idl cache too which 389 doesn't.

So yeah, there are lots of things here that come into play ....

mreynolds389 commented 2 years ago

@Firstyear sounds like you need to finish the optimizer work you did in 389...

Firstyear commented 2 years ago

It was reverted here:

commit 14a10a34b3ff10d5146d73e67aa54c8945cfa37d
Author: Mark Reynolds <mreynolds@redhat.com>
Date:   Fri Aug 24 16:24:21 2018 -0400

    Revert "Ticket 49372 - filter optimisation improvements for common queries"

    This reverts commit 4cd1a24b3ce88968ff5f9a2b87efdc84dee176da.

commit 104968b67bbfbc89cf5311c7ca7be973dd86baed
Author: Mark Reynolds <mreynolds@redhat.com>
Date:   Fri Aug 24 16:21:32 2018 -0400

    Revert "Ticket 49432 - filter optimise crash"

    This reverts commit 5c89dd8f9c8eb77c967574412d049d55565bb364.

It looks like https://github.com/389ds/389-ds-base/issues/3132 was the blocker on this. All the access logs are gone now, so there is no more evidence. I'd say the first step would be to "revert the revert" and then test this, but a lot of the issues were in FreeIPA which I'm not exactly geared up to test with these days. It would be nice if someone could help with this from the FreeIPA/RH side since that's where a lot of the hurdles were.

mreynolds389 commented 2 years ago

@Firstyear - all the logs are in the original pagure ticket:

https://pagure.io/389-ds-base/issue/50073

I have reverted the the "revert" and I will ask IPA test it again, but the old logs are available for analysis in the above ticket. I tried looking through the old logs and could not quickly find what searches misbehaved. Looks like you found it in your debug analysis, but I think it was from your own tests and not IPA test case. Once IPA does their tests (assuming it's still failing) hopefully we can narrow down the results and find the exact searches that were failing. I recall there being several search failures, but I don't recall the details. I'm assuming it's nentries being 0 instead of 1 something like that, but I don't recall which filters were at play.

mreynolds389 commented 2 years ago

HMmm, after reverting the two reverts:

https://github.com/389ds/389-ds-base/pull/5168

The IPA install succeeds. I don't know if that is good or bad because there was definitely a problem in the past. I'm going to have IPA run the full test suite. We'll see how that goes...

In the meantime, @Firstyear can you please review the PR and make sure it matches your original work. I just want to make sure it is still valid, and we are testing the correct "fix". Also the new PR adds back: ldap/servers/slapd/index_subsystem.c That might be a revert artifact and should be undone?

mreynolds389 commented 2 years ago

Additional info. I compared all the complex search filters found in the failing log vs the successful log. And all the filters from the failing log that were present in the success log (note not all the filters were found btw) had identical results. So I'm not sure what was going on. Full IPA tests are being on my PR #5168 and we should have the results today or tomorrow.

mreynolds389 commented 2 years ago

Ok do we have a regression in our acl tests related to targetFilters and "scope one" searches

# py.test misc_test.py::test_only_allow_some_targetattr_two

        conn = UserAccount(topo.standalone, user.dn).bind(PW_DM)
        # aci will allow only mail targetattr but only for cn=Anuj
        account = Accounts(conn, DEFAULT_SUFFIX)
>       assert len(account.filter('(mail=*)', scope=1)) == 5
E       AssertionError: assert 0 == 5
E        +  where 0 = len([])
E        +    where [] = <bound method DSLdapObjects.filter of <lib389.idm.account.Accounts object at 0x7f5c099adf10>>('(mail=*)', scope=1)
E        +      where <bound method DSLdapObjects.filter of <lib389.idm.account.Accounts object at 0x7f5c099adf10>> = <lib389.idm.account.Accounts object at 0x7f5c099adf10>.filter

misc_test.py:265: AssertionError

ACL logging showed:

[21/Feb/2022:15:47:10.115690814 -0500] - DEBUG - NSACLPlugin - acl_access_allowed - #### conn=2 op=2 binddn="uid=test_user_4,dc=example,dc=com"
[21/Feb/2022:15:47:10.118310156 -0500] - DEBUG - NSACLPlugin - acllist_aciscan_update_scan - Searching AVL tree for update:uid=test_user_0,dc=example,dc=com: container:-1
[21/Feb/2022:15:47:10.120857921 -0500] - DEBUG - NSACLPlugin -     ************ RESOURCE INFO STARTS *********
[21/Feb/2022:15:47:10.123473204 -0500] - DEBUG - NSACLPlugin -     Client DN: uid=test_user_4,dc=example,dc=com
[21/Feb/2022:15:47:10.125854592 -0500] - DEBUG - NSACLPlugin -     resource type:256(search target_DN )
[21/Feb/2022:15:47:10.128299061 -0500] - DEBUG - NSACLPlugin -     Slapi_Entry DN: uid=test_user_0,dc=example,dc=com
[21/Feb/2022:15:47:10.130887291 -0500] - DEBUG - NSACLPlugin -     ATTR: parentid
[21/Feb/2022:15:47:10.133566065 -0500] - DEBUG - NSACLPlugin -     rights:search
[21/Feb/2022:15:47:10.136399172 -0500] - DEBUG - NSACLPlugin -     ************ RESOURCE INFO ENDS   *********
[21/Feb/2022:15:47:10.139094757 -0500] - DEBUG - NSACLPlugin - acl__scan_for_acis - Using ACL Container:0 for evaluation
[21/Feb/2022:15:47:10.141877063 -0500] - DEBUG - acl__ressource_match_aci - entry uid=test_user_0,dc=example,dc=com does match (cn=Anuj) (parentid)
[21/Feb/2022:15:47:10.147809521 -0500] - DEBUG - NSACLPlugin - acl__scan_for_acis - Num of ALLOW Handles:0, DENY handles:0
[21/Feb/2022:15:47:10.150705168 -0500] - DEBUG - NSACLPlugin - print_access_control_summary - conn=2 op=2 (main): Deny search on entry(uid=test_user_0,dc=example,dc=com).attr(parentid) to uid=test_user_4,dc=example,dc=com: no aci matched the resource

If I add "parentid" to the allowed target attributes it allows the test to pass:

Domain(topo.standalone, DEFAULT_SUFFIX).\
        replace("aci", '(target="ldap:///{}") (targetattr="mail||objectClass||parentid")'
                       '(targetfilter="cn=Anuj") (version 3.0; acl "{}"; '
                       'allow (compare,read,search) '
                       '(userdn = "ldap:///anyone"); )'.format(DEFAULT_SUFFIX, request.node.name))

Looking at the patch we are modifying the filter internally and adding parentid in create_onelevel_filter() during build_candidate_list(). I can confirm without the patch "parentid" is not being evaluated in the ACL code for this test case.

tbordaz commented 2 months ago

Looking like an interesting idea but impacting a sensitive part of code. There is no evidence of systematic gain. We suspect such implement can increase as well as decrease performance depending on dataset and filter. No benefit and risks => closing the ticket Please reopen if needed

Firstyear commented 2 months ago

There is no evidence of systematic gain. We suspect such implement can increase as well as decrease performance depending on dataset and filter.

You're welcome to close this, but do not misrepresent mine and Ludwig's work - there is a proven performance improvement, in the general case 400% improvement.

We implemented these in Kanidm and we have significantly faster filter execution as a result. Testing and benchmarking was carried out that replicated all the filter evaluation types in 389-ds and shows that there is always an improvement. There is also a reduction in disk usage and memory usage (allowing better cache performance). It reduces the number of calls to re-alloc, it reduces malloc pressure and fragmentations, it improves cpu cache behaviour.

From the readme https://github.com/kanidm/idlset/?tab=readme-ov-file#idlset :

""" For example, this means union (or), intersection (and) and not operations on sets. In the best case, speed ups of 15x have been observed with the general case performing approximately 4x faster that a Vec based implementation. """

This has flow on effects to improving memberOf execution and more.

Again - there is an improvement, please do not misrepresent my work.

tbordaz commented 2 months ago

Hi @Firstyear , thanks for your feedback and clarifications.

I think we can convert 389DS idlist into a start_range/mask. When iterating candidates matching a given key, instead of appending an ID, we may retrieve/update/create one of the start_range/mask. Then later do logical operation of those tuples. It looks not immediate but doable. Regarding the expected benefit/loss, it is a precious feedback that you observed systematic gain (around 400%). It is an excellent justification for this RFE. I still have concern how to get benefit with large DB (I would expect tuple to contain mainly one value), but the benefit would be on the logical evaluation.

I will reopen this RFE to get a new triage. Thanks again @Firstyear