389ds / 389-ds-base

The enterprise-class Open Source LDAP server for Linux
https://www.port389.org/
Other
211 stars 92 forks source link

PR - Ticket 49372 - filter optimisation improvements for common queries #3311

Closed 389-ds-bot closed 4 years ago

389-ds-bot commented 4 years ago

Cloned from Pagure Pull-Request: https://pagure.io/389-ds-base/pull-request/50252


Bug Description: Due to the way we apply indexes to searches and the presence of the "filter test threshold" there are a number of queries which can be made faster if they understood the internals of our idl_set and index mechanisms. However, instead of expecting application authors to do this, we should provide it.

Fix Description: In the server we have some cases we want to achieve, and some to avoid:

Knowing these two properties, we can now look at improving filters for queries.

In a common case, SSSD will give us a query which is a union of host cn and sudoHost rules. However, the sudoHost rules are substring searchs that are not able to be indexed - thus the whole filter becomes an unindexed search. IE:

(|(cn=a)(cn=b)(cn= ....)(sudoHost=[]))

So in this case we want to move the substring to the first query so that if it's un-indexed, we fail immediately with ALLIDS rather than opening the cn index.

For intersection, we often see:

(&(objectClass=account)(objectClass=posixAccount)(uid=william))

The issue here is that the idls for account and posixAccount both may contain 100,000 items. Even with idl lookthrough limits, until we start to read these, we don't know if we will exceed that.

A better query is:

(&(uid=william)(objectClass=account)(objectClass=posixAccount))

Because the uid=william index will contain a single item, this put's us below filter test threshold, and we will not open the objectClass indexes.

In fact, in an intersection, it is almost always better to perform simple equalities first:

(&(uid=william)(modifyTimestamp>=...)(sn=br*)(objectClass=posixAccount))

In most other cases, we will not greatly benefit from re-arrangement due to the size of the idls involved we won't hit filter test. IE

(&(modifyTimestamp>=...)(sn=br*)(objectClass=posixAccount))

Would not be significantly better despite and possible arrangement without knowing the content of sn.

So in summary, our rules for improving queries are:

Resolves: #2431 Resolves: #3132

Author: wibrown

Original Review by: elkris, mreynolds389 (Thanks!) Review by: ???

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2019-03-01 04:23:13

Other things to note: this fixes a lot of issues with the filter tests. The filter tests would fail even without the optimiser. This passes all tests with and without the filter optimiser active. It likely relies on @elkris onelevel scope fix.

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2019-03-05 04:13:45

Hey @mreynolds389 Did you want to take a look at this :)

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2019-03-12 03:23:45

@mreynolds389 Review reminder please :)

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2019-03-13 19:45:58

@mreynolds389 Review reminder please :)

I need to get this patch tested by FreeIPA team as they were the ones who found the original regression. I'll look into now...

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2019-03-13 21:49:24

This patch breaks the freeipa install, gathering logs of successful and failed installs

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2019-03-13 23:50:01

Wat, how? The optimiser is disabled ... :S

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2019-03-14 00:29:19

Thanks for that, but the logs don't help me much, I think that IPA attempts to cleanup after the failure. I'm going back to the patch to reexamine what's occuring.

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2019-03-14 01:38:19

rebased onto 9983d780646b18955fea58819a29250931f67ce3

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2019-03-14 01:40:57

Hey @mreynolds389, I've updated this with an improvement to how we log the results in access log. The new output is:

[14/Mar/2019:10:37:40.373884600 +1000] conn=1 op=21 SRCH base="dc=example,dc=com" scope=2 filter="(cn:caseExactIA5Match:=ext-test-entry)" attrs=ALL
[14/Mar/2019:10:37:40.374433300 +1000] conn=1 op=21 RESULT err=0 tag=101 nentries=1 etime=0.0000976100 notes=U filter="(|(objectclass=referral)(cn:caseExactIA5Match:=ext-test-entry))"

Notice we now provide the optimised (executed) filter into the RESULT string, for comparison against the SRCH string. I think a single IPA install with this updated patch should be enough to try and diagnose what's going wrong, provided we can isolate the failing query in the access log. Is it possible to get the ipa install log as well when we do this test?

So on reconsideration as well, this patch has the optimiser disabled, but it does alter the behaviour of the referral and parentid construction for filters, which may or may not be related to the issue here.

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2019-03-14 04:10:12

Sorry didn't read everything over, but I had them triple check your patch vs no patch and it always failed after I applied your patch,. So its impacting something even though its disabled... very weird.

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2019-03-14 04:14:29

No problem, this is why I provided extra debugging info in latest patch to try and determine the cause. There is no filter optimisation, but there are changes to the referral and onelevel filter constructions, so I want to see if those are the cause by getting the extra details :)

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2019-04-02 02:01:32

@mreynolds389 Did we ever get this build checking with IPA? I'm happy to rebase/update it ...

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2019-04-17 02:41:37

Yo @mreynolds389 did we ever get anywhere with IPA on this? This would be a great change for them as it helps with SSSD' bad query generation.

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2019-04-17 02:50:01

rebased onto 2b4e0029fce85bfe6523cae95e923d7655b1a594

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2019-04-17 17:10:53

Yo @mreynolds389 did we ever get anywhere with IPA on this? This would be a great change for them as it helps with SSSD' bad query generation.

Sorry @Firstyear I need to put a build together for them to retest. I will try to get to that today...

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2019-04-18 03:11:54

So @mreynolds389 helpfully linked me to a BZ where this filter optmiser caused an issue. I have taken the two queries, and then put them into a test search, to see what this would do to re-arrange these.

The original queries are :

working:

"(&(|(usercertificate;binary=0\82\03\AD0\82\02\95\A0\03\02\01\02\02\01\130\0D\06\09\2A\86H\86\F7\0D\01\01\0B\05)(ipaCertMapData=X509:<I>O=TESTRELM.TEST,CN=Certificate Authority<S>O=TESTRELM.TEST,CN=ipauser1))(objectClass=posixAccount)(uid=*)(&(uidNumber=*)(!(uidNumber=0))))"

failing:

"(&(|(usercertificate;binary=0\82\03\AD0\82\02\95\A0\03\02\01\02\02\01\130\0D\06\09\2A\86H\86\F7\0D\01\01\0B\05)(ipaCertMapData=X509:<I>O=TESTRELM.TEST,CN=Certificate Authority<S>O=TESTRELM.TEST,CN=ipauser1)(altsecurityidentities=X509:<I>O=TESTRELM.TEST,CN=Certificate Authority<S>O=TESTRELM.TEST,CN=ipauser1))(objectClass=posixAccount)(uid=*)(&(uidNumber=*)(!(uidNumber=0))))"

The results are as follows - note I have trimmed the binary output in these comparisons for ease of viewing, but they do match.

(onelevel, working)
original
        (&
            (|
                (usercertificate;binary=)
                (ipacertmapdata=X509:<I>O=TESTRELM.TEST,CN=Certificate Authority<S>O=TESTRELM.TEST,CN=ipauser1)
            )
            (objectClass=posixAccount)
            (uid=*)
            (&
                (uidNumber=*)
                (!
                    (uidNumber=0)
                )
            )
        )

optimised
(&
    (parentid=1)
    (|
        (objectclass=referral)
        (&
            (|
                (usercertificate;binary=)
                (ipacertmapdata=X509:<I>O=TESTRELM.TEST,CN=Certificate Authority<S>O=TESTRELM.TEST,CN=ipauser1)
            )
            (objectClass=posixAccount)
            (uid=*)
            (uidNumber=*)
            (!
                (uidNumber=0)
            )
        )
    )
)
-------------------
(onelevel, failing)
original
        (&
            (|
                (usercertificate;binary=)
                (ipacertmapdata=X509:<I>O=TESTRELM.TEST,CN=Certificate Authority<S>O=TESTRELM.TEST,CN=ipauser1)
                (altsecurityidentities=X509:<I>O=TESTRELM.TEST,CN=Certificate Authority<S>O=TESTRELM.TEST,CN=ipauser1)
            )
            (objectClass=posixAccount)
            (uid=*)
            (&
                (uidNumber=*)
                (!
                    (uidNumber=0)
                )
            )
        )

optimised
(&
    (parentid=1)
    (|
        (objectclass=referral)
        (&
            (|
                (usercertificate;binary=)
                (ipacertmapdata=X509:<I>O=TESTRELM.TEST,CN=Certificate Authority<S>O=TESTRELM.TEST,CN=ipauser1)
                (altsecurityidentities=X509:<I>O=TESTRELM.TEST,CN=Certificate Authority<S>O=TESTRELM.TEST,CN=ipauser1)
            )
            (objectClass=posixAccount)
            (uid=*)
            (uidNumber=*)
            (!
                (uidNumber=0)
            )
        )
    )
)
-------------------
(subtree, working)
original
    (&
        (|
            (usercertificate;binary=)
            (ipacertmapdata=X509:<I>O=TESTRELM.TEST,CN=Certificate Authority<S>O=TESTRELM.TEST,CN=ipauser1)
        )
        (objectClass=posixAccount)
        (uid=*)
        (&
            (uidNumber=*)
            (!
                (uidNumber=0)
            )
        )
    )

optimised
(|
    (objectclass=referral)
    (&
        (|
            (usercertificate;binary=)
            (ipacertmapdata=X509:<I>O=TESTRELM.TEST,CN=Certificate Authority<S>O=TESTRELM.TEST,CN=ipauser1)
        )
        (objectClass=posixAccount)
        (uid=*)
        (uidNumber=*)
        (!
            (uidNumber=0)
        )
    )
)

-------------------
(subtree, failing)
original
    (&
        (|
            (usercertificate;binary=)
            (ipacertmapdata=X509:<I>O=TESTRELM.TEST,CN=Certificate Authority<S>O=TESTRELM.TEST,CN=ipauser1)
            (altsecurityidentities=X509:<I>O=TESTRELM.TEST,CN=Certificate Authority<S>O=TESTRELM.TEST,CN=ipauser1)
        )
        (objectClass=posixAccount)
        (uid=*)
        (&
            (uidNumber=*)
            (!
                (uidNumber=0)
            )
        )
    )

optimised
(|
    (objectclass=referral)
    (&
        (|
            (usercertificate;binary=)
            (ipacertmapdata=X509:<I>O=TESTRELM.TEST,CN=Certificate Authority<S>O=TESTRELM.TEST,CN=ipauser1)
            (altsecurityidentities=X509:<I>O=TESTRELM.TEST,CN=Certificate Authority<S>O=TESTRELM.TEST,CN=ipauser1)
        )
        (objectClass=posixAccount)
        (uid=*)
        (uidNumber=*)
        (!
            (uidNumber=0)
        )
    )
)

It appears that only a single change is occuring, which is the (&) fold bringing (&(uidNumber=*)(!(uidNumber=0))) into the higher scope. If anyone can see anything else wrong here, I'd be keen to know, but I can't observe any other faults in the operation of the optimiser in these queries that would lead to absent results.

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2019-04-18 15:30:49

I think the issue might have been how you changed ACI processing on filters - not the actual candidate list.

Anyway, the current patch passes IPA tests, but the feature is "off". I'm going to do a build with the it "on" and see if it still fails...

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2019-04-18 20:57:26

I think the issue might have been how you changed ACI processing on filters - not the actual candidate list. Anyway, the current patch passes IPA tests, but the feature is "off". I'm going to do a build with the it "on" and see if it still fails...

It still fails when enabled, just an FYI. Is there any other information you want me to get? Or run it again with a log level? Or debug patch?

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2019-04-19 01:34:01

Just the access log is fine (and the ipa install log) - the current branch here has a debug patch added that printes the "optimised" filter after the results so we can compare what transforms occured that may have caused the error. Thank you very much! :)

So if it passes when it's "off", you mean the #define in the filter.c? And it fails when that is #defined? That is "good" because it really narrows down where the fault could be (also also means we can probably commit this with some confidence that we can back out if needed by flipping the #define)

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2019-04-19 02:40:21

Just the access log is fine (and the ipa install log) - the current branch here has a debug patch added that printes the "optimised" filter after the results so we can compare what transforms occured that may have caused the error. Thank you very much! :)

I'll get the access log, no error log?

So if it passes when it's "off", you mean the #define in the filter.c? And it fails when that is #defined?

Yeah I just uncommented the define and it failed the same way it did originally when I had to revert the patch.

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2019-04-29 00:57:59

@mreynolds389 Great thanks! I have all the info you sent me so I'll investigate this soon! (I have so much email after my tropical week off).

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2019-04-29 04:23:01

https://lists.fedoraproject.org/archives/list/389-devel@lists.fedoraproject.org/thread/545CSBD4OY3RGUDTSL37J7FYXFEAW63H/ <<-- write here of investigation and possible conclusions.

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2019-04-30 03:58:55

rebased onto e2048f237f53809370ec51e3e91b8bed8fdc0258

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2019-04-30 04:03:17

@mreynolds389 I have updated this patch to harden the filter bypass flag application in subtree candidates, and improved logging of LOG_FILTER. The filter opt is disabled.

So I'm wondering what's the right way to proceed here. I think we could commit this into master, given that the optimiser is disabled, and that's what causes the problem. Then we at least have the "good" parts of the patch related to the tombstone and referral application cleaned up. We also get some better debuging of filter application for this.

Who have you been working with in RH QE about this? I think I need to know exactly what query IPA is failing with, and the exact conditions, maybe better, we need LOG_FILTER enabled too.

What do you think?

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2019-04-30 04:57:44

1 new commit added

389-ds-bot commented 4 years ago

Comment from tbordaz (@tbordaz) at 2019-04-30 09:48:32

The benefit of the patch is perf improvement (10 times faster) but with specific conditions (filters and entries) The drawbacks are that it is difficult to review (low level routine with multiple potential impact) and test as many regression were discovered after QE

IMHO it worth getting the improvement in master if the known problematic use cases are fixed. here is a list of them mentioned in the ticket (but it may exist others not listed in the ticket :( )

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2019-04-30 11:27:51

@tbordaz Part of the issue is that no one has provided sufficient information about what query is going wrong, only that something does.

I have attempted an ipa install with a debug build which fails, for the reason:

2019-04-30T05:08:35Z DEBUG stderr=kdb5_util: Kerberos database constraints violated while adding entries to the database

There is no further krb logs or information provided about this matter (which I find quite stunning ....).

However, it appears that this entry already exists, and KRB is "double-adding" it, which seems to be the fault of the extra debugging info slowing down the process. So I'm probably going to try to speed up the errorlog to that the install can proceed....

EDIT: ramdisk fixes this right up ;)

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2019-05-01 00:48:56

The ipa-server-install command was successful

So with a debug build I can successfully install IPA. I have kept all associated logs.

@mreynolds389 or @tbordaz who in RH QE from IPA has been reporting these failures recently when they have attempted installations? I'd like to contact them to ask about what steps they took, and their method so I can attempt to reproduce.

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2019-05-02 03:52:08

1 new commit added

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2019-05-02 04:09:11

1 new commit added

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2019-05-02 04:28:32

1 new commit added

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2019-05-02 05:25:33

1 new commit added

389-ds-bot commented 4 years ago

Comment from tbordaz (@tbordaz) at 2019-05-02 09:54:12

@Firstyear, we need to wait (next week) for @mreynolds389 to confirm which failures he discovered #3311#comment-83964.

Have you been able to verify the patch against the known problematic use cases (#3311#comment-84875) ?

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2019-05-02 11:21:00

@tbordaz I've done one better! I got in touch with someone at RH QE and have reproduction cases. I'm currently analysing the system now, and may have finally had a break through on this. I'm working a bit late to finalise some of the report on the matter. :)

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2019-05-02 12:16:04

1 new commit added

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2019-05-02 13:28:07

Hey @tbordaz

Before I sign out for the night, I believe the issue is that the filter optimiser is causing the query to evaluate correctly - specifically, the certmap query fails due to a lack of aci allowing read to altsecurityidentities, which causes the OR condition to fail etc.

More curious is why this does not fail when the optimiser is off - I suspect that there may be some issue related to filter test bypassing, so I'm going to suggest we set the filter test bypass to "off" with an IPA install to determine if that also affects it.

I have more logging and details related to this, which I'll write up nicer for the ticket tomorrow.

Thanks :)

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2019-05-03 03:38:09

Hi everyone,

Here is my analysis of the issue that has occurred with filter optimisation and why the queries were reporting failure.

The main issue was related to a query in the ipa certificate mapping functionality that interacts with SSSD. The query in question that was failing was:

(&
    (|
        (usercertificate;binary=)
        (ipaCertMapData=X509:<I>O=DEV.BLACKHATS.NET.AU,CN=Certificate Authority<S>O=DEV.BLACKHATS.NET.AU,CN=ipauser1)
        (altsecurityidentities=X509:<I>O=DEV.BLACKHATS.NET.AU,CN=Certificate Authority<S>O=DEV.BLACKHATS.NET.AU,CN=ipauser1)
    )
    (objectClass=posixAccount)
    (uid=*)
    (&
        (uidNumber=*)
        (!
            (uidNumber=0)
        )
    )
)

Please note this has been formatted for ease of understanding.

I have added a large amount of debbuging info into a build, and using a ramdisk for logging to ensure adequate performance for the error log. I also set the server to single threaded to guarantee that the queries were linearised in logs, IE no interleaved or interfering data.

In this log I have noted that the filter optimisers behaviour is correct - the translated query is:

(|
    (objectClass=referral)
    (&
        (|
            (usercertificate;binary=...)
            (ipaCertMapData=x509:<i>o=dev.blackhats.net.au,cn=certificate authority<s>o=dev.blackhats.net.au,cn=ipauser1)
            (altsecurityidentities=X509:<I>O=DEV.BLACKHATS.NET.AU,CN=Certificate Authority<S>O=DEV.BLACKHATS.NET.AU,CN=ipauser1)
        )
        (objectClass=posixaccount)
        (uid=*)
        (uidNumber=*)
        (!
            (uidNumber=0)
        )
    )
)

This is logically equivalent.

I have also noted the correct behaviour of the filter being applied. The log section here is relevant - note that I have partially annotated it for ease of understanding:

[02/May/2019:14:21:53.791733201 +1000] - DEBUG - filter_candidates_ext -        OR            (| ... )
[02/May/2019:14:21:53.791765725 +1000] - DEBUG - list_candidates - => 0xa1
[02/May/2019:14:21:53.791781631 +1000] - DEBUG - filter_candidates_ext -        EQUALITY      (objectClass=referral)
[02/May/2019:14:21:53.791798745 +1000] - DEBUG - slapi_attr_assertion2keys_ava_sv - =>
[02/May/2019:14:21:53.791931255 +1000] - DEBUG - slapi_attr_assertion2keys_ava_sv - => 0
[02/May/2019:14:21:53.792128141 +1000] - DEBUG - filter_candidates_ext - <= IDL length(0)     (objectClass=referral) -> 0 candidates
[02/May/2019:14:21:53.792164718 +1000] - DEBUG - list_candidates - NOT applying union shortcut
[02/May/2019:14:21:53.792180518 +1000] - DEBUG - filter_candidates_ext -        AND           (& ... )
[02/May/2019:14:21:53.792195995 +1000] - DEBUG - list_candidates - => 0xa0
[02/May/2019:14:21:53.792359316 +1000] - DEBUG - filter_candidates_ext -        OR            (| ... )
[02/May/2019:14:21:53.792408848 +1000] - DEBUG - list_candidates - => 0xa1
[02/May/2019:14:21:53.792435839 +1000] - DEBUG - filter_candidates_ext -        EQUALITY      (userCertificate;binary=...)
[02/May/2019:14:21:53.792587334 +1000] - DEBUG - slapi_attr_assertion2keys_ava_sv - =>
[02/May/2019:14:21:53.792623904 +1000] - DEBUG - slapi_attr_assertion2keys_ava_sv - => 0
[02/May/2019:14:21:53.792980957 +1000] - DEBUG - filter_candidates_ext - <= IDL length(0)     (userCertificate;binary=...) -> 0 candidates
[02/May/2019:14:21:53.793015546 +1000] - DEBUG - list_candidates - NOT applying union shortcut      NO ALLIDS present, continue
[02/May/2019:14:21:53.793030057 +1000] - DEBUG - filter_candidates_ext -        EQUALITY      (ipaCertMapData=...)
[02/May/2019:14:21:53.793046839 +1000] - DEBUG - slapi_attr_assertion2keys_ava_sv - =>
[02/May/2019:14:21:53.793155068 +1000] - DEBUG - slapi_attr_assertion2keys_ava_sv - => 0
[02/May/2019:14:21:53.793200212 +1000] - DEBUG - filter_candidates_ext - <= IDL length(486)   (ipaCertMapData=...) -> ALLIDS
[02/May/2019:14:21:53.793214719 +1000] - DEBUG - list_candidates - union shortcut                   ALLIDS present, OR must fail as ALLIDS
[02/May/2019:14:21:53.793232658 +1000] - DEBUG - list_candidates - <= 0xa1 IDL length(486)
[02/May/2019:14:21:53.793331383 +1000] - DEBUG - filter_candidates_ext - <= IDL length(486)   (| ... ) -> ALLIDS
[02/May/2019:14:21:53.793388053 +1000] - DEBUG - list_candidates - NOT applying intersection shortcut (filter test threshold)   (& ... ) cand len > filter test, do not shortcut
[02/May/2019:14:21:53.793423332 +1000] - DEBUG - filter_candidates_ext -        EQUALITY      (objectClass=posixAccount)
[02/May/2019:14:21:53.793622612 +1000] - DEBUG - slapi_attr_assertion2keys_ava_sv - =>
[02/May/2019:14:21:53.793790387 +1000] - DEBUG - slapi_attr_assertion2keys_ava_sv - => 0
[02/May/2019:14:21:53.793885354 +1000] - DEBUG - filter_candidates_ext - <= IDL length(2)     (objectClass=posixAccount) -> 2 candidates
[02/May/2019:14:21:53.794038943 +1000] - DEBUG - list_candidates - intersection shortcut (filter test threshold)    (& ... ) a cand len is < filter test, shortcut return partial set. Set do not bypass flag.
[02/May/2019:14:21:53.814965251 +1000] - DEBUG - list_candidates - <= 0xa0 IDL length(2)     (& ... ) -> 2 candidates (partial)
[02/May/2019:14:21:53.815077254 +1000] - DEBUG - filter_candidates_ext - <= IDL length(2)
[02/May/2019:14:21:53.815102562 +1000] - DEBUG - list_candidates - NOT applying union shortcut
[02/May/2019:14:21:53.815650798 +1000] - DEBUG - list_candidates - <= 0xa1 IDL length(2)     (| ... ) -> 2 candidates (partial)
[02/May/2019:14:21:53.815665087 +1000] - DEBUG - filter_candidates_ext - <= IDL length(2)

[02/May/2019:14:21:53.816754671 +1000] - DEBUG - ldbm_back_next_search_entry_ext - Applying filter test     do not bypass flag is set, enforce filter test on partial candidate set.

At this point the partial candidate set has two items = uid=admin and uid=ipauser1

The filter test now begins, and while applying to uid=admin, the following failure occurs:

[02/May/2019:20:53:21.834989302 +1000] - DEBUG - slapi_vattr_filter_test_ext_internal - => processing filt -> (|(objectClass=referral)(&(|(usercertificate;binary=0<82>^D*0<82>^C^R<A0>^C^B^A^B^B^A^K0^M^F      *<86>H<86><F7>^M^A^A^K^E)(ipaCertMapData=x509:<i>o=dev.blackhats.net.au,cn=certificate authority<s>o=dev.blackhats.net.au,cn=ipauser1)(altsecurityidentities=X509:<I>O=DEV.BLACKHATS.NET.AU,CN=Certificate Authority<S>O=DEV.BLACKHATS.NET.AU,CN=ipauser1))(objectClass=posixaccount)(uid=*)(uidNumber=*)(!(uidNumber=0)))) on uid=ipauser1,cn=users,cn=accounts,dc=dev,dc=blackhats,dc=net,dc=au
...
[02/May/2019:20:53:21.845422033 +1000] - DEBUG - slapi_vattr_filter_test_ext_internal - => processing filt -> (altsecurityidentities=X509:<I>O=DEV.BLACKHATS.NET.AU,CN=Certificate Authority<S>O=DEV.BLACKHATS.NET.AU,CN=ipauser1) on uid=ipauser1,cn=users,cn=accounts,dc=dev,dc=blackhats,dc=net,dc=au
...
[02/May/2019:20:53:21.847993991 +1000] - DEBUG - slapi_vattr_filter_test_ext_internal - <= filt 50 -> (altsecurityidentities=X509:<I>O=DEV.BLACKHATS.NET.AU,CN=Certificate Authority<S>O=DEV.BLACKHATS.NET.AU,CN=ipauser1) on uid=ipauser1,cn=users,cn=accounts,dc=dev,dc=blackhats,dc=net,dc=au
...
[02/May/2019:20:53:21.861400976 +1000] - DEBUG - slapi_vattr_filter_test_ext_internal - <= filt -1 -> (|(objectClass=referral)(&(|(usercertificate;binary=0<82>^D*0<82>^C^R<A0>^C^B^A^B^B^A^K0^M^F      *<86>H<86><F7>^M^A^A^K^E)(ipaCertMapData=x509:<i>o=dev.blackhats.net.au,cn=certificate authority<s>o=dev.blackhats.net.au,cn=ipauser1)(altsecurityidentities=X509:<I>O=DEV.BLACKHATS.NET.AU,CN=Certificate Authority<S>O=DEV.BLACKHATS.NET.AU,CN=ipauser1))(objectClass=posixaccount)(uid=*)(uidNumber=*)(!(uidNumber=0)))) on uid=ipauser1,cn=users,cn=accounts,dc=dev,dc=blackhats,dc=net,dc=au
...
==> /var/log/dirsrv/slapd-DEV-BLACKHATS-NET-AU/access <==
[02/May/2019:20:53:21.861772973 +1000] conn=27 op=13 RESULT err=0 tag=101 nentries=0 etime=0.0086440389 notes=U filter="(|(objectclass=referral)(&(|(usercertificate;binary=0<82>^D*0<82>^C^R<A0>^C^B^A^B^B^A^K0^M^F    *<86>H<86><F7>^M^A^A^K^E)(ipaCertMapData=X509:<I>O=DEV.BLACKHATS.NET.AU,CN=Certificate Authority<S>O=DEV.BLACKHATS.NET.AU,CN=ipauser1)(altsecurityidentities=X509:<I>O=DEV.BLACKHATS.NET.AU,CN=Certificate Authority<S>O=DEV.BLACKHATS.NET.AU,CN=ipauser1))(objectClass=posixAccount)(uid=*)(uidNumber=*)(!(uidNumber=0))))"

This is the important part - 'filt 50' is an rc of 50 - insufficient access. It is now important to understand who has bound:

[02/May/2019:20:52:55.275409924 +1000] conn=27 fd=106 slot=106 connection from 172.24.15.14 to 172.24.15.14
[02/May/2019:20:52:55.282769356 +1000] conn=27 op=0 SRCH base="" scope=0 filter="(objectClass=*)" attrs="* altServer namingContexts supportedControl supportedExtension supportedFeatures supportedLDAPVersion supportedSASLMechanisms domaincontrollerfunctionality defaultnamingcontext lastusn highestcommittedusn aci"
[02/May/2019:20:52:55.308091966 +1000] conn=27 op=0 RESULT err=0 tag=101 nentries=1 etime=0.0027247181 filter="(objectClass=*)"
[02/May/2019:20:52:55.669078776 +1000] conn=27 op=1 BIND dn="" method=sasl version=3 mech=GSSAPI
[02/May/2019:20:52:55.698373098 +1000] conn=27 op=1 RESULT err=14 tag=97 nentries=0 etime=0.0029494522, SASL bind in progress
[02/May/2019:20:52:55.698801574 +1000] conn=27 op=2 BIND dn="" method=sasl version=3 mech=GSSAPI
[02/May/2019:20:52:55.715020675 +1000] conn=27 op=2 RESULT err=14 tag=97 nentries=0 etime=0.0016721784, SASL bind in progress
[02/May/2019:20:52:55.715256927 +1000] conn=27 op=3 BIND dn="" method=sasl version=3 mech=GSSAPI
[02/May/2019:20:52:55.759796551 +1000] conn=27 op=3 RESULT err=0 tag=97 nentries=0 etime=0.0044839201 dn="fqdn=test-c.dev.blackhats.net.au,cn=computers,cn=accounts,dc=dev,dc=blackhats,dc=net,dc=au"

It is the machine account of the DC.

When performing the following search as uid=admin, this works:

ldapsearch '(&(|(altsecurityidentities=X509:<I>O=DEV.BLACKHATS.NET.AU,CN=Certificate Authority<S>O=DEV.BLACKHATS.NET.AU,CN=ipauser1)(ipaCertMapData=X509:<I>O=DEV.BLACKHATS.NET.AU,CN=Certificate Authority<S>O=DEV.BLACKHATS.NET.AU,CN=ipauser1))(objectClass=posixAccount)(uid=*)(uidNumber=*)(!(uidNumber=0)))'

As a result, I can confirm, that the issue is not within the filter optimiser - the issue is the lack of access control allow machine accounts to read the altsecurityidentities attribute.

Now there are some important follow ups here:

As this issue has been open for such a long time I think it's important we follow up on the various, and complex causes that have delayed a proper analysis of this issue. For example, had we of implemented point 1, and rejected the query, we would never have seen this query leave a development environment, and likely would have been identified faster as failing. Similar, proper logging and confidence to enable it, would have allow IPA QE to much quicker diagnose the issue's location to the IPA ACI's, rather than being affected by the DS query optimiser. And finally, our own confidence to enable and understand log data as a team, to make better determinations.

As a result of this issue I plan to open a number of issues:

These changes, would have prevented and assisted this issue to be resolved much sooner, and much faster.

Thanks,

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2019-05-03 03:59:27

Here are all relevant and opened issues.

3408 <<-- filter schema checking

3409 <<-- logging structure

2474 <<-- log performance

3407 <<-- filter bypass default to off.

https://pagure.io/freeipa/issue/7933 <<-- IPA unindexed attrs https://pagure.io/freeipa/issue/7932 <<-- IPA missing altsecurityidentities

I will update this patch to cleanup much of the debugging, and to add the libglobs.c flag for enable/disable of the filter rather than being a compile flag.

389-ds-bot commented 4 years ago

Comment from abbra at 2019-05-03 08:37:00

@Firstyear thanks for the investigation. I think some of your suggestions are sound but not all.

I want to focus on your proposal to remove unknown schema elements from all supporting applications. This is pretty much impossible to achieve and expect.

In case of certificate mapping rules FreeIPA, the same set of rules can apply to multiple sources. SSSD runs the resulting search against all its databases where a user can be defined. In practice, it means that the same filter is applied to both IPA LDAP and AD LDAP in case of a trust to active directory established in FreeIPA.

altSecurityIdentities is an attribute defined by AD LDAP schema, not FreeIPA. While we can add the element to our schema, it makes little difference: no objects in IPA LDAP are going to have this attribute defined. Sure, it will help 389-ds to short-circuit the search thanks to indexing of this element and I have submitted https://github.com/freeipa/freeipa/pull/3110

I don't think it is realistically possible to enforce that all attributes of a filter are present in schema and reject the filter otherwise. RFC 4511 section 4.5.1.7. states that

Servers MUST NOT return errors if attribute descriptions or matching rule ids are not recognized, assertion values are invalid, or the assertion syntax is not supported. More details of filter processing are given in Clause 7.8 of [X.511].

and Clause 7.8 of https://www.itu.int/rec/dologin_pub.asp?lang=e&id=T-REC-X.511-201610-I!!PDF-E&type=items has this language:

Any assertion about the values of such an attribute is only defined if the AttributeType is known by the evaluating mechanism, the purported AttributeValue(s) conforms to the attribute syntax defined for that attribute type, the implied or indicated matching rule is applicable to that attribute type, and (when used) a presented matchValue conforms to the syntax defined for the indicated matching rules. When these conditions are not met, the FilterItem shall evaluate to the logical value UNDEFINED. An assertion which is defined by these conditions additionally evaluates to UNDEFINED if it relates to an attribute value and the attribute type is not present in an attribute against which the assertion is being tested. An assertion which is defined by these conditions and relates to the presence of an attribute type evaluates to FALSE.

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2019-05-03 10:33:25

@Firstyear thanks for the investigation. I think some of your suggestions are sound but not all. I want to focus on your proposal to remove unknown schema elements from all supporting applications. This is pretty much impossible to achieve and expect.

It is a security risk for DOS due to the fact it can never be indexed. That alone is worth following up, let alone human interaction elements.

In case of certificate mapping rules FreeIPA, the same set of rules can apply to multiple sources. SSSD runs the resulting search against all its databases where a user can be defined. In practice, it means that the same filter is applied to both IPA LDAP and AD LDAP in case of a trust to active directory established in FreeIPA. altSecurityIdentities is an attribute defined by AD LDAP schema, not FreeIPA. While we can add the element to our schema, it makes little difference: no objects in IPA LDAP are going to have this attribute defined. Sure, it will help 389-ds to short-circuit the search thanks to indexing of this element and I have submitted https://github.com/freeipa/freeipa/pull/3110

If it's in 389, define it in schema else you will ALWAYS have this search become un-indexed and full table scan, and fail security checks. It is not possible for you to leave this outside of schema, and have it perform correctly.

If it's not going to be present in 389-ds, don't search for it, and then you won't care if the filter enforces attributes.

I don't think it is realistically possible to enforce that all attributes of a filter are present in schema and reject the filter otherwise. RFC 4511 section 4.5.1.7. states that

Servers MUST NOT return errors if attribute descriptions or matching rule ids are not recognized, assertion values are invalid, or the assertion syntax is not supported. More details of filter processing are given in Clause 7.8 of [X.511].

and Clause 7.8 of https://www.itu.int/rec/dologin_pub.asp?lang=e&id=T-REC-X.511-201610-I!!PDF-E&type=items has this language:

Any assertion about the values of such an attribute is only defined if the AttributeType is known by the evaluating mechanism, the purported AttributeValue(s) conforms to the attribute syntax defined for that attribute type, the implied or indicated matching rule is applicable to that attribute type, and (when used) a presented matchValue conforms to the syntax defined for the indicated matching rules. When these conditions are not met, the FilterItem shall evaluate to the logical value UNDEFINED. An assertion which is defined by these conditions additionally evaluates to UNDEFINED if it relates to an attribute value and the attribute type is not present in an attribute against which the assertion is being tested. An assertion which is defined by these conditions and relates to the presence of an attribute type evaluates to FALSE.

I do not believe we should be so narrow in our view, when an rfc defined many years ago, is limiting us and presenting security risks, and performance risks. Had we of enforced this behavior, the certmap flaws would never have reached a production codebase, would have told our developers that a mistake was made, and would have aided us as humans to make informed decisions.

Additionally, it's a false argument to claim that any server is LDAP complaint, besides OpenLDAP - we certainly have our share of flaws and quirks that make us odd, and so to bring in the RFC here for "correctness" is not a really strong argument here IMO.

I think you also presenting your argument in a way that does not consider human behavior, and I do not agree with it. It's my job to do what is best for the 389-ds project. As a result, my recommendation remains - we should act in a manner that respects and helps humans make correct, informed decisions, and adds constraints to prevent security and performance risks. If we were to implement this feature, IPA would have the choice to disable it to maintain compatability - and that's your choice you are free to make.

389-ds-bot commented 4 years ago

Comment from abbra at 2019-05-03 11:03:47

389-ds is free to ignore any user-provided input, including filters which reference unknown attributes. I think this doesn't contradict RFC 4511. However, it is not reasonable to directly contradict RFC 4511 by returning an error to such search. May be you were implying a different response but with the statement

389DS should enforce that all attributes of a filter are present in schema, and if not, the filter is rejected.

is what I read out of it.

From FreeIPA side we have no control over which certmap rules users would add to the system. We can suggest but the reality is that FreeIPA here is a centralized storage of those rules and SSSD is applying them, not FreeIPA. It might be possible to differentiate such searches on SSSD side between different domains and not sending it to search at IPA LDAP server, I don't know. This doesn't change a fact that such search might happen. altSecurityIdentities is just one of such attributes that used on AD side, there could be others. In any way, it is beyond what FreeIPA can control.

389-ds-bot commented 4 years ago

Comment from abbra at 2019-05-03 11:10:09

@sbose, is it possible to separate inside SSSD which certmap rule filters applied to what source domain? See above -- if we can avoid searching in IPA LDAP for attributes that clearly only apply on AD side, we could avoid the issues described in this 389-ds ticket.

389-ds-bot commented 4 years ago

Comment from sbose at 2019-05-03 11:25:23

Hi,

SSSD does not use altSecurityIdentities by default. It is coming from the certificate mapping and matching rules. The rules can be defined per domain with the help of the --domain option of ipa certmap-add to ipa certmap-mod. If the domain is missing the rule applies to the IPA domain.

HTH

bye, Sumit

389-ds-bot commented 4 years ago

Comment from tbordaz (@tbordaz) at 2019-05-03 11:57:14

@Firstyear, sorry to come late in that thread. Thanks for your great investigations.

ldap client are free to use any attribute in their filter. Now I am unsure what matching rule will be picked up if the attribute is unknown. The evaluation of the filter for building the candidate list leads to unindex with that unknown component but it is legal to put unindex component. The use of the filter for the candidate matching can hide some candidate if rights are not granted. I think it is allowed to specify unknown attribute in aci.

I agree that if the search fails because of the missing ACI, adding the ACI looks to appropriate fix.

Just a question, is this test successful without your patch ?

389-ds-bot commented 4 years ago

Comment from abbra at 2019-05-03 12:16:53

I've updated https://github.com/freeipa/freeipa/pull/3110 to provide following changes:

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2019-05-07 02:01:40

I've updated https://github.com/freeipa/freeipa/pull/3110 to provide following changes:

add altSecurityIdentities attribute to IPA LDAP schema allow authenticated binds to read altSecurityIdentities attribute index both ipacertmapdata and altsecurityidentities add a check to ipa certmaprule-{add,mod} that will enforce adding altSecurityIdentities filter only to trusted Active Directory domains.

Great, this sounds like it will resolve a lot of the issues.

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2019-05-07 02:05:03

@Firstyear, sorry to come late in that thread. Thanks for your great investigations. ldap client are free to use any attribute in their filter. Now I am unsure what matching rule will be picked up if the attribute is unknown. The evaluation of the filter for building the candidate list leads to unindex with that unknown component but it is legal to put unindex component. The use of the filter for the candidate matching can hide some candidate if rights are not granted. I think it is allowed to specify unknown attribute in aci.

Well, there are two options.

There is a risk of breaking some installs that use extensibleObject though, so this behaviour would certainly need a flag in cn=config. But for the "correct" use case, this is the best action.

I agree that if the search fails because of the missing ACI, adding the ACI looks to appropriate fix. Just a question, is this test successful without your patch ?

The test was also failing without the filter optimisation patch, so I think that people were pointing the finger at filter opt when the failure may have been elsewhere. I'm still trying to understand the past conditions, but I suspect some interaction with the previously failing onelevel search could be related. It's hard as I don't have the debug info needed in previous versions :(

This is why I think the best option for getting this feature advanced is to make a cn=config flag, and for now, default it to "off" and then gain confidence that the other changes are valid, then default to "on" later - it gives us an easy rollback and work around too.

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2019-05-07 06:38:19

rebased onto 2fae03ab15a03da153e21367e5ac01de41f356b8

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2019-05-07 06:39:40

Okay! This update cleans up the debugging from previous, adds a cn=config switch that currently defaults to off, and basically just tidies up the patch. I will propose the default to be ON in a seperate patch once IPA has landed the necessary changes and I have tested them.

Please review! (ping @mreynolds389 and @tbordaz especially about this :) )

389-ds-bot commented 4 years ago

Comment from lkrispen (@elkris) at 2019-05-08 16:50:00

I wiil (and want to) look into this, but can only do it next week, stay tuned