389ds / 389-ds-base

The enterprise-class Open Source LDAP server for Linux
https://www.port389.org/

RI plugin - Optimize the search filters if the MODRDN targets a leaf entry. #4173

Open 389-ds-bot opened 4 years ago

389-ds-bot commented 4 years ago

Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/51120


Issue Description

The Referential Integrity plugin triggers internal searches when an entry is deleted or renamed. For a deletion, the searches use exact (equality) filters. For a rename, they use substring filters (presumably to also adjust references to any children of the renamed entry).

In some cases (large databases), the substring searches can consume all the available DB locks (message "ERR - libdb - BDB2055 Lock table is out of available lock entries" in the errors log), especially if the relevant substring indexes are missing.

Would it be possible to optimize the processing by using exact searches if the MODRDN operation targets a leaf entry?

Thanks, Têko.

Package Version and Platform

$ cat /etc/redhat-release
Red Hat Enterprise Linux release 8.1 (Ootpa)
$
$ rpm -qa | grep 389-ds-base
389-ds-base-libs-1.4.2.12-2.module+el8dsrv+6428+6e54c518.x86_64
389-ds-base-1.4.2.12-2.module+el8dsrv+6428+6e54c518.x86_64
$

Steps to reproduce

  1. Activate the Referential Integrity plugin
  2. Enable the logging of internal operations
  3. Change the RDN of an entry
  4. Check the access log
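
Steps 1 and 2 can be scripted. The following is a minimal sketch that only builds the LDIF text to pass to ldapmodify. It assumes the plugin entry is "cn=referential integrity postoperation,cn=plugins,cn=config", and that internal operations are logged by adding 4 to the default access log level of 256 and turning on nsslapd-plugin-logging. Verify these names and values against your server version. The function name is hypothetical, and a server restart may be needed after enabling the plugin.

```python
# Assumed entry and attribute names; verify them against your server version.
RI_PLUGIN_DN = "cn=referential integrity postoperation,cn=plugins,cn=config"
DEFAULT_ACCESS_LEVEL = 256  # default access logging
INTERNAL_OPS_LEVEL = 4      # logging of internal access operations


def build_repro_ldif() -> str:
    """Return LDIF that enables the RI plugin and internal-operation logging."""
    level = DEFAULT_ACCESS_LEVEL + INTERNAL_OPS_LEVEL
    records = [
        [
            f"dn: {RI_PLUGIN_DN}",
            "changetype: modify",
            "replace: nsslapd-pluginEnabled",
            "nsslapd-pluginEnabled: on",
        ],
        [
            "dn: cn=config",
            "changetype: modify",
            "replace: nsslapd-accesslog-level",
            f"nsslapd-accesslog-level: {level}",
            "-",
            "replace: nsslapd-plugin-logging",
            "nsslapd-plugin-logging: on",
        ],
    ]
    # LDIF records are separated by one blank line.
    return "\n\n".join("\n".join(record) for record in records) + "\n"


if __name__ == "__main__":
    print(build_repro_ldif(), end="")
```

The output can be piped to the same ldapmodify command line used in the comments of this issue.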

Actual results

The searches use substring filters even for a leaf entry.

Expected results

Use exact searches for leaf entries.

389-ds-bot commented 4 years ago

Comment from tmihinto at 2020-05-28 13:35:18

1) MODRDN operation:

$ ldapmodify -x -D"cn=Directory Manager" -W -hlocalhost -p1389
Enter LDAP Password:
dn: uid=tmorris,ou=People,dc=example,dc=com
changetype: modrdn
newrdn: uid=tedmorris
deleteOldRDN: 0

modifying rdn of entry "uid=tmorris,ou=People,dc=example,dc=com"

$

Log excerpt:

[28/May/2020:12:13:34.908064239 +0200] conn=5 op=1 MODRDN dn="uid=tmorris,ou=People,dc=example,dc=com" newrdn="uid=tedmorris" newsuperior="(null)"
[28/May/2020:12:13:34.909740535 +0200] conn=5 (Internal) op=1(1)(1) SRCH base="uid=tmorris,ou=people,dc=example,dc=com" scope=0 filter="(|(objectclass=*)(objectclass=ldapsubentry))" attrs=ALL
[28/May/2020:12:13:34.909947331 +0200] conn=5 (Internal) op=1(1)(1) RESULT err=0 tag=48 nentries=1 etime=0.000295258
...
[28/May/2020:12:13:34.912786880 +0200] conn=5 (Internal) op=1(6)(1) SRCH base="dc=example,dc=com" scope=2 filter="(member=*uid=tmorris,ou=people,dc=example,dc=com)" attrs="member"
[28/May/2020:12:13:34.913006750 +0200] conn=5 (Internal) op=1(6)(1) RESULT err=0 tag=48 nentries=0 etime=0.000226900 notes=U details="Partially Unindexed Filter
[28/May/2020:12:13:34.913030318 +0200] conn=5 (Internal) op=1(7)(1) SRCH base="dc=example,dc=com" scope=2 filter="(uniquemember=*uid=tmorris,ou=people,dc=example,dc=com)" attrs="uniquemember"
[28/May/2020:12:13:34.913150509 +0200] conn=5 (Internal) op=1(7)(1) RESULT err=0 tag=48 nentries=0 etime=0.000127963 notes=U details="Partially Unindexed Filter
[28/May/2020:12:13:34.913171445 +0200] conn=5 (Internal) op=1(8)(1) SRCH base="dc=example,dc=com" scope=2 filter="(owner=*uid=tmorris,ou=people,dc=example,dc=com)" attrs="owner"
[28/May/2020:12:13:34.913281110 +0200] conn=5 (Internal) op=1(8)(1) RESULT err=0 tag=48 nentries=0 etime=0.000115952 notes=U details="Partially Unindexed Filter
[28/May/2020:12:13:34.913302039 +0200] conn=5 (Internal) op=1(9)(1) SRCH base="dc=example,dc=com" scope=2 filter="(seeAlso=*uid=tmorris,ou=people,dc=example,dc=com)" attrs="seeAlso"
[28/May/2020:12:13:34.913427485 +0200] conn=5 (Internal) op=1(9)(1) RESULT err=0 tag=48 nentries=0 etime=0.000131536 notes=U details="Partially Unindexed Filter
...
[28/May/2020:12:13:34.917078589 +0200] conn=5 op=1 RESULT err=0 tag=109 nentries=0 etime=0.009786836
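
On a large access log, the unindexed internal searches can be picked out mechanically. The sketch below pairs each internal SRCH line with its RESULT line and keeps the filters whose result carries notes=U. It assumes the line layout shown in the excerpt above; the function name and the regular expressions are illustrative, not part of any 389-ds tool.

```python
import re

# Key of an internal operation, e.g. "conn=5 (Internal) op=1(6)(1)",
# followed by the record type and the rest of the line.
LINE_RE = re.compile(
    r"(conn=\d+ \(Internal\) op=\d+\(\d+\)\(\d+\)) (SRCH|RESULT) (.*)"
)
FILTER_RE = re.compile(r'filter="([^"]*)"')


def unindexed_internal_searches(lines):
    """Return the filters of internal searches whose RESULT has notes=U."""
    pending = {}  # operation key -> filter of a SRCH awaiting its RESULT
    flagged = []
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # not an internal SRCH/RESULT record
        key, kind, rest = match.groups()
        if kind == "SRCH":
            found = FILTER_RE.search(rest)
            if found:
                pending[key] = found.group(1)
        elif key in pending:
            search_filter = pending.pop(key)
            if "notes=U" in rest:
                flagged.append(search_filter)
    return flagged


if __name__ == "__main__":
    sample = [
        '[28/May/2020:12:13:34.909740535 +0200] conn=5 (Internal) op=1(1)(1) SRCH base="uid=tmorris,ou=people,dc=example,dc=com" scope=0 filter="(|(objectclass=*)(objectclass=ldapsubentry))" attrs=ALL',
        '[28/May/2020:12:13:34.909947331 +0200] conn=5 (Internal) op=1(1)(1) RESULT err=0 tag=48 nentries=1 etime=0.000295258',
        '[28/May/2020:12:13:34.912786880 +0200] conn=5 (Internal) op=1(6)(1) SRCH base="dc=example,dc=com" scope=2 filter="(member=*uid=tmorris,ou=people,dc=example,dc=com)" attrs="member"',
        '[28/May/2020:12:13:34.913006750 +0200] conn=5 (Internal) op=1(6)(1) RESULT err=0 tag=48 nentries=0 etime=0.000226900 notes=U details="Partially Unindexed Filter',
    ]
    for search_filter in unindexed_internal_searches(sample):
        print(search_filter)
```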

2) DELETE operation:

$  ldapdelete -x -D"cn=Directory Manager" -W -hlocalhost -p1389 "uid=tedmorris,ou=people,dc=example,dc=com"
Enter LDAP Password:
$

Log excerpt:

[28/May/2020:12:57:09.437879191 +0200] conn=8 op=1 DEL dn="uid=tedmorris,ou=people,dc=example,dc=com"
...
[28/May/2020:12:57:09.468235333 +0200] conn=8 (Internal) op=1(6)(1) SRCH base="dc=example,dc=com" scope=2 filter="(member=uid=tedmorris,ou=people,dc=example,dc=com)" attrs="member"
[28/May/2020:12:57:09.472872178 +0200] conn=8 (Internal) op=1(6)(1) RESULT err=0 tag=48 nentries=0 etime=0.004646881
[28/May/2020:12:57:09.472910498 +0200] conn=8 (Internal) op=1(7)(1) SRCH base="dc=example,dc=com" scope=2 filter="(uniquemember=uid=tedmorris,ou=people,dc=example,dc=com)" attrs="uniquemember"
[28/May/2020:12:57:09.487967755 +0200] conn=8 (Internal) op=1(7)(1) RESULT err=0 tag=48 nentries=0 etime=0.015066929
[28/May/2020:12:57:09.488037130 +0200] conn=8 (Internal) op=1(8)(1) SRCH base="dc=example,dc=com" scope=2 filter="(owner=uid=tedmorris,ou=people,dc=example,dc=com)" attrs="owner"
[28/May/2020:12:57:09.499388894 +0200] conn=8 (Internal) op=1(8)(1) RESULT err=0 tag=48 nentries=0 etime=0.011385535
[28/May/2020:12:57:09.499447418 +0200] conn=8 (Internal) op=1(9)(1) SRCH base="dc=example,dc=com" scope=2 filter="(seeAlso=uid=tedmorris,ou=people,dc=example,dc=com)" attrs="seeAlso"
[28/May/2020:12:57:09.504093457 +0200] conn=8 (Internal) op=1(9)(1) RESULT err=0 tag=48 nentries=0 etime=0.004682626
...
[28/May/2020:12:57:09.527750518 +0200] conn=8 op=1 RESULT err=0 tag=107 nentries=0 etime=0.090186133

389-ds-bot commented 4 years ago

Comment from tbordaz (@tbordaz) at 2020-05-28 13:59:22

Thanks for the detailed investigation and findings. Just out of curiosity, is uniquemember indexed for substrings? One possibility is that, with many 'uid=...' values, the substring keys '^ui' and 'uid' match many entries, so the search has to look up many pages under the MODRDN txn.

389-ds-bot commented 4 years ago

Comment from tbordaz (@tbordaz) at 2020-05-28 13:59:23

Metadata Update from @tbordaz:

389-ds-bot commented 4 years ago

Comment from tmihinto at 2020-05-28 14:57:41

Hi Thierry,

In the customer case where the DB locks issue happened, there was initially no substring index for the attribute "uniquemember". Once they added all the needed substring indexes the DB locks usage was low. They had more than 100K members in a static group.

Regards, Têko.

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2020-05-28 15:45:10

On a side note, our default indexing for all the RI attributes (member, seeAlso, owner, uniquemember) is just equality. So out of the box, the RI plugin will cause this issue for MODRDN operations. However, substring is an expensive index type to add by default. I do not see an easy way around this for MODRDN and RI: it really looks like we need to keep the substring search, but I'm hesitant to add substring as a default index type for those attributes.

Maybe we could just log a message if the RI plugin detects the index type is missing? Then the customer would know that they need to add the index.
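
A rough sketch of such a check, in Python for brevity rather than in the plugin's own language. The mapping of attribute name to index types stands in for the real index configuration, which this sketch does not read; the function names and the message text are hypothetical.

```python
RI_ATTRIBUTES = ("member", "uniquemember", "owner", "seeAlso")


def missing_substring_indexes(index_types, attrs=RI_ATTRIBUTES):
    """Return the attributes in attrs that have no 'sub' index type.

    index_types maps a lower-cased attribute name to its set of index
    types, e.g. {"member": {"eq"}}. An attribute absent from the mapping
    is treated as not indexed at all.
    """
    return [a for a in attrs if "sub" not in index_types.get(a.lower(), set())]


def index_warnings(index_types):
    """Build one warning message per RI attribute lacking a substring index."""
    return [
        f"referint: attribute '{attr}' has no substring index; "
        "searches triggered by MODRDN may be unindexed"
        for attr in missing_substring_indexes(index_types)
    ]


if __name__ == "__main__":
    # Equality-only defaults, as described in the comment above.
    defaults = {attr.lower(): {"eq"} for attr in RI_ATTRIBUTES}
    for message in index_warnings(defaults):
        print(message)
```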

389-ds-bot commented 4 years ago

Comment from tbordaz (@tbordaz) at 2020-05-28 16:01:56

I wonder whether, as a workaround, we could increase the key width for the end of the substring: https://access.redhat.com/documentation/en-us/red_hat_directory_server/11/html/administration_guide/changing-substring-index-search

For example, nsSubStrEnd=7 for uniquemember will only evaluate entries with '^uid=tm' and 'uid=tmo', which should reduce the number of entries looked up. The risk is that a filter with fewer than 7 characters may be unindexed.
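
To make the effect of the key width concrete, here is a simplified model of substring key generation. It assumes that the begin key is '^' followed by width-1 characters, the middle keys are sliding windows of the full width, and the end key is width-1 characters followed by '$'. This matches the '^ui' and 'uid' keys mentioned earlier, but it is only an illustration and not the server's actual indexing code; the function and parameter names are hypothetical.

```python
def substring_keys(value, begin=3, middle=3, end=3):
    """Return the substring index keys for value under a simplified model."""
    if min(begin, middle, end) < 2:
        raise ValueError("key widths must be at least 2")
    keys = []
    if len(value) >= begin - 1:
        keys.append("^" + value[: begin - 1])      # begin key
    for i in range(len(value) - middle + 1):
        keys.append(value[i : i + middle])         # middle keys
    if len(value) >= end - 1:
        keys.append(value[-(end - 1):] + "$")      # end key
    return keys


if __name__ == "__main__":
    print(substring_keys("uid=tm"))
    print(substring_keys("uid=tmorris", begin=7, middle=7, end=7))
```

With the default width of 3, every value starting with 'uid=' shares the keys '^ui' and 'uid'. With a width of 7, the first keys of "uid=tmorris" become '^uid=tm' and 'uid=tmo', which far fewer values share.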

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2020-05-28 16:48:42

> I wonder whether, as a workaround, we could increase the key width for the end of the substring: https://access.redhat.com/documentation/en-us/red_hat_directory_server/11/html/administration_guide/changing-substring-index-search
>
> For example, nsSubStrEnd=7 for uniquemember will only evaluate entries with '^uid=tm' and 'uid=tmo', which should reduce the number of entries looked up. The risk is that a filter with fewer than 7 characters may be unindexed.

But won't that match the wrong entries?

uid=tmorris,ou=people,o=test
uid=tmark,ou=people,o=test
uid=tmbordaz,ou=people,o=test

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2020-05-29 07:52:55

Worth saying that this is misleading:

[28/May/2020:12:13:34.913302039 +0200] conn=5 (Internal) op=1(9)(1) SRCH base="dc=example,dc=com" scope=2 filter="(seeAlso=*uid=tmorris,ou=people,dc=example,dc=com)" attrs="seeAlso"
[28/May/2020:12:13:34.913427485 +0200] conn=5 (Internal) op=1(9)(1) RESULT err=0 tag=48 nentries=0 etime=0.000131536 notes=U details="Partially Unindexed Filter

The filter is transformed internally to exclude tombstones. The tombstone exclusion is indexed, but the seeAlso component is not, so the server effectively runs a filter test against all live entries. That would indeed take up a lot of DB locks, since effectively every live entry has to be tested.

We probably should be indexing these related values by default, because that's the core of the problem here.

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2020-05-29 07:55:00

> I wonder whether, as a workaround, we could increase the key width for the end of the substring: https://access.redhat.com/documentation/en-us/red_hat_directory_server/11/html/administration_guide/changing-substring-index-search
>
> For example, nsSubStrEnd=7 for uniquemember will only evaluate entries with '^uid=tm' and 'uid=tmo', which should reduce the number of entries looked up. The risk is that a filter with fewer than 7 characters may be unindexed.

We should not do this IMO.

As suggested, when the entry has no descendants, we can do an EQ search instead, which would be indexed.

This could be easily checked with:

change to entry X
get the entryid

get all where parent id is entryid
if none, then leaf
    do eq search
else
    do sub search
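
The check above could look roughly like this, written in Python for brevity rather than in the plugin's own language. The entry-id to parent-id mapping stands in for the backend's parent-id lookup, and all function names are hypothetical. The resulting filters have the same shape as the ones in the log excerpts: "(member=...)" for a leaf and "(member=*...)" otherwise.

```python
def escape_filter_value(value):
    """Escape the characters that are special in an LDAP filter value."""
    return "".join(
        "\\%02x" % ord(ch) if ch in "\\*()\x00" else ch for ch in value
    )


def is_leaf(entry_id, parent_ids):
    """Return True if no entry names entry_id as its parent.

    parent_ids maps an entry id to its parent's entry id (None for the
    suffix entry).
    """
    return all(parent != entry_id for parent in parent_ids.values())


def referint_filter(attr, old_dn, leaf):
    """Equality filter for a leaf entry, substring filter otherwise."""
    value = escape_filter_value(old_dn)
    return f"({attr}={value})" if leaf else f"({attr}=*{value})"


if __name__ == "__main__":
    # Toy tree: 1 = suffix, 2 = ou=people, 3 and 4 = user entries.
    parent_ids = {1: None, 2: 1, 3: 2, 4: 2}
    dn = "uid=tmorris,ou=people,dc=example,dc=com"
    for attr in ("member", "uniquemember", "owner", "seeAlso"):
        print(referint_filter(attr, dn, is_leaf(3, parent_ids)))
```

A real implementation would also have to decide what to do if a child is added between the leaf check and the search; this sketch ignores that.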

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2020-08-27 17:33:45

Metadata Update from @mreynolds389: