OGSmv opened 1 month ago
Hi, interesting study. I am not surprised: IMHO that test is showing the impact of the candidate-list merge in the OR operation (the one that ensures that a search operation returns each entry only once). In the worst case, merging a single candidate 1000 times, the algorithm cannot be very efficient. I think the complexity is something like O(n * n), as we store candidates in an array.
I do not think that we will try to improve that (anyway, we do not want to encourage developers to use complex filters). At least this clearly shows that when writing an LDAP application, you had better use a reasonable number of sub-filters in an OR filter.
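To make the trade-off concrete, here is a small sketch (with hypothetical uid values and a hypothetical `build_or_filter` helper) of the two strategies being compared: one large OR filter versus the same set split into two smaller ones.

```python
# Sketch: building the OR filters compared in this issue (hypothetical
# attribute values; the helper name is ours, not part of 389-ds).
def build_or_filter(attr, values):
    """Return an LDAP disjunction filter string: (|(attr=v1)(attr=v2)...)."""
    return "(|" + "".join(f"({attr}={v})" for v in values) + ")"

uids = [f"user{i}" for i in range(1000)]

# One request with 1000 sub-filters: worst case for the candidate merge.
big_filter = build_or_filter("uid", uids)

# Two requests with 500 sub-filters each: reportedly faster in total.
halves = [build_or_filter("uid", uids[:500]),
          build_or_filter("uid", uids[500:])]
```

Each sub-filter typically resolves to its own (often single-ID) candidate list, so the cost of the final merge grows with the number of sub-filters, not just the number of entries returned.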
Hi @progier389,
Do you mean that it is normal to use many simple requests to obtain 1000+ unique entries rather than one request?
@progier389 See:
https://github.com/389ds/389-ds-base/blob/main/ldap/servers/slapd/back-ldbm/back-ldbm.h#L270
https://github.com/389ds/389-ds-base/blob/main/ldap/servers/slapd/back-ldbm/filterindex.c#L960
https://github.com/389ds/389-ds-base/blob/main/ldap/servers/slapd/back-ldbm/idl_set.c#L249
I think that if we have a ton of highly fragmented small IDLs rather than a few large ones in the OR condition, we end up with this slowdown. The problem is that we have to iterate over each IDL to find the next minimum value, and as we exhaust IDLs we prune them out of the set. So with tons of small IDLs, each holding a unique value, we are likely doing n*n work, as you say. With that in mind, it is still better than the older approach as I recall (where we continually unioned each IDL into the in-progress result one at a time).
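A minimal Python sketch of the merge behaviour described above (not the actual C implementation in `idl_set.c`): each emitted ID requires a scan over all remaining lists to find the minimum, so with k single-ID lists the total work is roughly n*k, i.e. quadratic in the degenerate case discussed here.

```python
def union_scan(idls):
    """Union of sorted ID lists by repeatedly scanning every remaining
    list for its current minimum. O(k) per emitted ID, so O(n*k) overall,
    which degenerates to n*n when there are n single-ID lists."""
    pos = [0] * len(idls)          # per-list cursor
    out, last = [], None
    while True:
        best = None
        for i, idl in enumerate(idls):              # scan ALL lists each step
            if pos[i] < len(idl):
                if best is None or idl[pos[i]] < idls[best][pos[best]]:
                    best = i
        if best is None:                            # every list exhausted
            return out
        v = idls[best][pos[best]]
        pos[best] += 1
        if v != last:                               # ensure each ID appears once
            out.append(v)
            last = v
```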
The way the OR condition here was optimised was to assist SSSD, which issues poorly constructed queries that build very few, but very large, OR conditions instead.
After a bit of thought I think there is a pretty simple optimisation here:
That way we still get the faster union behaviour over multiple OR conditions, we minimise re-allocs, and we maintain the IDList's need for sorted results.
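One common way to get those three properties (fast union, few re-allocs, sorted output) is a heap-based k-way merge; this is a sketch of that general technique, not necessarily the exact optimisation proposed here:

```python
import heapq

def union_heap(idls):
    """k-way union of sorted ID lists via a min-heap: finding the next
    minimum costs O(log k) instead of an O(k) scan, so the total is
    O(n log k) rather than O(n*k). Output stays sorted and de-duplicated."""
    # Seed the heap with the head of each non-empty list: (value, list, index).
    heap = [(idl[0], i, 0) for i, idl in enumerate(idls) if idl]
    heapq.heapify(heap)
    out, last = [], None
    while heap:
        v, i, j = heapq.heappop(heap)
        if v != last:                  # each ID appears once in the result
            out.append(v)
            last = v
        if j + 1 < len(idls[i]):       # advance the cursor of list i
            heapq.heappush(heap, (idls[i][j + 1], i, j + 1))
    return out
```

With a single result buffer sized up front (the sum of the input lengths is an upper bound), re-allocations during the merge can be avoided entirely.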
Issue Description Filter evaluation (OR matching) is not optimal. This causes slow execution times on search requests with big filters. One search request with 1000 uids in the filter is slower than 2 requests with 500 uids each.
Package Version and Platform:
Steps to Reproduce
Expected results One search is slower than several. The screenshots show the result of reading 1000 entries with several strategies (time spent in processing).
Screenshots
Additional context Python script to reproduce the bug. Command:
python3 read_test.py --host <ip> --port <port> --user "cn=Directory Manager" --pass <password> --root <root_dn>