Comment from mreynolds (@mreynolds389) at 2018-06-19 20:56:28
If you do an offline init (ldif2db) is it faster?
For example:
Master:  db2ldif -r -n userroot -a /tmp/ldif
Replica: ldif2db -n userroot -i /tmp/ldif
Comment from mreynolds (@mreynolds389) at 2018-06-19 20:56:29
Metadata Update from @mreynolds389:
Comment from telackey at 2018-06-20 00:06:55
@mreynolds389
Yes, in fact, it doesn't really hit this problem at all, because in this server the parentid is always less than the child ID (i.e., no moddn). The db2ldif code uses a separate scheme for ensuring the order of the tree (cf. _export_or_index_parents() in ldap/servers/slapd/back-ldbm/ldif2ldbm.c), which seems better suited to the problem.
I added a counter for the number of IDs scanned by idl_id_is_in_idlist(), both when using db2ldif and when using online initialization, with my small (compared to the real system) DB of 650K entries. With db2ldif, the answer was zero, since my tree was already ordered. In the online case, the loop in idl_id_is_in_idlist() was started 855,526 times and scanned 27,842,335,568 IDs in the process. That small database "only" took 11 minutes to scan the IDList, not 19 hours like the real DB, but it was still long enough to cause the total update to fail when it finally tried to send the first entry:
[19/Jun/2018:17:01:04.699080440 -0500] - DEBUG - NSMMReplicationPlugin - close_connection_internal - agmt="cn=389ds-build_to_test76" (test76:389) - Disconnected from the consumer
[19/Jun/2018:17:01:05.326634281 -0500] - ERR - NSMMReplicationPlugin - release_replica - agmt="cn=389ds-build_to_test76" (test76:389): Unable to send endReplication extended operation (Timed out)
[19/Jun/2018:17:01:05.327526742 -0500] - ERR - NSMMReplicationPlugin - repl5_tot_run - Total update failed for replica "agmt="cn=389ds-build_to_test76" (test76:389)", error (-1)
For comparison's sake, running db2ldif only took 3 minutes to export all 653,112 entries.
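For scale, those counts are consistent with a quadratic scan. A quick back-of-the-envelope check (plain arithmetic, not server code; worst_case_scans is just the triangular-number bound):

```python
# If the accepted ID list already holds k entries when a parent is looked
# up, a sequential scan examines up to k IDs, so n entries cost at most
# n * (n - 1) / 2 examined IDs in total.

def worst_case_scans(n):
    return n * (n - 1) // 2

bound = worst_case_scans(653_112)       # entries in the test database
observed = 27_842_335_568               # IDs counted in idl_id_is_in_idlist()
print(f"quadratic bound: {bound:.2e}")                          # ~2.1e11
print(f"observed: {observed:.2e} ({observed / bound:.0%} of the bound)")
```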
Comment from lkrispen (@elkris) at 2018-06-20 09:35:09
We need to do something here. In the discussion on ticket 48755 I was already concerned about the sorting overhead and made two other proposals; maybe we should investigate them again.
Comment from mreynolds (@mreynolds389) at 2018-06-21 17:48:48
Metadata Update from @mreynolds389:
Comment from tbordaz (@tbordaz) at 2020-02-19 16:31:40
Could be a duplicate of #2974
Comment from mreynolds (@mreynolds389) at 2020-06-24 16:31:48
Metadata Update from @mreynolds389:
Comment from mreynolds (@mreynolds389) at 2020-06-24 16:32:34
Metadata Update from @mreynolds389:
Here was Ludwig's comment about this issue:
Why don't we do it like ldbm2ldif does it, which I think has very low overhead:
if not, there are two options:
[1] - what ldbm2ldif does: get the parent entry and send it, make a note that this entry was sent, continue
[2] - my spontaneous idea: stack the entry, continue, and process the stack later (but this may be more complicated, since a parent could now be in the stack)
[1] is probably the best option
I think this may be an oversimplification of the algorithm, because the algorithm becomes:
sent_set = {}
for each entryid:
    if entryid in sent_set:
        continue
    if parentid < entryid:
        send entry
    else:
        parent_entry = get_entry(parentid)
        send parent_entry
        send entry
        sent_set.insert(parentid)
I agree that keeping it consistent with ldbm2ldif is a good choice, but we do need to consider that sent_set could grow large and make lookups expensive in some situations.
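A runnable sketch of that algorithm is below. Like the pseudocode above, it only pulls one level of parent forward; get_entry and send_entry are hypothetical stand-ins for the real entry fetch and replication send. With sent as a hash set the membership test is O(1), so the remaining concern is only the memory the set needs.

```python
def total_update(entries, get_entry, send_entry):
    """Send entries in ascending entryid order, pulling a parent forward
    when a child would otherwise be sent before it (option [1] above).

    entries    -- iterable of (entryid, parentid), ascending by entryid
    get_entry  -- hypothetical: fetch an entry by its ID
    send_entry -- hypothetical: ship one entry to the consumer
    """
    sent = set()                      # IDs already shipped out of order
    for entryid, parentid in entries:
        if entryid in sent:
            continue                  # already sent ahead of its position
        if parentid and parentid > entryid and parentid not in sent:
            # Child shows up before its parent (e.g. after a moddn):
            # send the parent first and remember that we did.
            send_entry(get_entry(parentid))
            sent.add(parentid)
        send_entry(get_entry(entryid))


# Tiny in-memory demo: entry 3's parent (ID 4) has a higher ID.
db = {1: "suffix", 2: "ou=people", 3: "uid=a", 4: "ou=moved"}
ids = [(1, 0), (2, 1), (3, 4), (4, 1)]
total_update(ids, db.get, lambda entry: print("send", entry))
```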
Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/49796
Issue Description
Online replica initialization no longer works if you have a very large, deep database.
This appears to be related to changes made to fix this bug: https://pagure.io/389-ds-base/issue/48755
As a result, a new scheme of gathering the entries for a total update was introduced, keyed off the filter parentid >= 1 and a new operator, SLAPI_OP_RANGE_NO_IDL_SORT. Despite the name, it seems more like a "tree depth order" sort, ensuring that all parent IDs are ahead of their children in the ID list. With that "tree depth" ordering, as the total update sends entries to the consumer, it can be confident that the parent has been sent and exists before any child entries are sent.
The sorting is done by adding the current ID to the array only if its parent ID is already present in the array (cf. idl_id_is_in_idlist()); otherwise it is added to a special leftovers array, which is processed at the end. The function that does the check, idl_id_is_in_idlist(), sequentially scans the ID list looking for the parent ID, and it is scanned for every entry.
If the data is arranged in a shallow tree, this is probably OK, as the parent entry will always be present early on in the array. However, if you have a tree that is deep or leafy, or even one where the entries are split (like a forked tree, with half the entries under ou=R and half under ou=B), then idl_id_is_in_idlist() may need to scan millions of IDs, and it will scan them over, and over, and over...
At a certain point, replication can become impossible, because the connection that was opened when the total update was initiated is closed before the IDList is finished sorting and any entries are sent.
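A simplified Python reconstruction of that ordering scheme (illustrative only; depth_order, parent_of, and the retry handling are not the actual code in idl_new_range_fetch()) shows where the time goes: each accepted ID triggers a fresh sequential scan of everything accepted so far.

```python
def depth_order(ids, parent_of):
    """Toy version of the "tree depth order" described above: an ID is
    appended only once its parent is already in the output, otherwise it
    is deferred to a leftovers list that is revisited at the end.

    ids       -- candidate entry IDs in the order the index returns them
    parent_of -- mapping of entryid -> parentid (0 for the suffix entry)
    """
    ordered, leftovers = [], []
    for eid in ids:
        pid = parent_of[eid]
        # 'pid in ordered' is a sequential scan of the whole output list,
        # i.e. the idl_id_is_in_idlist() cost this report is about.
        if pid == 0 or pid in ordered:
            ordered.append(eid)
        else:
            leftovers.append(eid)
    # Revisit the deferred IDs until nothing more can be placed
    # (the real leftovers handling is more involved than this).
    placed = True
    while leftovers and placed:
        placed, rest = False, []
        for eid in leftovers:
            if parent_of[eid] in ordered:
                ordered.append(eid)
                placed = True
            else:
                rest.append(eid)
        leftovers = rest
    return ordered + leftovers
```

With a deep or forked tree, most parents sit far from the front of ordered, so nearly the whole list is walked on every check, which is where the hours go.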
That is not only theoretical, but observed with a multi-GB, deep and leafy database after it had spent about 19 hours (!) gathering and sorting the IDList in idl_new_range_fetch() before being ready to send its first entry.
Package Version and Platform
Tested on CentOS 7 with 1.3.6.14.
Steps to reproduce
Actual results
The total update operation is begun, but a very long time is spent gathering the IDList before any entries are sent to the consumer. One indication of this is an import status message on the consumer indicating 0 or 1 entries, and other bogus figures:
If the database is big enough, it may even take long enough for the connection to close.
Other indications are 100% usage of one core on the supplier after initiating a total update, and excessive time spent in idl_id_is_in_idlist().
Expected results
Initialization should begin sending entries almost immediately.