erthink / ReOpenLDAP

Production-ready replacement for OpenLDAP with robust multi-master replication
https://github.com/ReOpen/ReOpenLDAP/wiki

CRITICAL: Replication consumer could lose changes, but still updates ContextCSN from provider #43

Closed erthink closed 8 years ago

erthink commented 9 years ago

UPDATE: There are a lot of problems in the original implementation of replication. In general, it could NOT work correctly, especially in multi-master mode. More time is required to rewrite and test a lot of code.

It seems the problem was inherited from the original OpenLDAP. On the other hand, the original OpenLDAP usually loses many more changes. Therefore it is possible that we fixed one bug but added another.

The problem is not easy to reproduce; at a minimum it requires:

Finally we could see that the sets of ContextCSN are the same on all replication nodes, but the datasets differ significantly.

verter2015 commented 9 years ago

https://twitter.com/ReOpenLDAP/status/635837132951695360

erthink commented 9 years ago

The issue can be reproduced by looping test050-syncrepl-multimaster on all branches.

Using ldapmodify to add/modify/delete entries from server 1...
  iteration 1
  iteration 2
  iteration 3
  iteration 4
  iteration 5
  iteration 6
  iteration 7
...
Using ldapsearch to read all the entries from server 1...
Using ldapsearch to read all the entries from server 2...
Using ldapsearch to read all the entries from server 3...
Using ldapsearch to read all the entries from server 4...
Comparing retrieved entries from server 1 and server 2...
...
test failed - server 1 and server 2/3/4 databases differ
erthink commented 9 years ago

The probability of reproduction strongly depends on CPU load. It seems a very high load is required, up to 16 running threads per core.

erthink commented 9 years ago
Using ldapmodify to delete entries from server 2...
Waiting while syncrepl replicates a changes (between 15541 and 15540)... Done in 0.1 seconds
Waiting while syncrepl replicates a changes (between 15541 and 15542)... Done in 0.1 seconds
Waiting while syncrepl replicates a changes (between 15541 and 15543)... Done in 0.1 seconds
Using ldapmodify to delete entries from server 3...
Waiting while syncrepl replicates a changes (between 15542 and 15540)... Done in 0.1 seconds
Waiting while syncrepl replicates a changes (between 15542 and 15541)... Done in 0.1 seconds
Waiting while syncrepl replicates a changes (between 15542 and 15543)... Done in 0.1 seconds
Using ldapsearch to read all the entries from server 1...
Using ldapsearch to read all the entries from server 2...
Using ldapsearch to read all the entries from server 3...
Using ldapsearch to read all the entries from server 4...
Comparing retrieved entries from server 1 and server 2...
test failed - server 1 and server 2 databases differ
diff server1.flt server2.flt 
418a419,422
> dn: cn=To be deleted by server 3,dc=example,dc=com
> objectClass: device
> cn: To be deleted by server 3
> 
erthink commented 9 years ago

I am discouraged: LDAP replication has a few big flaws, and the original implementation in OpenLDAP is nearly madness.

  1. The vector clock (aka CSN timestamps) is not used properly; sometimes it is simply omitted from the protocol or given an 'optional' status. Therefore in some cases the replication engine cannot make the right decision.
  2. Usage of the UUID-id and the DN-id of entries is mixed. This increases the complexity and adds a lot of special cases, but does not make replication really robust.
  3. The original implementation in OpenLDAP does not take into account that any portion of data could be changed asynchronously. Moreover, the 'cool rebus' style of coding just makes analysis difficult and hides a lot of problems and naive bugs.

Therefore OpenLDAP can lose changes during replication, but at the same time propagate the ContextCSN! This can be reproduced by running test050-syncrepl-multimaster many thousands of times.
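Flaw (1) above can be made concrete with a minimal sketch. All names and types here are hypothetical illustrations, not the OpenLDAP API: each replica keeps one CSN timestamp per server-ID (SID), and two replica states can only be ordered safely when one vector dominates the other element-wise.

```c
#include <assert.h>

#define MAX_SID 4  /* hypothetical fixed number of replicas */

typedef struct {
    unsigned long csn[MAX_SID]; /* one timestamp per server-ID */
} csn_vector;

typedef enum { VC_EQUAL, VC_NEWER, VC_OLDER, VC_CONCURRENT } vc_order;

/* Element-wise vector-clock comparison: the result is only decisive
 * when one side is ahead on every differing slot.  When a CSN is
 * omitted from the protocol or treated as 'optional' (flaw 1), a slot
 * is unknown and no safe ordering decision exists. */
static vc_order csn_vector_cmp(const csn_vector *a, const csn_vector *b)
{
    int a_ahead = 0, b_ahead = 0;
    for (int sid = 0; sid < MAX_SID; ++sid) {
        if (a->csn[sid] > b->csn[sid])
            a_ahead = 1;
        else if (a->csn[sid] < b->csn[sid])
            b_ahead = 1;
    }
    if (a_ahead && b_ahead)
        return VC_CONCURRENT; /* conflicting updates: needs resolution */
    return a_ahead ? VC_NEWER : (b_ahead ? VC_OLDER : VC_EQUAL);
}
```

A `VC_CONCURRENT` result is exactly the case where the engine must apply a conflict-resolution rule; dropping CSN slots from the protocol silently collapses such cases into wrong "newer/older" decisions.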

However, I hope to fix errors during the week.

P.S. Please, don't create software anymore, especially multi-threaded software, nor protocols related to replication. That will be enough to make the world better, and to save my time ;)

erthink commented 9 years ago

There are four problems:

  1. delete-non-present may kill entries which were added recently (i.e. added with the same DNs as ones removed before, but with new UUIDs), e.g. a race with an update while translating UUID to DN.
  2. notify-of-modify could be applied to entries which were updated since the comparison of the sync-cookies.
  3. notify-of-add could be applied after a newer version of such a DN was created and then deleted, so the old (previously removed) version of the DN gets "revived".
  4. notify-of-delete could be applied to a new version of a DN, i.e. a notify-of-delete for an old entry removes a recent one.
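Problem (1) fits in a few lines of toy code. The types and helpers below are hypothetical, not the slapd sources: the point is that a delete resolved by DN alone races with a re-add of the same DN, while matching on the entry's UUID makes the stale delete a harmless no-op.

```c
#include <string.h>

/* Toy model of one directory entry (hypothetical, for illustration). */
typedef struct {
    const char *dn;
    const char *uuid;
    int present; /* 1 = exists in the DIT */
} entry;

/* Unsafe: resolves the delete by DN only, so it also kills an entry
 * that was deleted and re-added under the same DN with a new UUID. */
static void delete_by_dn(entry *e, const char *dn)
{
    if (e->present && strcmp(e->dn, dn) == 0)
        e->present = 0;
}

/* Safe: the delete applies only to the exact UUID it was issued for,
 * so a stale notification cannot remove the re-added entry. */
static void delete_by_uuid(entry *e, const char *dn, const char *uuid)
{
    if (e->present && strcmp(e->dn, dn) == 0 && strcmp(e->uuid, uuid) == 0)
        e->present = 0;
}
```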

Solution:

erthink commented 9 years ago

Nowadays I have fixed:

It seems to be working...

erthink commented 9 years ago

Long runs of the tests show that the problem is still present.

However, the probability of reproduction is now significantly lower. But when 'jitter' is enabled it reproduces very quickly.

As the 'biglock' synchronizes syncrepl and all DIT modifications, I think the problem is in syncprov...

erthink commented 9 years ago

slap_queue_csn() could be called for the same pair of o_connid and o_opid, but with different CSNs:

top-level blame:
  3a1b5619 @hyc 2007-10-05 09:03:44
  65530005 @hyc 2008-12-03 04:49:53
  5bd8725a @hyc 2009-03-14 01:04:55
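The duplicate-pair condition can be modeled with a toy pending-queue (hypothetical names, not the slapd internals): queueing a second CSN for a (connid, opid) pair that was never graduated trips an assertion, which is exactly what happens when the graduate step is skipped on some code path.

```c
#include <assert.h>
#include <string.h>

#define QUEUE_MAX 16

/* Toy pending-CSN queue keyed by (connid, opid), for illustration. */
typedef struct {
    unsigned long connid, opid;
    char csn[64];
} pending_csn;

static pending_csn queue[QUEUE_MAX];
static int queue_len;

static void queue_csn(unsigned long connid, unsigned long opid,
                      const char *csn)
{
    /* The invariant: the same (connid, opid) pair must never be
     * pending twice.  A second queueing without a graduation in
     * between means some path forgot to graduate the commit CSN. */
    for (int i = 0; i < queue_len; ++i)
        assert(!(queue[i].connid == connid && queue[i].opid == opid));
    assert(queue_len < QUEUE_MAX);
    queue[queue_len].connid = connid;
    queue[queue_len].opid = opid;
    strncpy(queue[queue_len].csn, csn, sizeof queue[0].csn - 1);
    queue[queue_len].csn[sizeof queue[0].csn - 1] = '\0';
    queue_len++;
}

/* Counterpart of the graduate step: remove the pending pair. */
static void graduate_csn(unsigned long connid, unsigned long opid)
{
    for (int i = 0; i < queue_len; ++i)
        if (queue[i].connid == connid && queue[i].opid == opid) {
            queue[i] = queue[--queue_len];
            return;
        }
}
```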

hyc commented 9 years ago

Not seeing it. updateCookie() queues a CSN that is immediately graduated by its modify op. Where does it leave one queued?

erthink commented 9 years ago

Yes, it is strange. But I have added an assertion: it shows that the csn-queue already contains such a conn_id/op_id pair.

erthink commented 9 years ago

Oh, it is "perfect" code: slap_graduate_commit_csn() is simply not called somewhere ;)

erthink commented 9 years ago

rm -rf is needed!

hyc commented 9 years ago

The syncrepl code is indeed far from perfect. It's been awaiting a full rewrite for quite a long time. http://www.openldap.org/lists/openldap-devel/200410/msg00061.html http://www.openldap.org/lists/openldap-devel/200410/msg00040.html

The consumer should have been an overlay from the very beginning.

erthink commented 9 years ago

slap_graduate_commit_csn() is not called from some of the backends. But how does it work then?...

erthink commented 9 years ago

A lot of work is behind us, and it seems it will be fixed soon...

erthink commented 9 years ago

The code of OpenLDAP's replication is just a forest of crutches. Nothing is implemented properly and error-free.

erthink commented 8 years ago

Nowadays the 'blackhole' branch contains over 100 commits on top of master; for info:

igalic commented 8 years ago

Has anyone considered writing a jepsen test to expose this bug, so that we can see that after the fix it's… well, actually fixed?

hyc commented 8 years ago

I've looked into it. The jepsen infrastructure is a bit of a pain to set up. Haven't got it working yet.

erthink commented 8 years ago

To reproduce the bug it is enough to just loop tests 17, 18, 19, 43, 48, 50, 58 and 61 (i.e. the ones with syncrepl).

For instance, I use the script https://github.com/ReOpen/ReOpenLDAP/blob/ps-stable/ps/ci-buzz.sh

$ git clone https://github.com/ReOpen/ReOpenLDAP.git
$ cd ReOpenLDAP
$ git checkout ps-stable
$ ./ps/ci-buzz.sh 10 devel master

launching 0 of devel, with nice 5...
launching 0 of master, with nice 7...
launching 1 of devel, with nice 9...
launching 1 of master, with nice 11...
launching 2 of devel, with nice 13...
launching 2 of master, with nice 15...
launching 3 of devel, with nice 17...
launching 3 of master, with nice 19...
launching 4 of devel, with nice 21...
launching 4 of master, with nice 23...
launching 5 of devel, with nice 25...
launching 5 of master, with nice 27...
launching 6 of devel, with nice 29...
launching 6 of master, with nice 31...
launching 7 of devel, with nice 33...
launching 7 of master, with nice 35...
launching 8 of devel, with nice 37...
launching 8 of master, with nice 39...
launching 9 of devel, with nice 41...
launching 9 of master, with nice 43...

Some time later:

=== 2015-12-07 13:28:40, running 0,37 hours, 20 job(s) left
devel/0: 2015-12-07 13:28:19+03:00 >>> 1--test063-delta-multimaster-mdb-KFA | ports 14367 14368 14369 14370 14371 14372 14373 14374
devel/1: 2015-12-07 13:27:41+03:00 >>> 1--test022-ppolicy-mdb-KFA | ports 7760 7761 7762 7763 7764 7765 7766 7767
devel/2: 2015-12-07 13:28:36+03:00 >>> 0--test046-dds-mdb-DQD-KFA | ports 16401 16402 16403 16404 16405 16406 16407 16408
devel/3: 2015-12-07 13:26:01 Building... | make[3]: warning: jobserver unavailable: using -j1.  Add '+' to parent make rule.
devel/4: delay 1592 seconds... | 
devel/5: delay 1990 seconds... | 
devel/6: delay 2388 seconds... | 
devel/7: delay 2786 seconds... | 
devel/8: delay 3184 seconds... | 
devel/9: delay 3582 seconds... | 
master/0: 2015-12-07 13:28:36+03:00 >>> 1--test058-syncrepl-asymmetric-mdb-KFA | ports 63593 63594 63595 63596 63597 63598 63599 63600
master/1: 2015-12-07 13:27:29+03:00 >>> 0--test060-mt-hot-mdb-DQD-KFA | ports 11985 11986 11987 11988 11989 11990 11991 11992
master/2: 2015-12-07 13:28:24+03:00 >>> 0--test017-syncreplication-refresh-mdb-DQD-KFA | ports 9762 9763 9764 9765 9766 9767 9768 9769
master/3: delay 1393 seconds... | 
master/4: delay 1791 seconds... | 
master/5: delay 2189 seconds... | 
master/6: delay 2587 seconds... | 
master/7: delay 2985 seconds... | 
master/8: delay 3383 seconds... | 
master/9: delay 3781 seconds... | 
===
/dev/sdb1        1,1T         544G  488G           53% /home
RAM              6,9G          21M  6,9G            1% /home/ly/tmp/ReOpenLDAP/@ci-buzz.pool/ramfs
===
procs ---------------memory-------------- ---swap-- -----io---- -system-- ------cpu-----
 r  b     swpd     free     buff    cache   si   so    bi    bo   in   cs us sy id wa st
 9  0        0  5876864   266992  3633560    0    0   262   513  528 2807 32 11 55  1  0
procs ---------------memory-------------- ---swap-- -----io---- -system-- ------cpu-----
 r  b     swpd     free    inact   active   si   so    bi    bo   in   cs us sy id wa st
12  0        0  5875328  1760980  6124304    0    0   262   513  528 2807 32 11 55  1  0
===
 13:28:40 up 38 min,  2 users,  load average: 16,20, 13,24, 8,04

Nowadays there is no release/version/branch which could pass 111 iterations :(

erthink commented 8 years ago

Running the tests showed about a tenfold reduction in the probability of replication failures. No more "42" errors (replication stalled, but the ContextCSNs differ).

In general, there are nowadays about 100 times fewer glitches in comparison with the original OpenLDAP. But we should continue the cleaning.

erthink commented 8 years ago

A noticeable point was reached today with https://github.com/ReOpen/ReOpenLDAP/commit/0ec05f6078f0457eb114a4cd579ad0ac769b3043. Parallel testing in 4 sessions by ps/ci-buzz.sh (ps-stable branch) successfully completed 42 iterations without errors.

These were four different builds by GCC 5.3, all with "-Wall -Werror":

  1. with ThreadSanitizer;
  2. with AddressSanitizer;
  3. with -Os (optimize for code size), LTO (link-time optimization) and without the memory-checker (hipagut);
  4. with -Ofast (optimize for speed), LTO (link-time optimization) and with the memory-checker (hipagut).
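The four configurations correspond roughly to the flag sets below (an illustrative sketch only; hipagut is a ReOpenLDAP-internal memory checker toggled by its own build option, not a GCC flag):

```shell
# Approximate CFLAGS for the four GCC 5.3 builds (illustrative only)
CFLAGS_TSAN="-Wall -Werror -fsanitize=thread"    # 1: ThreadSanitizer
CFLAGS_ASAN="-Wall -Werror -fsanitize=address"   # 2: AddressSanitizer
CFLAGS_SIZE="-Wall -Werror -Os -flto"            # 3: size-optimized, LTO, hipagut off
CFLAGS_FAST="-Wall -Werror -Ofast -flto"         # 4: speed-optimized, LTO, hipagut on
```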

So, therefore, I have decided to close this issue and create a new one, because replication (syncprov/syncrepl) now fails only in relatively hard multi-master scenarios, but works "nearly properly" in most cases.


Nowadays the original OpenLDAP still has:

So, at this point I want to repeat: OpenLDAP is a sample and an illustration of how software should not be designed and (especially) should not be implemented in code, foremost in open source.

huxili commented 7 years ago

Hello, which version is stable for replication? Thanks.

erthink commented 7 years ago

@huxili, the master branch is stable enough and recommended. For details please see NEWS.md and ChangeLog.