erthink / ReOpenLDAP

Production-ready replacement for OpenLDAP with robust multi-master replication
https://github.com/ReOpen/ReOpenLDAP/wiki

CRITICAL: Replication consumer could lose changes, but still updates ContextCSN from provider #43

Closed erthink closed 8 years ago

erthink commented 9 years ago

UPDATE: There are a lot of problems in the original implementation of replication. In general, it could NOT work correctly, especially in multi-master mode. More time is required to rewrite and test a lot of code.

It seems the problem was inherited from the original OpenLDAP. On the other hand, the original OpenLDAP usually loses many more changes. Therefore it is possible that we fixed one bug but added another.

The problem is not easy to reproduce; at a minimum it requires:

Finally we could see that the sets of ContextCSN are the same on all replication nodes, but the datasets differ significantly.

verter2015 commented 9 years ago

https://twitter.com/ReOpenLDAP/status/635837132951695360

erthink commented 9 years ago

The issue can be reproduced by looping test050-syncrepl-multimaster on all branches.

Using ldapmodify to add/modify/delete entries from server 1...
  iteration 1
  iteration 2
  iteration 3
  iteration 4
  iteration 5
  iteration 6
  iteration 7
...
Using ldapsearch to read all the entries from server 1...
Using ldapsearch to read all the entries from server 2...
Using ldapsearch to read all the entries from server 3...
Using ldapsearch to read all the entries from server 4...
Comparing retrieved entries from server 1 and server 2...
...
test failed - server 1 and server 2/3/4 databases differ
erthink commented 9 years ago

The probability of reproduction strongly depends on CPU load. It seems a very high load is required, up to 16 running threads per core.

erthink commented 9 years ago
Using ldapmodify to delete entries from server 2...
Waiting while syncrepl replicates a changes (between 15541 and 15540)... Done in 0.1 seconds
Waiting while syncrepl replicates a changes (between 15541 and 15542)... Done in 0.1 seconds
Waiting while syncrepl replicates a changes (between 15541 and 15543)... Done in 0.1 seconds
Using ldapmodify to delete entries from server 3...
Waiting while syncrepl replicates a changes (between 15542 and 15540)... Done in 0.1 seconds
Waiting while syncrepl replicates a changes (between 15542 and 15541)... Done in 0.1 seconds
Waiting while syncrepl replicates a changes (between 15542 and 15543)... Done in 0.1 seconds
Using ldapsearch to read all the entries from server 1...
Using ldapsearch to read all the entries from server 2...
Using ldapsearch to read all the entries from server 3...
Using ldapsearch to read all the entries from server 4...
Comparing retrieved entries from server 1 and server 2...
test failed - server 1 and server 2 databases differ
diff server1.flt server2.flt 
418a419,422
> dn: cn=To be deleted by server 3,dc=example,dc=com
> objectClass: device
> cn: To be deleted by server 3
> 
erthink commented 9 years ago

I am discouraged: LDAP replication has a few big flaws, and the original implementation in OpenLDAP is nearly madness.

  1. The vector clock (aka CSN timestamps) is not used properly; sometimes it is simply omitted from the protocol or given an 'optional' status. Therefore in some cases the replication engine cannot make the right decision.
  2. Usage of the UUID-id and the DN-id of entries is mixed. This increases the complexity and adds a lot of special cases, but does not make replication really robust.
  3. The original implementation in OpenLDAP does not take into account that any portion of data could be changed asynchronously. Moreover, the 'cool rebus' style of coding just makes analysis difficult and hides a lot of problems and naive bugs.

Therefore OpenLDAP can lose changes during replication, but at the same time propagate the ContextCSN! This can be reproduced by running test050-syncrepl-multimaster many thousands of times.
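Flaw (1) above can be made concrete with a minimal sketch. All names and types here are hypothetical illustrations, not the OpenLDAP API: each replica keeps one CSN timestamp per server-ID (SID), and two replica states can only be ordered safely when one vector dominates the other element-wise.

```c
#include <assert.h>

#define MAX_SID 4  /* hypothetical fixed number of replicas */

typedef struct {
    unsigned long csn[MAX_SID]; /* one timestamp per server-ID */
} csn_vector;

typedef enum { VC_EQUAL, VC_NEWER, VC_OLDER, VC_CONCURRENT } vc_order;

/* Element-wise vector-clock comparison: the result is only decisive
 * when one side is ahead on every differing slot.  When a CSN is
 * omitted from the protocol or treated as 'optional' (flaw 1), a slot
 * is unknown and no safe ordering decision exists. */
static vc_order csn_vector_cmp(const csn_vector *a, const csn_vector *b)
{
    int a_ahead = 0, b_ahead = 0;
    for (int sid = 0; sid < MAX_SID; ++sid) {
        if (a->csn[sid] > b->csn[sid])
            a_ahead = 1;
        else if (a->csn[sid] < b->csn[sid])
            b_ahead = 1;
    }
    if (a_ahead && b_ahead)
        return VC_CONCURRENT; /* conflicting updates: needs resolution */
    return a_ahead ? VC_NEWER : (b_ahead ? VC_OLDER : VC_EQUAL);
}
```

A `VC_CONCURRENT` result is exactly the case where the engine must apply a conflict-resolution rule; dropping CSN slots from the protocol silently collapses such cases into wrong "newer/older" decisions.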

However, I hope to fix errors during the week.

P.S. Please, don't create software anymore, especially multi-threaded software, nor protocols related to replication. That will be enough to make the world better, and to save my time ;)

erthink commented 9 years ago

There are four problems:

  1. delete-non-present may kill entries which were added recently (i.e. added with the same DNs as ones removed before, but with new UUIDs), e.g. a race with an update while translating UUID to DN.
  2. notify-of-modify could be applied to entries which were updated since the comparison of the sync-cookies.
  3. notify-of-add could be applied after a newer version of such a DN was created and then deleted, so the old (previously removed) version of the DN gets "revived".
  4. notify-of-delete could be applied to a new version of a DN, i.e. a notify-of-delete for an old entry removes a recent one.
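Problem (1) fits in a few lines of toy code. The types and helpers below are hypothetical, not the slapd sources: the point is that a delete resolved by DN alone races with a re-add of the same DN, while matching on the entry's UUID makes the stale delete a harmless no-op.

```c
#include <string.h>

/* Toy model of one directory entry (hypothetical, for illustration). */
typedef struct {
    const char *dn;
    const char *uuid;
    int present; /* 1 = exists in the DIT */
} entry;

/* Unsafe: resolves the delete by DN only, so it also kills an entry
 * that was deleted and re-added under the same DN with a new UUID. */
static void delete_by_dn(entry *e, const char *dn)
{
    if (e->present && strcmp(e->dn, dn) == 0)
        e->present = 0;
}

/* Safe: the delete applies only to the exact UUID it was issued for,
 * so a stale notification cannot remove the re-added entry. */
static void delete_by_uuid(entry *e, const char *dn, const char *uuid)
{
    if (e->present && strcmp(e->dn, dn) == 0 && strcmp(e->uuid, uuid) == 0)
        e->present = 0;
}
```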

Solution:

erthink commented 9 years ago

Nowadays I have fixed:

It seems to be working...

erthink commented 9 years ago

Long runs of the tests show that the problem is still present.

However, the probability of reproduction is now significantly lower. But when 'jitter' is enabled it reproduces very quickly.

As the 'biglock' synchronizes syncrepl and all DIT modifications, I think the problem is in syncprov...

erthink commented 9 years ago

slap_queue_csn() could be called for the same pair of o_connid and o_opid, but with different CSNs:

top-level blame:
  3a1b5619 @hyc 2007-10-05 09:03:44
  65530005 @hyc 2008-12-03 04:49:53
  5bd8725a @hyc 2009-03-14 01:04:55
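The duplicate-pair condition can be modeled with a toy pending-queue (hypothetical names, not the slapd internals): queueing a second CSN for a (connid, opid) pair that was never graduated trips an assertion, which is exactly what happens when the graduate step is skipped on some code path.

```c
#include <assert.h>
#include <string.h>

#define QUEUE_MAX 16

/* Toy pending-CSN queue keyed by (connid, opid), for illustration. */
typedef struct {
    unsigned long connid, opid;
    char csn[64];
} pending_csn;

static pending_csn queue[QUEUE_MAX];
static int queue_len;

static void queue_csn(unsigned long connid, unsigned long opid,
                      const char *csn)
{
    /* The invariant: the same (connid, opid) pair must never be
     * pending twice.  A second queueing without a graduation in
     * between means some path forgot to graduate the commit CSN. */
    for (int i = 0; i < queue_len; ++i)
        assert(!(queue[i].connid == connid && queue[i].opid == opid));
    assert(queue_len < QUEUE_MAX);
    queue[queue_len].connid = connid;
    queue[queue_len].opid = opid;
    strncpy(queue[queue_len].csn, csn, sizeof queue[0].csn - 1);
    queue[queue_len].csn[sizeof queue[0].csn - 1] = '\0';
    queue_len++;
}

/* Counterpart of the graduate step: remove the pending pair. */
static void graduate_csn(unsigned long connid, unsigned long opid)
{
    for (int i = 0; i < queue_len; ++i)
        if (queue[i].connid == connid && queue[i].opid == opid) {
            queue[i] = queue[--queue_len];
            return;
        }
}
```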

hyc commented 9 years ago

Not seeing it. updateCookie() queues a CSN that is immediately graduated by its modify op. Where does it leave one queued?

erthink commented 9 years ago

Yes, it is strange. But I have added an assertion: it shows that the csn-queue already contains such a conn_id/op_id pair.

erthink commented 9 years ago

Oh, it is "perfect" code: slap_graduate_commit_csn() is simply not called somewhere ;)

erthink commented 9 years ago

rm -rf is needed!

hyc commented 9 years ago

The syncrepl code is indeed far from perfect. It's been awaiting a full rewrite for quite a long time. http://www.openldap.org/lists/openldap-devel/200410/msg00061.html http://www.openldap.org/lists/openldap-devel/200410/msg00040.html

The consumer should have been an overlay from the very beginning.

erthink commented 9 years ago

slap_graduate_commit_csn() is not called from some of the backends. But how does it work then?...

erthink commented 9 years ago

A lot of work is behind us, and it seems it will be fixed soon...

erthink commented 9 years ago

The code of OpenLDAP's replication is just a forest of crutches. Nothing is implemented properly and error-free.

erthink commented 8 years ago

Nowadays the 'blackhole' branch contains over 100 commits on top of master; for info:

igalic commented 8 years ago

Has anyone considered writing a jepsen test to expose this bug, so that we can see that after the fix it's… well, actually fixed?

hyc commented 8 years ago

I've looked into it. The jepsen infrastructure is a bit of a pain to set up. Haven't got it working yet.

erthink commented 8 years ago

To reproduce the bug it is enough to just loop tests 17, 18, 19, 43, 48, 50, 58 and 61 (i.e. the ones with syncrepl).

For instance, I use the script https://github.com/ReOpen/ReOpenLDAP/blob/ps-stable/ps/ci-buzz.sh

$ git clone https://github.com/ReOpen/ReOpenLDAP.git
$ cd ReOpenLDAP
$ git checkout ps-stable
$ ./ps/ci-buzz.sh 10 devel master

launching 0 of devel, with nice 5...
launching 0 of master, with nice 7...
launching 1 of devel, with nice 9...
launching 1 of master, with nice 11...
launching 2 of devel, with nice 13...
launching 2 of master, with nice 15...
launching 3 of devel, with nice 17...
launching 3 of master, with nice 19...
launching 4 of devel, with nice 21...
launching 4 of master, with nice 23...
launching 5 of devel, with nice 25...
launching 5 of master, with nice 27...
launching 6 of devel, with nice 29...
launching 6 of master, with nice 31...
launching 7 of devel, with nice 33...
launching 7 of master, with nice 35...
launching 8 of devel, with nice 37...
launching 8 of master, with nice 39...
launching 9 of devel, with nice 41...
launching 9 of master, with nice 43...

Some time later:

=== 2015-12-07 13:28:40, running 0,37 hours, 20 job(s) left
devel/0: 2015-12-07 13:28:19+03:00 >>> 1--test063-delta-multimaster-mdb-KFA | ports 14367 14368 14369 14370 14371 14372 14373 14374
devel/1: 2015-12-07 13:27:41+03:00 >>> 1--test022-ppolicy-mdb-KFA | ports 7760 7761 7762 7763 7764 7765 7766 7767
devel/2: 2015-12-07 13:28:36+03:00 >>> 0--test046-dds-mdb-DQD-KFA | ports 16401 16402 16403 16404 16405 16406 16407 16408
devel/3: 2015-12-07 13:26:01 Building... | make[3]: warning: jobserver unavailable: using -j1.  Add '+' to parent make rule.
devel/4: delay 1592 seconds... | 
devel/5: delay 1990 seconds... | 
devel/6: delay 2388 seconds... | 
devel/7: delay 2786 seconds... | 
devel/8: delay 3184 seconds... | 
devel/9: delay 3582 seconds... | 
master/0: 2015-12-07 13:28:36+03:00 >>> 1--test058-syncrepl-asymmetric-mdb-KFA | ports 63593 63594 63595 63596 63597 63598 63599 63600
master/1: 2015-12-07 13:27:29+03:00 >>> 0--test060-mt-hot-mdb-DQD-KFA | ports 11985 11986 11987 11988 11989 11990 11991 11992
master/2: 2015-12-07 13:28:24+03:00 >>> 0--test017-syncreplication-refresh-mdb-DQD-KFA | ports 9762 9763 9764 9765 9766 9767 9768 9769
master/3: delay 1393 seconds... | 
master/4: delay 1791 seconds... | 
master/5: delay 2189 seconds... | 
master/6: delay 2587 seconds... | 
master/7: delay 2985 seconds... | 
master/8: delay 3383 seconds... | 
master/9: delay 3781 seconds... | 
===
/dev/sdb1        1,1T         544G  488G           53% /home
RAM              6,9G          21M  6,9G            1% /home/ly/tmp/ReOpenLDAP/@ci-buzz.pool/ramfs
===
procs ---------------memory-------------- ---swap-- -----io---- -system-- ------cpu-----
 r  b     swpd     free     buff    cache   si   so    bi    bo   in   cs us sy id wa st
 9  0        0  5876864   266992  3633560    0    0   262   513  528 2807 32 11 55  1  0
procs ---------------memory-------------- ---swap-- -----io---- -system-- ------cpu-----
 r  b     swpd     free    inact   active   si   so    bi    bo   in   cs us sy id wa st
12  0        0  5875328  1760980  6124304    0    0   262   513  528 2807 32 11 55  1  0
===
 13:28:40 up 38 min,  2 users,  load average: 16,20, 13,24, 8,04

Nowadays there is no release/version/branch which could pass 111 iterations :(

erthink commented 8 years ago

Running the tests showed about a tenfold reduction in the probability of replication failures. No more "42" errors (replication stalled, but the ContextCSNs differ).

In general, there are nowadays about 100 times fewer glitches in comparison with the original OpenLDAP. But we should continue the cleaning.

erthink commented 8 years ago

A noticeable point was reached today with https://github.com/ReOpen/ReOpenLDAP/commit/0ec05f6078f0457eb114a4cd579ad0ac769b3043. Parallel testing in 4 sessions by ps/ci-buzz.sh (ps-stable branch) successfully completed 42 iterations without errors.

These were four different builds by GCC 5.3, all with "-Wall -Werror":

  1. with ThreadSanitizer;
  2. with AddressSanitizer;
  3. with -Os (optimize for code size), LTO (link-time optimization) and without the memory-checker (hipagut);
  4. with -Ofast (optimize for speed), LTO (link-time optimization) and with the memory-checker (hipagut).
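The four configurations correspond roughly to the flag sets below (an illustrative sketch only; hipagut is a ReOpenLDAP-internal memory checker toggled by its own build option, not a GCC flag):

```shell
# Approximate CFLAGS for the four GCC 5.3 builds (illustrative only)
CFLAGS_TSAN="-Wall -Werror -fsanitize=thread"    # 1: ThreadSanitizer
CFLAGS_ASAN="-Wall -Werror -fsanitize=address"   # 2: AddressSanitizer
CFLAGS_SIZE="-Wall -Werror -Os -flto"            # 3: size-optimized, LTO, hipagut off
CFLAGS_FAST="-Wall -Werror -Ofast -flto"         # 4: speed-optimized, LTO, hipagut on
```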

So, therefore, I have decided to close this issue and create a new one, because replication (syncprov/syncrepl) now fails only in relatively hard multi-master scenarios, but works "nearly properly" in most cases.


Nowadays the original OpenLDAP still has:

So, at this point I want to repeat: OpenLDAP is a sample and an illustration of how software should not be designed and (especially) should not be implemented in code, foremost in open source.

huxili commented 7 years ago

Hello, which version is stable for replication? Thanks.

erthink commented 7 years ago

@huxili, the master branch is stable enough and recommended. For details please see NEWS.md and ChangeLog.