389ds / 389-ds-base

The enterprise-class Open Source LDAP server for Linux
https://www.port389.org/

Recovered supplier needs to reject direct updates until it is in sync with the topology #1317

Open 389-ds-bot opened 4 years ago

389-ds-bot commented 4 years ago

Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/47986


The problem is that if a recovered supplier accepts direct updates before it is back in sync, replication both to and from that supplier is broken.

Use case: MMR with two suppliers, M1/rid1 and M2/rid2.

The ldif file's RUV is [rid1_t0, rid2_t1].

T20: M1 RUV is [rid1_t5, rid2_t6]; M2 RUV is [rid1_t5, rid2_t6].

M1 is recovered from the ldif file.

T21: M1 RUV is [rid1_t0, rid2_t1]; M2 RUV is [rid1_t5, rid2_t6].

T22: an ldapclient sends an update to M1. M1 RUV is [rid1_t22, rid2_t1]; M2 RUV is [rid1_t5, rid2_t6].

T23: M2 starts a replication session to M1 and updates M1 with [rid2_t1..rid2_t6]. M1 RUV is [rid1_t22, rid2_t6]; M2 RUV is [rid1_t5, rid2_t6].

At this point replication is broken in both directions. M2 does not have rid1_t22 in its changelog and cannot update M1. The import cleared M1's changelog, so M1 does not have rid1_t5 and cannot update M2.

This problem exists with ldif recovery, but I think it also exists with backup recovery.
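
Not part of the original report: a minimal sketch of how the diverged RUVs can be observed, assuming two lib389 DirSrv instances and the default suffix. The RUV is stored on the replication tombstone entry and can be read with a plain search.

```python
# Minimal sketch (not from the original report): dump the nsds50ruv values of
# both suppliers so the divergence above (rid1 at t22 on M1 vs t5 on M2)
# becomes visible.  Assumes lib389 DirSrv instances and the default suffix.
import ldap
from lib389._constants import DEFAULT_SUFFIX

RUV_FILTER = ('(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)'
              '(objectclass=nstombstone))')

def dump_ruv(inst, label):
    # The RUV lives on the replication tombstone entry under the suffix.
    for dn, attrs in inst.search_s(DEFAULT_SUFFIX, ldap.SCOPE_SUBTREE,
                                   RUV_FILTER, ['nsds50ruv']):
        for val in attrs.get('nsds50ruv', []):
            print(f"{label}: {val.decode()}")

# dump_ruv(m1, "M1")   # after the direct update: rid1 max CSN ~ t22, rid2 ~ t1
# dump_ruv(m2, "M2")   # still: rid1 max CSN ~ t5, rid2 ~ t6
```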

389-ds-bot commented 4 years ago

Comment from nhosoi (@nhosoi) at 2015-03-10 23:41:51

Comments made in the ticket triage: Ludwig: should be done, but it is a change in behaviour, so it should be configurable. Thierry: if configurable, what would be the default behaviour, reject or accept?
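
Purely as an illustration of the triage question (configurable, default reject or accept), a sketch of what a per-replica switch might look like through lib389. The attribute name below is hypothetical and does not exist in 389-ds today.

```python
# Purely hypothetical sketch: the attribute name below is invented to
# illustrate the "should be configurable" point from triage.  389-ds has
# no such setting today, and the default (reject vs accept) is exactly the
# open question above.
from lib389._constants import DEFAULT_SUFFIX
from lib389.replica import Replicas

def set_reject_until_synced(inst, value='on'):
    replica = Replicas(inst).get(DEFAULT_SUFFIX)
    # Hypothetical attribute -- not part of the current configuration schema.
    replica.replace('nsds5ReplicaRejectUpdatesUntilSynced', value)
```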

389-ds-bot commented 4 years ago

Comment from nhosoi (@nhosoi) at 2016-05-13 00:19:24

Per triage, push the target milestone to 1.3.6.

389-ds-bot commented 4 years ago

Comment from nhosoi (@nhosoi) at 2017-02-11 22:49:27

Metadata Update from @nhosoi:

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2017-05-08 22:27:14

Metadata Update from @mreynolds389:

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2020-05-27 16:11:11

Metadata Update from @mreynolds389:

droideck commented 1 year ago

Okay, I wrote a test case (for which we just need to add the error checks), and it looks like this: https://github.com/droideck/389-ds-base/commit/763ff3cf85b49a936e708dc651dbfefcf23acf26

And when I run ds-replcheck after the test, I get the following report: i1347_report.txt

So the issue seems legit. But please recheck the code and the report in case I missed something.
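
For readers who do not want to open the commit, here is a rough sketch of the reproduction flow as described in this ticket. It is not the linked test case; the exact lib389 calls (offline export/import, the topology_m2 fixture, the "supplier1"/"supplier2" keys) are assumptions.

```python
# Rough sketch of the reproduction flow from the issue description; this is
# not the linked test case, and the exact lib389 calls are assumptions.
from lib389._constants import DEFAULT_SUFFIX, DEFAULT_BENAME
from lib389.idm.user import UserAccounts
from lib389.topologies import topology_m2  # noqa: F401  (pytest fixture)


def test_direct_update_after_ldif_recovery(topology_m2):
    s1 = topology_m2.ms["supplier1"]   # older lib389 uses "master1"/"master2"
    s2 = topology_m2.ms["supplier2"]
    ldif_file = f"{s1.get_ldif_dir()}/old_supplier1.ldif"

    # Take an "old" replication-aware export of s1 (the "ldif RUV" of the
    # scenario above), then let the topology move forward.
    s1.stop()
    s1.db2ldif(bename=DEFAULT_BENAME, suffixes=[DEFAULT_SUFFIX],
               excludeSuffixes=None, encrypt=False, repl_data=True,
               outputfile=ldif_file)
    s1.start()

    users = UserAccounts(s1, DEFAULT_SUFFIX)
    users.create_test_user(uid=1001)   # rid1 advances past the exported RUV

    # "Recover" s1 from the old ldif: its RUV goes back and its changelog
    # is cleared.
    s1.stop()
    s1.ldif2db(DEFAULT_BENAME, None, None, None, ldif_file)
    s1.start()

    # Direct client update on the recovered supplier -- the "killer" update.
    users.create_test_user(uid=1002)
```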

progier389 commented 1 year ago

I am also quite sure that the issue is legit. About your test case:

  • IMHO the pause_all_replicas / resume_all_replicas is useless (because the backend is down during the import)
  • Probably a good idea to wait for replication s2,s1 and s1,s2 at the end of the test (since replication is broken it should fail ...)

droideck commented 1 year ago

I am also quite sure that the issue is legit. About your test case:

  • IMHO the pause_all_replicas / resume_all_replicas is useless (because the backend is down during the import)
  • Probably a good idea to wait for replication s2,s1 and s1,s2 at the end of the test (since replication is broken it should fail ...)

Sounds good! I'll play around with the test a bit more and create a PR later next week. I'll probably set it as XFail since I'm not exactly sure when we'll work on that...
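
A minimal sketch of those end-of-test checks: wait for replication in both directions. With the bug present both waits fail, which is why the test as a whole would carry the XFail marker mentioned above. The ReplicationManager usage and timeout value are assumptions about how the eventual PR would do it.

```python
# Minimal sketch of the suggested end-of-test checks: wait for replication
# in both directions.  With the bug present both waits fail; the timeout
# value is an assumption.
from lib389._constants import DEFAULT_SUFFIX
from lib389.replica import ReplicationManager

# @pytest.mark.xfail(reason="https://github.com/389ds/389-ds-base/issues/1317")
# would sit on the test function until the server-side fix exists.

def wait_both_directions(s1, s2):
    repl = ReplicationManager(DEFAULT_SUFFIX)
    repl.wait_for_replication(s1, s2, timeout=30)   # s1 -> s2
    repl.wait_for_replication(s2, s1, timeout=30)   # s2 -> s1
```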

tbordaz commented 1 year ago

Not sure where to comment :(

I would suggest a slight change: before resuming the servers, the test should stop s2 during the import/start/killer_update on s1. To get the replication breakage we need s2 not to replicate to s1 before killer_update occurs. Once killer_update is completed, you may start s2 and verify that S1toS2 is broken and S2toS1 is broken.
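
A condensed sketch of that ordering (same lib389 assumptions as the earlier sketch): s2 stays down while s1 is recovered and takes the direct update, so s2 never replicates to s1 before the killer_update happens.

```python
# Condensed sketch of the suggested ordering; lib389 calls are assumptions.
from lib389._constants import DEFAULT_BENAME

def recover_s1_while_s2_is_down(s1, s2, users, ldif_file):
    s2.stop()                                   # keep s2 away from s1

    s1.stop()
    s1.ldif2db(DEFAULT_BENAME, None, None, None, ldif_file)  # recover s1
    s1.start()
    users.create_test_user(uid=1002)            # killer_update on s1

    s2.start()                                  # only now let s2 talk to s1
    # ...then verify that S1toS2 and S2toS1 replication are both broken,
    # e.g. by expecting the wait_both_directions() sketch above to fail.
```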

tbordaz commented 1 year ago

Somewhat related to https://github.com/389ds/389-ds-base/issues/2035