Open 389-ds-bot opened 4 years ago
Comment from firstyear (@Firstyear) at 2020-04-30 05:41:13
Seems reasonable to me I think :)
Comment from firstyear (@Firstyear) at 2020-04-30 05:41:15
Metadata Update from @Firstyear:
Comment from mreynolds (@mreynolds389) at 2020-04-30 17:29:00
Metadata Update from @mreynolds389:
Comment from mreynolds (@mreynolds389) at 2020-05-06 21:37:37
Metadata Update from @mreynolds389:
Comment from mreynolds (@mreynolds389) at 2020-06-24 16:26:30
Metadata Update from @mreynolds389:
Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/51057
There have been reports of data generation mismatch errors after online initialization. This should not happen: during online initialization the supplier sends its RUV, which contains the data generation, so the generations must match. BUT.
There is a small window of time in which the generations can differ after online init. I could not figure out a scenario where the result of the init is sent during this window so that the supplier could run into it, but for a short time the difference is there. I think it does not need to be, and we could make things simpler.
Here is what happens.
The supplier doing a total init acquires the replica and sends its RUV; the consumer to be initialized stores this RUV in the connection extension. The supplier sends all the entries, skipping the RUV tombstone entry (since it could have been modified since the start of sending entries). When the supplier is finished, it sends the "end total update" request. The consumer stops the bulk import and re-enables the backend. This also enables replication, and replica_enable_replication wants to read the tombstone RUV; since it was not sent, it is not found, and a new one is created with a new local data generation. We now have an RUV with a generation different from the one before and different from the supplier's. Then it calls replica_set_ruv and replaces this local RUV with the RUV from the connection extension, so now supplier and consumer have the same RUV. There is a comment in repl_extop.c by ONREPL stating that this is a hack, but this hack has always been there.
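To make the window easier to see, here is a minimal Python sketch of the sequence described above. It is a toy model, not the server code: the `Replica` class, `new_generation`, and `total_init_current` are hypothetical names made up for illustration, and only the order of the steps is taken from the description.

```python
import itertools
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

_gen_counter = itertools.count(1)


def new_generation() -> str:
    """Hypothetical stand-in for creating a fresh data generation."""
    return f"gen-{next(_gen_counter)}"


@dataclass
class Replica:
    name: str
    # Generation held in the replica's RUV (None means no RUV present).
    ruv_generation: Optional[str] = None
    # Generation of the supplier RUV stashed in the connection extension.
    conn_ext_generation: Optional[str] = None
    # (step, generation) pairs recorded after each step of the init.
    history: List[Tuple[str, Optional[str]]] = field(default_factory=list)

    def record(self, step: str) -> None:
        self.history.append((step, self.ruv_generation))


def total_init_current(supplier: Replica, consumer: Replica) -> None:
    """Toy model of the current total init sequence on the consumer."""
    # The supplier sends its RUV; the consumer keeps it in the connection extension.
    consumer.conn_ext_generation = supplier.ruv_generation

    # All entries are sent except the RUV tombstone, so the consumer has no RUV.
    consumer.ruv_generation = None
    consumer.record("bulk import done")

    # "End total update": replication is enabled, no tombstone RUV is found,
    # and a new RUV with a new local data generation is created.
    consumer.ruv_generation = new_generation()
    consumer.record("replication enabled")

    # The local RUV is replaced by the one from the connection extension.
    consumer.ruv_generation = consumer.conn_ext_generation
    consumer.record("ruv replaced")


if __name__ == "__main__":
    supplier = Replica("supplier", ruv_generation=new_generation())
    consumer = Replica("consumer")
    total_init_current(supplier, consumer)
    for step, generation in consumer.history:
        print(step, generation, "supplier has", supplier.ruv_generation)
```

In this model the consumer's generation differs from the supplier's only between the "replication enabled" and "ruv replaced" steps, which is the short window in question.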
I do not really see how this short-lived difference can cause real problems, but it is there and it is unnecessary. My suggestion is to explicitly send the supplier's tombstone RUV (as suggested in the mentioned comment) during total init. We now have the sequence
add 1.1 send the tombstone ruv and on the consumer get rid of the hack
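As a rough illustration of the suggestion, the sketch below models a backend as a plain dictionary and compares skipping the RUV tombstone with sending it. Everything here is an assumption for illustration (`RUV_KEY`, `send_entries`, `enable_replication`, and the entry layout are hypothetical); it is not the actual patch and it ignores the timing concern that the RUV may change while entries are being sent.

```python
RUV_KEY = "ruv-tombstone"  # hypothetical key for the RUV tombstone entry


def send_entries(supplier_backend: dict, include_ruv: bool) -> dict:
    """Return the entries a supplier would ship during a total init."""
    return {
        key: dict(entry)  # copy so the consumer does not share supplier state
        for key, entry in supplier_backend.items()
        if include_ruv or key != RUV_KEY
    }


def enable_replication(consumer_backend: dict, fresh_generation: str) -> str:
    """Read the RUV tombstone if present; otherwise create one with a new generation."""
    if RUV_KEY not in consumer_backend:
        consumer_backend[RUV_KEY] = {"generation": fresh_generation}
    return consumer_backend[RUV_KEY]["generation"]


supplier_backend = {
    RUV_KEY: {"generation": "gen-supplier"},
    "uid=alice": {"cn": "Alice"},
}

# Current behaviour: the tombstone is skipped, so a local generation appears.
skipped = send_entries(supplier_backend, include_ruv=False)
generation_when_skipped = enable_replication(skipped, "gen-local")

# Suggested behaviour: the tombstone is sent, so the supplier's generation is found.
sent = send_entries(supplier_backend, include_ruv=True)
generation_when_sent = enable_replication(sent, "gen-local")
```

With the tombstone included, the consumer finds the supplier's generation as soon as replication is enabled, so no later replacement step is needed in this model.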