LINBIT / drbd-utils

DRBD userspace utilities (for 9.x, 8.4, 8.3)
GNU General Public License v2.0

Improvements to snapshot-resync-target-lvm.sh #8

Closed · jcharaoui closed this 3 years ago

jcharaoui commented 3 years ago

I have prepared a handful of small changes for this script: they fix the handling of resources composed of multiple volumes, deal with the deprecation of /proc/drbd, and allow the script to skip snapshot creation altogether when it is not required.
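For illustration, the skip check might look roughly like this on drbd9, querying `drbdsetup` instead of the deprecated /proc/drbd (a sketch only, not the exact code in this PR):

```sh
# Sketch only, not the exact PR code: query drbdsetup (drbd9) for the
# out-of-sync count instead of parsing the deprecated /proc/drbd, and
# skip the snapshot when nothing needs resyncing.
# $DRBD_RESOURCE is set in the handler environment by DRBD itself.
OOS=$(drbdsetup status "$DRBD_RESOURCE" --statistics |
    awk '/out-of-sync:/ { sub(/.*out-of-sync:/, ""); print $1; exit }')
if [ "${OOS:-0}" -eq 0 ]; then
    exit 0  # nothing out of sync, no snapshot required
fi
```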

lge commented 3 years ago

Thanks for the contribution; it clearly pushes us to do something about it.

I would not call the script recursively; instead, have a shell function that deals with one LV, then a for loop calling it for all LVs.
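As a rough sketch of that structure (the function body and the way the backing LVs are discovered are placeholders, not the real script):

```sh
# Rough sketch of the suggested structure; snapshot_one_lv and
# LVS_OF_THIS_RESOURCE are placeholders.
snapshot_one_lv() {
    local lv="$1"  # backing LV in VG/LV form
    # create the snapshot for this one LV
    lvcreate --snapshot --extents "10%ORIGIN" --name "${lv##*/}-before-resync" "$lv"
}

for lv in $LVS_OF_THIS_RESOURCE; do
    snapshot_one_lv "$lv"
done
```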

Caution must be taken: the kernel module will call the script for each "peer device" (aka volume) in turn, but only at the time the resync of the respective volume starts. This may be suboptimal, but that's how it is. It possibly needs to be wrapped in a per-resource "flock".
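Such a per-resource lock could be as simple as this sketch (the lock file path is an assumption):

```sh
# Sketch of per-resource serialization with flock(1); the lock file path
# is an assumption. $DRBD_RESOURCE comes from the handler environment.
exec 9>"/var/lock/drbd-snapshot-${DRBD_RESOURCE}.lock"
flock 9  # concurrent invocations for the same resource now run one at a time
# ... snapshot work happens here, while holding the lock ...
```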

Since multi-volume resources make a "consistency group" promise, the first invocation needs to snapshot all volumes. It also needs to be idempotent (finding existing snapshots must not make it fail). For the same reason, I disagree with skipping the snapshot when nothing is out-of-sync for a particular volume: that would violate the consistency-group promise.
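Idempotence might then look roughly like this, reusing the hypothetical snapshot_one_lv from the sketch above:

```sh
# Sketch of an idempotent variant of the hypothetical snapshot_one_lv:
# an already-existing snapshot is treated as success, not as an error.
snapshot_one_lv() {
    local lv="$1" snap="${1##*/}-before-resync" vg="${1%/*}"  # $1 is VG/LV
    if lvs --noheadings "$vg/$snap" >/dev/null 2>&1; then
        return 0  # snapshot already exists: nothing to do
    fi
    lvcreate --snapshot --extents "10%ORIGIN" --name "$snap" "$lv"
}
```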

Which all means, I cannot take it as is immediately. But we will look into it.

jcharaoui commented 3 years ago

Thanks for looking into this!

My understanding, based on the tests I've done, is that the script is actually called once per resource, not once per volume. If the latter were true, no modifications to the current version would be needed.

At least, that's what happens when the secondary node is put into standby mode via Pacemaker, but that might be a narrow case and perhaps the script is called differently in other scenarios?

jcharaoui commented 3 years ago

I tested this some more, and indeed, when more than one volume requires synchronisation, the script is run multiple times concurrently with the same environment.

lge commented 3 years ago

Again, thank you for your input.

I see a few problems, and some of them are only fixable in the module.

Currently (9.0.28), the kernel does:

- connection handshake
- for each volume:
  - volume data generation UUID handshake
  - resync decision
  - bitmap exchange
  - call the before-resync-* handler, if a resync is needed

If the first volume does not need a resync (it was not changed), but some other volume does, the first volume may already have seen updates by the time the first snapshot handler runs. That violates the idea of "consistency groups".

It also means that, even if all volumes have been changed and need a resync, the first handler runs right after the first bitmap exchange, while all other bitmap exchanges have not taken place yet. It is very much possible (and even expected after a graceful disconnect/down beforehand) for the "out-of-sync" count of the later volumes to still show "0", because the bitmap information for those has not been exchanged yet.

Which means that the "before resync target snapshot" for multi-volume resources is currently even more broken than I expected, at least if you want the snapshots across all volumes to be "atomic" with respect to the "consistency group" they are supposed to represent. Fixing this will need changes in the module; I don't see how we can work around it in the handler.

We will try to address this in one of the next module releases, and then update the example handler accordingly.

jcharaoui commented 3 years ago

Alright, thanks for looking into it!