LINBIT / drbd-utils

DRBD userspace utilities (for 9.x, 8.4, 8.3)
GNU General Public License v2.0

snapshot-resync-target-lvm.sh not compatible with 9.x #37

Open jsalatiel opened 8 months ago

jsalatiel commented 8 months ago

The snapshot-resync-target-lvm.sh script is not compatible with DRBD 9.x. First, the snippet

                OUT_OF_SYNC=$(sed -ne "/^ *$DRBD_MINOR:/ "'{
                                n;
                                s/^.* oos:\([0-9]*\).*$/\1/;
                                s/^$/0/; # default if not found
                                p;
                                q; }' < /proc/drbd) # unit KiB

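will always return zero (the "default if not found" case), because on DRBD 9 /proc/drbd no longer contains the per-minor status lines with the oos: field that the sed expression looks for.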
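On DRBD 9 the per-volume resync counters are reported by `drbdsetup` rather than /proc/drbd. Here is a minimal sketch. It assumes `drbdsetup events2 --now --statistics` prints one `exists peer-device ...` line per peer and volume, each with an `out-of-sync:` field in KiB; verify that against your drbd-utils version. The hypothetical helper `out_of_sync_kib` reads that output on stdin and prints the largest out-of-sync value across peers for one volume. Taking the maximum rather than the sum is a heuristic choice for snapshot sizing, not something this issue specifies.

```shell
#!/bin/sh
# Hypothetical helper: print the largest out-of-sync value (KiB) reported
# for volume $2 of resource $1, reading events2-style lines on stdin.
# ASSUMED line format (check your drbd-utils version):
#   exists peer-device name:RES peer-node-id:N ... volume:V ... out-of-sync:KIB ...
out_of_sync_kib() {
    awk -v res="$1" -v vol="$2" '
        BEGIN { max = 0 }
        $1 == "exists" && $2 == "peer-device" {
            name = ""; v = ""; oos = ""
            for (i = 3; i <= NF; i++) {
                # split each key:value field at its first colon
                p = index($i, ":")
                if (p == 0) continue
                key = substr($i, 1, p - 1); val = substr($i, p + 1)
                if (key == "name") name = val
                else if (key == "volume") v = val
                else if (key == "out-of-sync") oos = val
            }
            if (name == res && v == vol && oos != "" && oos + 0 > max) max = oos + 0
        }
        END { print max }'
}

# Possible use in a DRBD 9 handler (not run here; $vol is assumed to be
# a single volume number):
#   OUT_OF_SYNC=$(drbdsetup events2 --now --statistics "$DRBD_RESOURCE" |
#                 out_of_sync_kib "$DRBD_RESOURCE" "$vol")
```

If no matching line is found the helper prints 0, which mirrors the old snippet's default. The difference is that the value is now actually looked up.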

Second, if you have more than one volume configured for the same resource, it also fails:

Dec 21 21:44:38 pcs03 snapshot-resync-target-lvm.sh[22355]: Cannot determine lower level device of resource exports/0 1, sorry.
Dec 21 21:44:47 pcs03 unsnapshot-resync-target-lvm.sh[22373]: invoked for exports/0 1 (drbd0 1)
Dec 21 21:44:47 pcs03 unsnapshot-resync-target-lvm.sh[22373]: 0 1 is not a valid number
Dec 21 21:44:47 pcs03 unsnapshot-resync-target-lvm.sh[22373]: 0 1 is not a valid number
Dec 21 21:44:47 pcs03 unsnapshot-resync-target-lvm.sh[22373]: Cannot determine lower level device of resource exports/0 1, sorry.
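The log suggests the handler received a space-separated list (`0 1`) where the script expects a single number. Below is a minimal sketch of splitting such a value and validating each entry, assuming the list is whitespace-separated. `split_numbers` is a hypothetical helper. As lge explains below, this alone does not make the multi-volume case work; it only avoids rejecting the list outright.

```shell
#!/bin/sh
# Hypothetical helper: treat a value like "0 1" as a list of numbers,
# printing one per line, instead of failing with "not a valid number".
# The body runs in a subshell so "set -f" (no globbing) does not leak.
split_numbers() (
    set -f
    for n in $1; do              # unquoted on purpose: split on whitespace
        case $n in
            ''|*[!0-9]*) echo "$n is not a valid number" >&2; exit 1 ;;
        esac
        echo "$n"
    done
)

# Possible use inside a handler (not run here):
#   minors=$(split_numbers "$DRBD_MINOR") || exit 1
#   for minor in $minors; do
#       echo "would look up the backing device of /dev/drbd$minor"
#   done
```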

lge commented 8 months ago

Yes, this thing has several issues. For thin snapshots, specifying any size at all is actually wrong.
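For a thin origin, `lvcreate --snapshot` without a size creates a thin snapshot. Passing `--size` instead requests a classic (thick, copy-on-write) snapshot with its own preallocated space. Here is a minimal sketch of choosing between the two. It assumes the caller already knows the origin's segment type, for example from `lvs --noheadings -o segtype VG/LV`. `snapshot_cmd` is hypothetical and only prints the command, so it can be inspected without touching LVM.

```shell
#!/bin/sh
# Hypothetical helper: print the lvcreate command for a snapshot of
# $2 (VG/LV) named $3, omitting the size when the origin is thin.
# $1 is the origin's segment type, e.g. from:
#   lvs --noheadings -o segtype VG/LV
# $4 is the snapshot size in KiB (only used for non-thin origins).
snapshot_cmd() {
    # lvs pads its output with spaces; strip them before comparing.
    segtype=$(printf '%s' "$1" | tr -d '[:space:]')
    origin=$2 snap_name=$3 size_kib=$4
    case $segtype in
        thin) echo "lvcreate --snapshot --name $snap_name $origin" ;;
        *)    echo "lvcreate --snapshot --name $snap_name --size ${size_kib}k $origin" ;;
    esac
}
```

In a real handler you would run `lvcreate` directly with these arguments rather than echoing a string.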

Yes, we should ship something that works better than what we have. Even though we say the shipped scripts are "EXAMPLE" scripts only, we should not burden everyone out there with coming up with their own "solution" to a common problem.

For the multi-volume case: unfortunately, it cannot be fixed in userland alone. We need to change how and when the module invokes the "before resync target" handler. As it is, the module calls the "before resync target" handler FOR EACH VOLUME in the resource in turn, whenever that specific volume starts to sync, in no particular order. It also calls the "after" handler (the "unsnapshot") for each volume in turn, in no particular order, which means that script would need special care to delete the snapshots only once all volumes are back in sync...
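The "special care" on the `after-resync-target` side could look roughly like this. Each invocation records its volume. Only the invocation that sees every volume done, and also wins an atomic `mkdir`, goes on to remove the snapshots. This is a hypothetical sketch, not a fix. It assumes a per-resource state directory that is reset before a new resync round starts, and it assumes the caller knows the resource's full volume list. It does nothing about the ordering on the "before" side, which is why the module-side change is still needed.

```shell
#!/bin/sh
# Hypothetical helper: record that volume $2 is back in sync, using state
# directory $1 (assumed private to one resource and reset before each
# resync round). $3 is the full, space-separated list of volume numbers.
# Returns 0 only to the single caller that should remove the snapshots.
volume_synced() {
    state_dir=$1 vol=$2 all_vols=$3
    mkdir -p "$state_dir" || return 2
    : > "$state_dir/synced.$vol"
    for v in $all_vols; do
        [ -e "$state_dir/synced.$v" ] || return 1   # another volume still syncing
    done
    # mkdir is atomic: if two handlers get here at once, only one succeeds.
    mkdir "$state_dir/cleanup" 2>/dev/null
}
```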

So: we need to change how and when these handlers are invoked from the module.

No, I don't have any ETA for a generic improvement. This has been waiting to be fixed for years already. Yes, I get that this may be frustrating.

jsalatiel commented 1 month ago

Any news on this?