Rudd-O / zfs-tools

ZFS synchronization and snapshotting tools

zlock addition to zbackup #11

Closed: tesujimath closed this 6 years ago

tesujimath commented 9 years ago

I added a locking script called zlock, which zbackup uses to acquire a lock on a ZFS filesystem. This prevents concurrent replication. The point is that zbackup is usually invoked from cron, say nightly, but the initial replication can take many days to complete. This new function inhibits subsequent replications by zbackup until the initial one is complete. An additional benefit is that zlock can be called by hand to prevent zbackup replication from occurring, for example when migrating the replicas from one replica server to another.

Note that, as per the README, zlock in fact does nothing at the level of the ZFS filesystem; it merely records a lockfile in /var/lib/zfs-tools/zlock, which is checked by other invocations of zlock.
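For concreteness, here is a minimal Python sketch of the kind of advisory locking described above. This is illustrative only, not the actual zlock source; the lockfile naming scheme and helper names are my assumptions:

import os
import errno

# Assumed lock directory, per the description above.
LOCKDIR = '/var/lib/zfs-tools/zlock'

def lock_path(dataset):
    # Hypothetical naming: one file per dataset, slashes flattened.
    return os.path.join(LOCKDIR, dataset.replace('/', '_'))

def acquire(dataset):
    # O_CREAT | O_EXCL makes creation atomic: a second invocation fails
    # with EEXIST instead of silently taking the lock again.
    try:
        fd = os.open(lock_path(dataset), os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True
    except OSError as e:
        if e.errno == errno.EEXIST:
            return False  # already locked by another invocation
        raise

def release(dataset):
    os.remove(lock_path(dataset))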

Rudd-O commented 9 years ago

This certainly seems useful, but the way it works makes me a bit worried that, should the machine go down while sending data to another machine, the lock would remain dangling.

May I suggest a much simpler approach? There is a command called "flock" which your program can use. It follows the same methodology with a /var/lib/zfs-tools/zlock-<datasetname> file and invokes the program of your choice (in this case, zreplicate), but it actually takes an fcntl lock, which is dropped as soon as the program exits. That way zbackup cannot "forget" to unlock a file system at all. Like this:

flock -n -E 65 /var/lib/zfs-tools/zlock-<datasetname> zreplicate <parameters>

That invocation would return 65 when run against a contended lock, failing to execute zreplicate at all: -n makes flock fail immediately instead of waiting, and -E sets the exit status used in that case. You can then simplify your code significantly, slashing over 30 lines of code, and resting assured that you're using a program which is robust and well-tested.
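For example, a cron-driven zbackup could shell out to flock and treat exit status 65 as "already running". This is a hedged sketch; the dataset, lockfile name, and zreplicate arguments are placeholders, not the real zbackup code:

import subprocess

# Hypothetical arguments; the real zreplicate parameters are whatever
# the cron job already passes.
zreplicate_args = ['tank/home', 'replicaserver:tank/home']

# -n fails immediately on contention instead of waiting, and -E 65 makes
# that case distinguishable from a genuine zreplicate failure.
rc = subprocess.call(['flock', '-n', '-E', '65',
                      '/var/lib/zfs-tools/zlock-tank_home',
                      'zreplicate'] + zreplicate_args)
if rc == 65:
    print('lock contended: another replication is still running')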

Alternatively, you can use the lockfile Python module, which accomplishes the same effect:

import lockfile

lock = lockfile.LockFile('/var/lib/zfs-tools/zlock-%s' % datasetname_without_special_chars)
try:
    # timeout=0 raises AlreadyLocked immediately instead of blocking
    lock.acquire(timeout=0)
except lockfile.AlreadyLocked:
    pass  # handle the already-locked case
except lockfile.LockFailed:
    pass  # handle locking failure for other reasons
else:
    try:
        pass  # run zreplicate here
    finally:
        lock.release()  # release the lock even if zreplicate fails

Would you consider adapting the code to use either of those strategies?

Rudd-O commented 9 years ago

One other thing that I think may be important is to change the (default) lock directory to /var/lib/zfs/locks, owned by root, given that the directory /var/lib/zfs is already created by the Debian package (and really should also be created by the RPM package, but I don't know how to tell good old setup.py bdist_rpm to do that for us). This can be accomplished in debian/postinst, where creating /var/lib/zfs/locks after the creation of the ZFS user should be straightforward.

That way zbackup could take an option --take-lock which, if specified bare, would default to /var/lib/zfs/locks, but which could also be given an explicit directory, as in --take-lock=/var/lib/whateverotherdirforlocks, which the sysadmin could specify in his cronjob. Versatile!
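In Python's argparse, an option that is valid both bare and with a value can be expressed with nargs='?'. This is a hypothetical sketch of the proposed interface, not code from zbackup:

import argparse

parser = argparse.ArgumentParser(prog='zbackup')
# nargs='?' lets the option appear bare (--take-lock) or with a value
# (--take-lock=/var/lib/whateverotherdirforlocks). 'const' supplies the
# directory when the option is bare; 'default=None' means no locking at
# all when the option is absent.
parser.add_argument('--take-lock', nargs='?',
                    const='/var/lib/zfs/locks', default=None,
                    metavar='LOCKDIR')

args = parser.parse_args()
if args.take_lock is not None:
    print('locking in %s' % args.take_lock)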

tesujimath commented 9 years ago

I did consider using flock or something similar when I first did this. However, I decided against it for the following reasons. Having a separate zlock command with lock/unlock options enabled me to write another script (not part of zfs-tools, just a simple private thing), which I called transfer-zfs-filesystem, for migrating a filesystem from one server to another. To do this safely in the presence of zbackup cron jobs, both the source and destination filesystems for the transfer need to be locked against zbackup, which means that the script takes locks through an ssh connection. I didn't see how to do that reasonably using flock. The interaction between zbackup and transfer-zfs-filesystem is much nicer with this zlock model of locking. Also, a sysadmin can manually lock filesystems against zbackup replication should that be required (for example, because maintenance is being done on the replica destination server).
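To illustrate the ssh point, here is a hedged sketch of the locking pattern such a transfer script might use. The zlock lock/unlock subcommands follow the description above, but the exact invocation, host names, and dataset are illustrative, not copied from transfer-zfs-filesystem:

import subprocess

def remote_zlock(host, action, dataset):
    # An fcntl lock taken by flock cannot span hosts, but a lockfile
    # recorded by zlock can be created and removed over ssh.
    subprocess.check_call(['ssh', host, 'zlock', action, dataset])

remote_zlock('source-server', 'lock', 'tank/home')
remote_zlock('replica-server', 'lock', 'tank/home')
try:
    pass  # ... perform the migration here ...
finally:
    remote_zlock('replica-server', 'unlock', 'tank/home')
    remote_zlock('source-server', 'unlock', 'tank/home')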

Given that all sensible users of zbackup will be using the --email-on-failure option, there is little chance that an orphaned lock will go unnoticed: the sysadmin can simply spot the alert in email, investigate, and use zlock unlock if required. This mitigates any problem of orphaned locks, IMO.

Therefore, I prefer to leave the locking as it is.

As regards changing the lockdir to /var/lib/zfs/locks, I have no problem doing that if there is a consensus that it is preferred. I didn't know about the Debian package behaviour, and had been thinking it best to stay away from /var/lib/zfs, since that really belongs to core ZFS, I think, not to an add-on like zfs-tools. In fact, I suggest it would be better for the Debian packaging of zfs-tools to switch to using /var/lib/zfs-tools instead. ;-)

What do you think?

Sorry for the delay in responding. I changed jobs since sending the pull request, and have had my mind on other things for a while.

cheers, Simon

tesujimath commented 9 years ago

I just added my transfer-zfs-filesystem script to contrib in my repo, for reference.

Rudd-O commented 9 years ago

OK. Did the lock directory change to /var/lib/zfs/locks?

tesujimath commented 9 years ago

Not yet; I'm waiting for a statement of consensus. Please see the 4th paragraph of my comment from 3 comments ago. I'm happy to make the change if you think it best, once you've considered my point. Please advise.

cheers, Simon

Rudd-O commented 8 years ago

What if the default lock dir changed to /run/lock/zfs, but could be overridden with a command line option?

I would be happy to merge that change.

I'm done moving to a different nation, and I can finally resume working on open source.

tesujimath commented 8 years ago

Do you mean /var/lib/zfs/locks, which is what we discussed previously? If so, I'd be happy to make that the lockdir.

I'm very much against being able to change the lockdir with a command line option, since that is error prone for the poor old sysadmin who simply wants to use this stuff.

Let me know what you think.

tesujimath commented 6 years ago

These are good comments, thanks! But isn't the game plan to remove zbackup from your repo, in which case this PR would become obsolete anyway? (I don't have time to respond in detail just now, as it is late in the evening here, but I will certainly consider what you are saying tomorrow.)

tesujimath commented 6 years ago

Well, I decided just to continue working in my fork of your repo. It's all in a state of flux at the moment due to major rework. I am taking your comments on board. You should probably ignore any activity in my repo for a while, until I get this working nicely.

tesujimath commented 6 years ago

Obviated by #24