Zygo / bees

Best-Effort Extent-Same, a btrfs dedupe agent
GNU General Public License v3.0

RFC: OpenRC initscript #114

Open automorphism88 opened 5 years ago

automorphism88 commented 5 years ago

I'm using Gentoo with OpenRC, and since bees comes with a systemd service but no OpenRC one, I decided to write my own initscript. It is loosely based on the beesd script (which I couldn't use directly, since it wouldn't work with start-stop-daemon) and is designed to work with a conf.d file similar to the one installed in /etc/bees, but with some additional options to control logging and {io,}nice level. It also avoids bashisms so that it can be run with dash as /bin/sh.

Is this something you would be interested in merging upstream? My /etc/init.d file can be found here: https://github.com/automorphism88/gentoo-overlay/blob/master/sys-fs/bees/files/bees.initd and the corresponding template /etc/conf.d file can be found here: https://github.com/automorphism88/gentoo-overlay/blob/master/sys-fs/bees/files/bees.confd. Note that while this initscript works for me, it should be considered experimental. I haven't tried to get this merged into the official Gentoo ebuild yet since I figured it would be even better to upstream the script directly into bees.

A few things I noticed about the beesd script when I was using it as a reference were:

  1. Is there some reason to `chmod 700` the database file instead of `chmod 600`? Executable permission doesn't seem to be necessary.
  2. `stat -c %s FILE` is a simpler, faster, and more portable way of checking file size than the method used in the beesd script (which, I believe, relies on a GNU extension in sed to interpret `\t` as tab). Incidentally, `stat -c %i` can also be used to check whether a directory is a btrfs subvolume, since a btrfs subvolume will have an inode number of 256.
  3. Why not mount the root subvolume with noatime? Unlike btrfs-specific mount options like compression, this one can be set per-subvolume, so it will default to relatime even if the rest of the filesystem is mounted with noatime. Probably wouldn't make much difference in this case, I'm just in the habit of using noatime everywhere unless there's a specific need to do otherwise.
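
Points 1 and 2 can be sketched together in POSIX sh. This is a minimal sketch, not code from beesd: it assumes GNU coreutils `stat -c` and `truncate`, and the helper name `ensure_db_file` and its arguments are hypothetical. It creates or resizes the hash table (database) file and gives it mode 0600 instead of 0700:

```shell
#!/bin/sh
# Hypothetical helper (not from beesd): create or resize the bees hash
# table file and restrict it to owner read/write.
# Usage: ensure_db_file PATH SIZE_IN_BYTES
ensure_db_file() {
    db=$1
    size=$2
    # `stat -c %s` prints the size in bytes directly (GNU coreutils and
    # BusyBox syntax; BSD stat uses -f instead), so there is no need to
    # parse tab-separated output with sed.
    if [ ! -e "$db" ] || [ "$(stat -c %s "$db")" != "$size" ]; then
        truncate -s "$size" "$db" || return 1
    fi
    # 0600: bees reads and writes the file but never executes it.
    chmod 600 "$db"
}
```

For example, `ensure_db_file "$BEESHOME/beeshash.dat" 1073741824` would create a 1 GiB table, or resize an existing one to that size.
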
Zygo commented 5 years ago

+1 to all of the above. I don't really write the distro integration scripts, I just pull them. Usually when I try to edit them, I break them.

  1. executable isn't necessary, 0600 is fine.

  2. `stat -c %i $PATH` can't figure out the subvolume ID, but `btrfs ins rootid $PATH` can. `stat` can also report the fstype for the path. bees needs a directory with inode 256, root/subvol id 5, and fstype btrfs.

  3. Sounds like an excellent idea. bees will examine every file and touch some of them. The files that are modified are doomed to have their inodes updated anyway, but the ones that are merely read shouldn't be hit with unexpected inode updates.
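
The three conditions in point 2 can be checked in a few lines of POSIX sh. This is a sketch, not code from beesd: it assumes GNU coreutils `stat` (where `-f -c %T` prints the filesystem type name) and btrfs-progs (`btrfs inspect-internal rootid`, which `btrfs ins rootid` abbreviates), and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: succeed only if "$1" is the root directory of the
# top-level subvolume (subvolume ID 5) of a btrfs filesystem.
is_btrfs_toplevel() {
    dir=$1
    # Type of the filesystem containing $dir.
    [ "$(stat -f -c %T "$dir")" = btrfs ] || return 1
    # The root directory of every btrfs subvolume has inode number 256...
    [ "$(stat -c %i "$dir")" = 256 ] || return 1
    # ...but only the top-level subvolume has root/subvolume ID 5.
    [ "$(btrfs inspect-internal rootid "$dir")" = 5 ] || return 1
}
```

Checking the inode alone would also accept the root of any other subvolume, which is why the `rootid` test is needed as well.
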

automorphism88 commented 5 years ago
> 2. `stat -c %i $PATH` can't figure out the subvolume ID, but `btrfs ins rootid $PATH` can. `stat` can also report the fstype for the path. bees needs a directory with inode 256, root/subvol id 5, and fstype btrfs.

I'm referring to this line in beesd, where `btrfs sub show` is used to test whether `$BEESHOME` is a subvolume or an ordinary directory.

> 3. Sounds like an excellent idea. bees will examine every file and touch some of them. The files that are modified are doomed to have their inodes updated anyway, but the ones that are merely read shouldn't be hit with unexpected inode updates.

I'm referring to the invocation of `mount` in beesd here, which implicitly uses relatime.

Zygo commented 5 years ago

You might also want to look at https://github.com/Zygo/bees/pull/104 which aims to make the script configuration a lot more stateless.

automorphism88 commented 5 years ago

> I don't really write the distro integration scripts, I just pull them. Usually when I try to edit them, I break them.

I wouldn't consider this a distro integration script any more than the systemd service file is. OpenRC is used on other distros as well as Gentoo, and I don't think anything in my script is Gentoo-specific.

The issue of UUIDs in config files is at least partially solved by OpenRC multiplexing: /etc/conf.d/bees can contain global options that are shared across machines, and /etc/conf.d/bees.foo can contain a specific UUID. If you then create /etc/init.d/bees.foo as a symlink to /etc/init.d/bees, the bees.foo service will first read /etc/conf.d/bees and then override it with any options set in /etc/conf.d/bees.foo.
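
A sketch of that layout for a hypothetical instance named bees.foo. The variable names below are illustrative only and need not match the ones in the linked conf.d template; the UUID is a placeholder.

```shell
# /etc/conf.d/bees: settings shared by every bees.* instance
# (hypothetical variable names)
BEES_NICE=19
BEES_IONICE_CLASS=3    # 3 = idle class for ionice -c

# /etc/conf.d/bees.foo: read after /etc/conf.d/bees, so anything set here
# overrides the shared value for this instance only
BEES_UUID="<filesystem UUID>"

# Commands (as root) to create and enable the instance:
#   ln -s bees /etc/init.d/bees.foo
#   rc-update add bees.foo default
```
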

I could make this into a PR, but I'm not sure how to integrate it into the build system in terms of things like install paths and enabling/disabling installation. The /etc/init.d and /etc/conf.d paths are not universal: for instance, Arch uses /etc/openrc/init.d and /etc/openrc/conf.d. OpenRC does, however, assume that the init.d and conf.d directories are in the same parent directory.

> You might also want to look at #104 which aims to make the script configuration a lot more stateless.

That seems like a change that could be made independently. My script doesn't use /etc/bees or the beesd wrapper script. I could also look at adding support for specifying filesystems by path instead of by UUID, as that script does; that appears to be its only advantage over the present script in the context of an OpenRC service.

It might be useful if beesd itself could be run under start-stop-daemon, instead of the initscript having to duplicate that setup code so that it can run bees directly. I think all that would be required is for the beesd script to start bees with exec, so that bees keeps the same PID instead of running as a forked child. However, I assumed beesd doesn't do that so that it can use a trap to clean up on exit. I implemented that cleanup separately (also keeping track of whether the subvolume was already mounted at start, so that it is only unmounted on stop if the script mounted it on start).
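
The mount-tracking cleanup described above can be sketched in POSIX sh. It is illustrative, not the linked initscript: `MNT`, `DEV`, `WE_MOUNTED`, `MOUNT_CMD` and `UMOUNT_CMD` are hypothetical names, `mountpoint` comes from util-linux, and the top-level subvolume is mounted with `noatime` as suggested earlier in the thread.

```shell
#!/bin/sh
# Hypothetical names throughout. MOUNT_CMD/UMOUNT_CMD default to the real
# tools and exist only so the logic can be exercised without root.
: "${MOUNT_CMD:=mount}"
: "${UMOUNT_CMD:=umount}"
WE_MOUNTED=no

# Mount the top-level subvolume (subvolid=5) at $MNT with noatime, unless
# something is already mounted there; remember which case applied.
mount_toplevel() {
    if mountpoint -q "$MNT"; then
        WE_MOUNTED=no
    else
        # $MOUNT_CMD is left unquoted on purpose so it may contain arguments.
        $MOUNT_CMD -t btrfs -o subvolid=5,noatime "$DEV" "$MNT" || return 1
        WE_MOUNTED=yes
    fi
}

# Unmount only what mount_toplevel mounted; a pre-existing mount is left alone.
cleanup() {
    if [ "$WE_MOUNTED" = yes ]; then
        $UMOUNT_CMD "$MNT" && WE_MOUNTED=no
    fi
}
```

In a foreground wrapper like beesd, this would be used as `trap cleanup EXIT`, then `mount_toplevel || exit 1`, then running bees without `exec`, since `exec` replaces the shell and the trap never fires (depending on the shell, signals such as TERM may need their own trap). That is the trade-off described above: under start-stop-daemon, where bees itself should be the tracked PID, `WE_MOUNTED` instead has to be saved between `start()` and `stop()`, for example in a small state file.
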

kakra commented 5 years ago

I'm the maintainer of the Gentoo ebuild. @Zygo, if you don't mind, I'll come up with a suggestion, maybe also fixing the other problems mentioned here.

automorphism88 commented 5 years ago

> I'm the maintainer of the Gentoo ebuild. @Zygo, if you don't mind, I'll come up with a suggestion, maybe also fixing the other problems mentioned here.

I'm the one who recently opened a bunch of bugs about bees on the Gentoo Bugzilla; thanks for fixing them! I considered submitting this there as well, but figured it would be better to do it upstream.

By the way, @kakra, if it's not too off-topic: I noticed that your fix for the "version UNKNOWN" issue sets the version to "v${PV}", so the help output displays "version v0.6.1", which seems redundant ("version version 0.6.1"?). Why not just use "${PV}"? See, for example, the output of `git --version`.

kakra commented 5 years ago

That's to match upstream: bees itself uses the Git tag, which is prefixed with "v".

nagelp commented 3 years ago

> I'm the maintainer of the Gentoo ebuild. @Zygo, if you don't mind, I'll come up with a suggestion, maybe also fixing the other problems mentioned here.

@kakra, any news on this? I just installed sys-fs/bees on Gentoo, but there is no init.d script. @automorphism88's links at the top of this issue don't work anymore.

MCPO-Spartan-117 commented 3 months ago

I'm guessing this init script got lost then?

kakra commented 3 months ago

I'm going to update the ebuild soon; let me move this up my to-do list, then.

@MCPO-Spartan-117 Thanks for pinging...