ZFS snapshot dir not accessible in non-root `neededForBoot` mounts #257505

Open · amarshall opened this issue 1 year ago

amarshall commented 1 year ago

Describe the bug

ZFS provides a `<mountpoint>/.zfs/snapshot` directory that automounts snapshots for easy access. This does not work for non-root mounts that have `neededForBoot = true`.
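(For context, a quick illustration of the mechanism, reusing the dataset name from the repro config below: the `.zfs` directory is reachable even though it is hidden from directory listings by default.)

# .zfs is accessible even with the default snapdir=hidden;
# setting snapdir=visible only makes it show up in ls output.
zfs get snapdir rootpool/foo
zfs set snapdir=visible rootpool/foo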

Steps to reproduce

{
  fileSystems = {
    "/" = { fsType = "zfs"; device = "rootpool/root"; };
    # Requires `zfs set mountpoint=legacy rootpool/foo` first
    "/foo" = { fsType = "zfs"; device = "rootpool/foo"; neededForBoot = true; };
  };
}
touch /foo/bar
zfs snapshot rootpool/foo@test
ls /foo/.zfs/snapshot/test

Expected behavior

Expected to find the snapshot files (e.g. `bar`) in `/foo/.zfs/snapshot/test`, but the directory is empty.

Additional context

Notify maintainers

Unsure who…

Metadata

Using recent nixos-unstable (e35dcc04a3853da485a396bdd332217d0ac9054f).

 - system: `"x86_64-linux"`
 - host os: `Linux 6.1.54, NixOS, 23.11 (Tapir), 23.11.20230922.e35dcc0`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.17.0`

matthiasdotsh commented 8 months ago

Same here.

However, I have to mount the snapshots slightly differently.

The following works for me:

mkdir -p /tmp/zfsmount
sudo mount -t zfs rpool/encrypted/safe@zfs-auto-snap_hourly-2024-03-26-06h00 /tmp/zfsmount
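For completeness: a snapshot mounted this way is read-only, and it can be released again once you have copied out what you need:

sudo umount /tmp/zfsmount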

Your suggestion:

sudo mount -t zfs rpool/encrypted/safe@zfs-auto-snap_hourly-2024-03-26-06h00 -o remount

Gives me:

mount: rpool/encrypted/safe@zfs-auto-snap_hourly-2024-03-26-06h00: mount point does not exist.

Shawn8901 commented 7 months ago

~Actually, I tried to double-check on my system, which is using zfsutil for mounting, and it seems that I cannot directly reproduce it with that. So either something else has been fixed, or using zfsutil (i.e. non-legacy mountpoints) does not trigger the issue.~

   "/persist" = {
      device = "rpool/safe/persist";
      fsType = "zfs";
      options = [
        "zfsutil"
        "X-mount.mkdir"
      ];
      neededForBoot = true;
    };
$ ls /persist/.zfs/snapshot
zrepl_20240409_202739_000  zrepl_20240410_201732_000  zrepl_20240411_195737_000

Enzime commented 2 months ago

@matthiasdotsh Try it with the dataset instead of the snapshot:

sudo mount -t zfs rpool/encrypted/safe -o remount

I've managed to reproduce this issue with both the legacy stage 1 and the systemd stage 1.

The easiest workaround for this issue is to run `mount -a -t zfs -o remount`.

In my personal setup I just add it as an extra command to the zfs-mount.service:

systemd.services.zfs-mount = {
  serviceConfig = {
    ExecStart = [ "${lib.getExe' pkgs.util-linux "mount"} -a -t zfs -o remount" ];
  };
};

This works with both legacy (`mountpoint=legacy`) and native (`mountpoint=/...`) ZFS datasets that have `neededForBoot = true`.
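As a variation on the same idea, here is a minimal sketch of the remount as a separate oneshot unit ordered after zfs-mount.service, instead of extending that service directly. The unit name is made up, and it assumes `pkgs` and `lib` are in scope as the usual module arguments:

# Hypothetical separate unit performing the same remount workaround.
systemd.services.zfs-snapshot-remount = {
  description = "Remount ZFS filesystems so .zfs/snapshot is populated";
  after = [ "zfs-mount.service" ];
  wantedBy = [ "multi-user.target" ];
  serviceConfig = {
    Type = "oneshot";
    ExecStart = "${lib.getExe' pkgs.util-linux "mount"} -a -t zfs -o remount";
  };
};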

cc @ElvishJerricco

Enzime commented 2 months ago

I did some further testing, and my previous setting led to my system being broken in some weird ways. The easiest workaround I have found is to set the mountpoint on all my datasets:

$ zfs set -u mountpoint=/ rpool/root

The `-u` flag means it will update the mountpoint without unmounting/mounting anything, which allows you to do it on a live system.
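For illustration, the same `-u` update applied across several datasets (the names and mountpoints below are made up), followed by a check that verifies the result without remounting anything:

# Dataset names here are hypothetical examples.
zfs set -u mountpoint=/        rpool/root
zfs set -u mountpoint=/persist rpool/safe/persist
zfs get -r mountpoint rpool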

Then instead I use this snippet:

systemd.services.zfs-mount = {
  serviceConfig = {
    ExecStart = [ "${config.boot.zfs.package}/sbin/zfs mount -a -o remount" ];
  };
};

All my ZFS datasets are marked as `neededForBoot`, so I'm not sure if this will cause issues if you also have non-legacy ZFS datasets that aren't needed for boot.
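If you are unsure which case applies to you, listing the mountpoint handling per dataset should make it visible (the pool name is illustrative): datasets showing `legacy` are mounted via fstab/systemd, anything else is a native ZFS mountpoint.

zfs list -r -o name,mountpoint,canmount rpool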

amarshall commented 1 month ago

For what it’s worth, this seems to not be a problem with `boot.initrd.systemd.enable = true` (I haven’t A/B tested it yet, but I recently switched and no longer have that problem, or a lot of other issues, to be honest).

Shawn8901 commented 1 month ago

> For what it’s worth, this seems to not be a problem with `boot.initrd.systemd.enable = true` (I haven’t A/B tested it yet, but I recently switched and no longer have that problem, or a lot of other issues, to be honest).

~I am also using systemd stage 1 on all of my hosts, so that could be the reason that I wasn't able to reproduce it.~

Enzime commented 1 month ago

When I tested it, I believe I was able to reproduce it with both the systemd stage 1 and the legacy stage 1.

@Shawn8901 Did you check whether the contents of `/persist/.zfs/snapshot/zrepl_20240409_202739_000` are correct?

Shawn8901 commented 1 month ago

> When I tested it, I believe I was able to reproduce it with both the systemd stage 1 and the legacy stage 1.
>
> @Shawn8901 Did you check whether the contents of `/persist/.zfs/snapshot/zrepl_20240409_202739_000` are correct?

Okay, now I got it. Can repro. I hadn't read the initial content carefully enough.

amarshall commented 1 month ago

Scratch that, it is indeed not working with systemd initrd either; I made a mistake when spot-checking.

iynaix commented 2 weeks ago

The remount workaround used to work for me on ZFS 2.2, but sadly it no longer does on ZFS 2.3-rc2.

Bisecting points to the commit that introduced this regression: openzfs/zfs@34118eac06fba83