NixOS / nixpkgs


fsck and mount systemd units fail on multidevice bcachefs filesystem #72970

Open ZoomRmc opened 4 years ago

ZoomRmc commented 4 years ago

Describe the bug

The fsck systemd unit fails on multi-device bcachefs filesystems with Timed out waiting for device /dev/sda1:/dev/sdb1:/dev/sdc1, which in turn fails the mount unit that depends on it. The issue, as I understand it, is that bcachefs mount expects the member partitions to be separated by colons, while fsck needs them separated by whitespace.

To Reproduce

Steps to reproduce the behavior:

  1. Create a multi-device bcachefs filesystem: bcachefs format /dev/sd[ab]1
  2. Add it to your hardware-configuration.nix and rebuild:

    "/mnt" = { device = "/dev/sda1:/dev/sdb1"; fsType = "bcachefs"; };

Expected behavior

Units not failing.
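To make the separator mismatch described above concrete, here is a minimal Nix sketch that derives both forms from one device string. The attribute names mountArg and fsckArgs are illustrative only and are not part of any NixOS module; whether fsck would actually accept the whitespace form for bcachefs is the reporter's understanding, not something this sketch verifies.

```nix
let
  # Colon-joined device string, as bcachefs mount expects it.
  device = "/dev/sda1:/dev/sdb1:/dev/sdc1";

  # builtins.split interleaves regex match groups (as lists) with the string
  # pieces, so keep only the string pieces.
  members = builtins.filter builtins.isString (builtins.split ":" device);
in
{
  # Passed to mount as a single argument.
  mountArg = device;

  # The same devices, whitespace-separated, one argument per device.
  fsckArgs = builtins.concatStringsSep " " members;
}
```

Note that systemd sees only the colon-joined form, so it treats the whole string as one device path.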

stale[bot] commented 4 years ago

Thank you for your contributions. This has been automatically marked as stale because it has had no activity for 180 days. If this is still important to you, we ask that you leave a comment below. Your comment can be as simple as "still important to me". This lets people see that at least one person still cares about this. Someone will have to do this at most twice a year if there is no other activity. Here are suggestions that might help resolve this more quickly:

  1. Search for maintainers and people that previously touched the related code and @ mention them in a comment.
  2. Ask on the NixOS Discourse.
  3. Ask on the #nixos channel on irc.freenode.net.

ZoomRmc commented 4 years ago

Still important to me.

wucke13 commented 3 years ago

And for me as well.

bqv commented 3 years ago

Ditto.

stale[bot] commented 2 years ago

I marked this as stale due to inactivity.

ZoomRmc commented 2 years ago

Good bot.

wucke13 commented 2 years ago

Related issue on systemd: https://github.com/systemd/systemd/issues/8234 btw.

stale[bot] commented 2 years ago

I marked this as stale due to inactivity.

Madouura commented 2 years ago

Is this still relevant? I don't have any problems anymore on my end: nixos-generate-config correctly generates what should be mounted, even for multi-device filesystems, and it seems to mount properly. AFAIK bcachefs does its own fsck on mount, but I could be wrong.

wucke13 commented 2 years ago

@Madouura Just to be clear, you are booting off a multi-device bcachefs root device?

Madouura commented 2 years ago

/boot itself is an ef00 partition with a fat filesystem, but root is multi-device bcachefs, encrypted too.

Madouura commented 2 years ago

{
  fileSystems."/" =
    { device = "/dev/sda1:/dev/nvme0n1p2:/dev/nvme1n1p1";
      fsType = "bcachefs";
    };

  fileSystems."/boot" =
    { device = "/dev/disk/by-uuid/5BCE-E0D5";
      fsType = "vfat";
    };
}

wucke13 commented 2 years ago

Very nice, this seems to indicate my side of the problem was fixed :)

Madouura commented 2 years ago

Is there any reason to keep the issue open, or am I missing something then?

wucke13 commented 2 years ago

I think we can close. Of course, in the best case @ZoomRmc would confirm that the issue is indeed solved for them. I can't verify, I ditched my system with BCacheFS long ago.

Madouura commented 2 years ago

Should be fine to close, as it seems to be fixed. If it needs to be re-opened then it can be.

ZoomRmc commented 2 years ago

The problem still persists, although I'm not sure the reasons are still the same: currently, NixOS can't generate an fsck service name. I need to use persistent block device names, as I have multiple controllers and the short device names change randomly on boot.

[   20.178828] systemd-fstab-generator[492]: Failed to create fsck service name: File name too long
[   21.176951] systemd[488]: /nix/store/kxqqbyxf4w0bg4n2ip1qq3kr5bw4hdq0-systemd-249.7/lib/systemd/system-generators/systemd-fstab-generator failed with exit status 1.

The config is:

    "/data" = {
      device = "/dev/disk/by-id/wwn-0x50014ee2671c3970:/dev/disk/by-id/wwn-0x50014ee657c4ef16-part1:/dev/disk/by-id/wwn-0x50014ee657dcf49d-part3:/dev/disk/by-id/wwn-0x5000cca369ce34a9:/dev/disk/by-id/wwn-0x5000039ffef41ed1";
      fsType = "bcachefs";
      options = [ "verbose" "nofail" "noatime" "x-systemd.device-timeout=25s" ];
    };
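One way to check whether the generated unit name is what overflows is to approximate systemd's path escaping in Nix and measure the result. This is a rough sketch, not systemd's actual algorithm. It assumes the fsck unit is named systemd-fsck@<escaped device>.service, that the leading "/" is dropped, that each remaining "/" becomes "-" and each "-" becomes \x2d, and that unit names are limited to roughly 255 characters. The names escape and unitName are illustrative.

```nix
let
  # Device string copied from the config above.
  device = "/dev/disk/by-id/wwn-0x50014ee2671c3970:/dev/disk/by-id/wwn-0x50014ee657c4ef16-part1:/dev/disk/by-id/wwn-0x50014ee657dcf49d-part3:/dev/disk/by-id/wwn-0x5000cca369ce34a9:/dev/disk/by-id/wwn-0x5000039ffef41ed1";

  # Approximate systemd path escaping (assumption, see lead-in):
  # drop the leading "/", then map "/" to "-" and "-" to "\x2d".
  # replaceStrings does not rescan its own output, so the "-" produced
  # from "/" is not escaped a second time.
  escape = p:
    builtins.replaceStrings [ "/" "-" ] [ "-" "\\x2d" ]
      (builtins.substring 1 (builtins.stringLength p) p);

  unitName = "systemd-fsck@${escape device}.service";
in
{
  inherit unitName;
  # Compare this against the assumed limit of about 255 characters.
  length = builtins.stringLength unitName;
}
```

Saving this to a file and evaluating it with nix-instantiate --eval --strict prints the approximate name and its length. Every "-" in a by-id path costs four characters after escaping, so persistent names grow much faster than short ones.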

Do I need to file a separate issue?

Madouura commented 2 years ago

Nah, I must have misunderstood the issue. Let's go ahead and reopen.

Slabity commented 2 years ago

Oddly I am actually still running into the original issue even on unstable. If I add the following:

    fileSystems."/data" = {
      device = "/dev/nvme1n1p1:/dev/nvme2n1p1:/dev/md127p1";
      fsType = "bcachefs";
    };

Then systemd times out with Timed out waiting for device /dev/nvme1n1p1:/dev/nvme2n1p1:/dev/md127p1 and then fails to boot.

firestack commented 2 years ago

@Slabity Are you having any issues other than the timeout on load? From your comment and log message, it doesn't seem like your unit is having trouble with the device string, but rather is timing out because it has to walk the full journal. Have you tried adding x-systemd.device-timeout=? I've got mine set to 2min (for the worst case after an improper shutdown) for a 12TB array.
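For reference, a sketch of what that suggestion could look like in a NixOS configuration, using the device string from the report and the 2min value mentioned above. This only lengthens how long systemd waits for the device unit. It is an assumption that a longer wait helps at all, since it does not change which device unit systemd waits for.

```nix
{
  fileSystems."/data" = {
    device = "/dev/nvme1n1p1:/dev/nvme2n1p1:/dev/md127p1";
    fsType = "bcachefs";
    # x-systemd.device-timeout sets how long systemd waits for the device
    # to appear before giving up on the mount.
    options = [ "x-systemd.device-timeout=2min" ];
  };
}
```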

Madouura commented 2 years ago

/dev/md127p1

I'm not sure bcachefs supports another RAID(?) device in this string to begin with. What is this device?

Slabity commented 2 years ago

/dev/md127p1 I'm not sure bcachefs supports another RAID(?) device in this string to begin with. What is this device?

It's a RAID5 MD device with 3 8TB HDDs that I use as the background target. I used MD because bcachefs doesn't seem to have RAID5 support yet. I have no issues when setting it up and I can mount it manually with no issue. It's only at boot time when systemd times out.

@Slabity Are you having any issues other than the timeout on load? From your comment and log message, it doesn't seem like your unit is having trouble with the device string, but rather is timing out because it has to walk the full journal. Have you tried adding x-systemd.device-timeout=? I've got mine set to 2min (for the worst case after an improper shutdown) for a 12TB array.

Is walking the journal part of fsck? If so, I don't think it's getting to that yet. When the timeout occurs it ends up dropping me into an emergency shell. From there I check journalctl -xb and see the following:

May 14 23:35:40 nixos systemd[1]: dev-nvme1n1p1:-dev-nvme2n1p1:-dev-md127p1.device: Job dev-nvme1n1p1:-dev-nvme2n1p1:-dev-md127p1.device timed out.
May 14 23:35:40 nixos systemd[1]: Timed out waiting for device /dev/nvme1n1p1:/dev/nvme2n1p1:/dev/md127p1.
Subject: Unit dev-nvme1n1p1:-dev-nvme2n1p1:-dev-md127p1.device has failed
Defined-By: systemd
Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel

Unit dev-nvme1n1p1:-dev-nvme2n1p1:-dev-md127p1.device has failed

The result is RESULT
May 14 23:35:40 nixos systemd[1]: Dependency failed for File System Check on /dev/nvme1n1p1:/dev/nvme2n1p1:/dev/md127p1.
Subject: Unit systemd-fsck@dev-nvme1n1p1:-dev-nvme2n1p1:-dev-md127p1.service has failed
Defined-By: systemd
Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel

Unit systemd-fsck@dev-nvme1n1p1:-dev-nvme2n1p1:-dev-md127p1.service has failed

The result is RESULT

Which I believe means it's not even getting to the fsck checks because it's not finding the device.

I can also confirm earlier in the logs that md127 is seen and exists, so I'm confident I have my MD modules setup correctly and all 3 devices exist at the time. Once I'm in the emergency shell I can run the following and it mounts instantly:

mount -t bcachefs /dev/nvme1n1p1:/dev/nvme2n1p1:/dev/md127p1 /data

I will try playing with different options, but I'm a little confused how this is working for anyone at all. According to the systemd issue linked, it doesn't support multiple devices yet and some workaround with a custom mount service is required. Unfortunately I have no idea what a service file for that would even look like.
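As a rough idea of what such a custom mount service might look like on NixOS, here is an untested sketch. The service name mount-data-bcachefs and the members binding are hypothetical. It assumes bcachefs support is already enabled so that mount can find its bcachefs helper (as in the manual mount above), that the member paths contain no characters needing systemd escaping, and that the fileSystems."/data" entry is removed so systemd does not also generate the failing units.

```nix
{ pkgs, ... }:
let
  # Wait for each member device on its own, rather than for a single
  # device unit named after the whole colon-joined string.
  members = [
    "dev-nvme1n1p1.device"
    "dev-nvme2n1p1.device"
    "dev-md127p1.device"
  ];
in
{
  # Hypothetical service name; not something shipped by nixpkgs.
  systemd.services."mount-data-bcachefs" = {
    description = "Mount multi-device bcachefs on /data (workaround sketch)";
    wantedBy = [ "multi-user.target" ];
    requires = members;
    after = members;
    path = [ pkgs.util-linux ];
    serviceConfig = {
      Type = "oneshot";
      RemainAfterExit = true;
    };
    # Same command that works when run by hand from the emergency shell.
    script = ''
      mkdir -p /data
      mount -t bcachefs /dev/nvme1n1p1:/dev/nvme2n1p1:/dev/md127p1 /data
    '';
  };
}
```

Because the mount happens inside a plain service, anything that needs /data would have to be ordered after this service explicitly.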

Madouura commented 2 years ago

I'm guessing it's failing in this block, likely due to it not being able to check the RAID device. https://github.com/NixOS/nixpkgs/blob/34e4df55664c24df350f59adba8c7a042dece61e/nixos/modules/system/boot/stage-1-init.sh#L85-L113 I don't know why it can't just time out and then mount anyway though, that's confusing.

Slabity commented 2 years ago

I'm guessing it's failing in this block, likely due to it not being able to check the RAID device.

Hmm... Well this is happening in stage 2. In fact if I add the "nofail" option then my system boots up fine and I can log in before it even finishes timing out.

This might be because I'm mounting it as /data instead of a required root partition?

firestack commented 2 years ago

I'm pretty sure the mount time scales with the amount of data bcachefs needs to read before it can mount the filesystem completely, as there's currently some sort of in-memory structure that needs to be built (this is planned to be addressed).

(https://www.patreon.com/posts/9293694)

On larger filesystems, bcachefs's mount times still are too slow - this is really only a stopgap measure until I implement persistent allocation information and a few other things. Fsck performance appears to be quite good compared to other filesystems, though. (Could use benchmarks if anyone wants to run them).

If you want to know what bcachefs is doing and if it's stuck somewhere, you should set options = ["verbose"]; https://github.com/koverstreet/bcachefs/issues/318#issuecomment-932264367

Slabity commented 2 years ago

@firestack - Thanks, but I'm pretty sure it is not the raid device or bcachefs taking too long. This is specifically systemd being unable to find the device.

I can run sudo systemctl start data.mount after logging in and it will still time out with the same Timed out waiting for device /dev/nvme1n1p1:/dev/nvme2n1p1:/dev/md127p1 error. On the other hand, I can run sudo mount -t bcachefs /dev/nvme1n1p1:/dev/nvme2n1p1:/dev/md127p1 /data and it will mount in 2 seconds.

It looks like we do something special in stage-1 by manually splitting the devices to get them to work. I'm not sure we do that in stage-2 or when creating the mount service or wherever this is happening.

Madouura commented 2 years ago

It looks like we do something special in stage-1 by manually splitting the devices to get them to work

AFAIK, this is not the case. The only splitting done is to check each device in waitDevice separately, there is nothing altered beyond that one function.

Slabity commented 2 years ago

AFAIK, this is not the case. The only splitting done is to check each device in waitDevice separately, there is nothing altered beyond that one function.

It does look like the root filesystem is mounted by the script instead of through systemd though, which could very well be why it works for people that use it for their root filesystem.

Unfortunately I have no way to test this myself other than trying to install a new copy of NixOS on it, but it's the only thing that I can see that's actually different from what everyone else here is doing. I'm going to try and see if there's a way to create a systemd.mounts entry that will search for each device individually instead of all at once. At least until systemd fixes the bug on their side.
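To illustrate the idea of waiting on each device individually, the following Nix sketch turns a colon-joined device string into one systemd device unit name per member. The helper toDeviceUnit is hypothetical and uses a simplified escaping that is only valid for paths without "-" or other special characters, so it would not be correct for /dev/disk/by-id paths.

```nix
let
  device = "/dev/nvme1n1p1:/dev/nvme2n1p1:/dev/md127p1";

  # Split on ":" and keep only the string pieces returned by builtins.split.
  members = builtins.filter builtins.isString (builtins.split ":" device);

  # Simplified escaping (assumption): drop the leading "/", map the
  # remaining "/" to "-", and append the ".device" suffix.
  toDeviceUnit = p:
    (builtins.replaceStrings [ "/" ] [ "-" ]
      (builtins.substring 1 (builtins.stringLength p) p)) + ".device";
in
# One unit name per member, e.g. "dev-nvme1n1p1.device".
map toDeviceUnit members
```

A list like this could then be used as the dependencies of a custom unit, instead of the single combined device unit that systemd derives today.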

Slabity commented 2 years ago

I made a PR (#175548) that fixes the mount.bcachefs.sh script which should allow you to mount using the UUID instead of the devices themselves. I am able to now mount a non-root filesystem by setting fsType = "bcachefs.sh" like so:

"/data" = {
  device = "64adc9ee-d89d-4a2c-bb4d-6e22a7ab5219";
  fsType = "bcachefs.sh";
  options = [ "verbose" "nofail" "noatime" "x-systemd.device-timeout=10s" ];
};

ziguana commented 1 year ago

Oddly I am actually still running into the original issue even on unstable. If I add the following:

    fileSystems."/data" = {
      device = "/dev/nvme1n1p1:/dev/nvme2n1p1:/dev/md127p1";
      fsType = "bcachefs";
    };

Then systemd times out with Timed out waiting for device /dev/nvme1n1p1:/dev/nvme2n1p1:/dev/md127p1 and then fails to boot.

same here. it sounds like we're waiting on https://github.com/systemd/systemd/issues/8234 ?

i see the bcachefs.sh workaround has been abandoned.

Slabity commented 1 year ago

i see the bcachefs.sh workaround has been abandoned.

Sort of. I did not mean to close the PR altogether, but after 9 months I forgot it was still open when I deleted the repo.

The issue with that PR was that I couldn't guarantee it would work with it mounted as the root in stage-1 and nobody tested it since then to confirm what changes would be required to make it work. I can confirm the workaround does still work when mounting with systemd though if you want to add an override in your system to get it working.

I'd like to see if it's possible to get the mounting tool that's written in Rust to work.

nixos-discourse commented 6 months ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/how-do-i-mount-multiple-bcachefs-devices-on-boot/37463/6

nixos-discourse commented 5 months ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/how-can-i-install-specifically-util-linux-from-unstable/38637/2