Open ZoomRmc opened 4 years ago
Thank you for your contributions. This has been automatically marked as stale because it has had no activity for 180 days. If this is still important to you, we ask that you leave a comment below. Your comment can be as simple as "still important to me". This lets people see that at least one person still cares about this. Someone will have to do this at most twice a year if there is no other activity.
Still important to me.
And for me as well.
Ditto.
I marked this as stale due to inactivity.
Good bot.
Related issue on systemd: https://github.com/systemd/systemd/issues/8234 btw.
I marked this as stale due to inactivity.
Is this still relevant?
I don't have any problems anymore on my end, and nixos-generate-config correctly generates what should be mounted, even multi-device, and it seems like it mounts properly.
AFAIK bcachefs does its own fsck on mount, but I could be wrong.
@Madouura Just to be clear, you are booting off a multi-device bcachefs root device?
/boot itself is an ef00 partition with a fat filesystem, but root is multi-device bcachefs, encrypted too.
{
  fileSystems."/" =
    { device = "/dev/sda1:/dev/nvme0n1p2:/dev/nvme1n1p1";
      fsType = "bcachefs";
    };
  fileSystems."/boot" =
    { device = "/dev/disk/by-uuid/5BCE-E0D5";
      fsType = "vfat";
    };
}
Very nice, this seems to indicate my side of the problem was fixed :)
Is there any reason to keep the issue open, or am I missing something then?
I think we can close. Of course, in the best case @ZoomRmc would confirm that the issue is indeed solved for them. I can't verify, I ditched my system with BCacheFS long ago.
Should be fine to close, as it seems to be fixed. If it needs to be re-opened then it can be.
The problem still persists, although I'm not sure the cause is still the same: currently, NixOS can't generate a fsck service name. I need to use persistent block device names because I have multiple controllers and the short device names change randomly on boot.
[ 20.178828] systemd-fstab-generator[492]: Failed to create fsck service name: File name too long
[ 21.176951] systemd[488]: /nix/store/kxqqbyxf4w0bg4n2ip1qq3kr5bw4hdq0-systemd-249.7/lib/systemd/system-generators/systemd-fstab-generator failed with exit status 1.
The config is:
"/data" = {
  device = "/dev/disk/by-id/wwn-0x50014ee2671c3970:/dev/disk/by-id/wwn-0x50014ee657c4ef16-part1:/dev/disk/by-id/wwn-0x50014ee657dcf49d-part3:/dev/disk/by-id/wwn-0x5000cca369ce34a9:/dev/disk/by-id/wwn-0x5000039ffef41ed1";
  fsType = "bcachefs";
  options = [ "verbose" "nofail" "noatime" "x-systemd.device-timeout=25s" ];
};
Do I need to file a separate issue?
Nah, I must have misunderstood the issue. Let's go ahead and reopen.
Oddly I am actually still running into the original issue even on unstable. If I add the following:
fileSystems."/data" = {
  device = "/dev/nvme1n1p1:/dev/nvme2n1p1:/dev/md127p1";
  fsType = "bcachefs";
};
Then systemd times out with Timed out waiting for device /dev/nvme1n1p1:/dev/nvme2n1p1:/dev/md127p1 and then fails to boot.
@Slabity Are you having any issues other than the timeout on load? From your comment and log message, it doesn't seem like your unit is having trouble with the device string, but rather that it is timing out because it has to walk the full journal. Have you tried adding x-systemd.device-timeout=? I've got mine set to 2min (for the worst case after an improper shutdown) for a 12TB array.
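For reference, a minimal sketch of where that option would go in the NixOS configuration, assuming the same /data filesystem from the comment above. The 2min value is simply the one mentioned here, not a recommendation, and a longer timeout would not help if systemd never recognizes the colon-joined string as a device in the first place:

```nix
{
  fileSystems."/data" = {
    device = "/dev/nvme1n1p1:/dev/nvme2n1p1:/dev/md127p1";
    fsType = "bcachefs";
    # Wait up to two minutes for the device unit before giving up.
    # The value is taken from the comment above; adjust as needed.
    options = [ "x-systemd.device-timeout=2min" ];
  };
}
```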
> /dev/md127p1
I'm not sure bcachefs supports another RAID(?) device in this string to begin with. What is this device?
> /dev/md127p1
> I'm not sure bcachefs supports another RAID(?) device in this string to begin with. What is this device?
It's a RAID5 MD device with 3 8TB HDDs that I use as the background target. I used MD because bcachefs doesn't seem to have RAID5 support yet. I have no issues when setting it up and I can mount it manually with no issue. It's only at boot time when systemd times out.
> @Slabity Are you having any other issues than just timeout on load? from your comment and log message it doesn't seem like your unit is having trouble with the device string but rather timing out due to having to walk the full journal, Have you tried adding x-systemd.device-timeout=? I've got mine set to 2min (for worse case after improper shutdown) for a 12TB array.
Is walking the journal part of fsck? If so, I don't think it's getting to that yet. When the timeout occurs it ends up dropping me into an emergency shell. From there I check journalctl -xb and see the following:
May 14 23:35:40 nixos systemd[1]: dev-nvme1n1p1:-dev-nvme2n1p1:-dev-md127p1.device: Job dev-nvme1n1p1:-dev-nvme2n1p1:-dev-md127p1.device timed out.
May 14 23:35:40 nixos systemd[1]: Timed out waiting for device /dev/nvme1n1p1:/dev/nvme2n1p1:/dev/md127p1.
Subject: Unit dev-nvme1n1p1:-dev-nvme2n1p1:-dev-md127p1.device has failed
Defined-By: systemd
Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Unit dev-nvme1n1p1:-dev-nvme2n1p1:-dev-md127p1.device has failed
The result is RESULT
May 14 23:35:40 nixos systemd[1]: Dependency failed for File System Check on /dev/nvme1n1p1:/dev/nvme2n1p1:/dev/md127p1.
Subject: Unit systemd-fsck@dev-nvme1n1p1:-dev-nvme2n1p1:-dev-md127p1.service has failed
Defined-By: systemd
Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Unit systemd-fsck@dev-nvme1n1p1:-dev-nvme2n1p1:-dev-md127p1.service has failed
The result is RESULT
Which I believe means it's not even getting to the fsck checks because it's not finding the device.
I can also confirm earlier in the logs that md127 is seen and exists, so I'm confident I have my MD modules set up correctly and all 3 devices exist at the time. Once I'm in the emergency shell I can run the following and it mounts instantly:
mount -t bcachefs /dev/nvme1n1p1:/dev/nvme2n1p1:/dev/md127p1 /data
I will try playing with different options, but I'm a little confused how this is working for anyone at all. According to the systemd issue linked, it doesn't support multiple devices yet and some workaround with a custom mount service is required. Unfortunately I have no idea what a service file for that would even look like.
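As a rough illustration of what such a custom mount service could look like on NixOS, here is an untested sketch that simply wraps the manual mount command reported to work above in a oneshot unit. The unit name mount-data is hypothetical, and the sketch assumes that bcachefs support is already enabled on the system, that /data exists, that the fileSystems."/data" entry has been removed so the two do not conflict, and that all three devices are already present when the unit starts (nothing here orders it after them):

```nix
{ pkgs, ... }:
{
  # Hypothetical unit name; not taken from the thread.
  systemd.services.mount-data = {
    description = "Mount multi-device bcachefs filesystem on /data";
    wantedBy = [ "multi-user.target" ];
    serviceConfig = {
      Type = "oneshot";
      RemainAfterExit = true;
      # Same command that mounts successfully from the emergency shell.
      ExecStart = "${pkgs.util-linux}/bin/mount -t bcachefs /dev/nvme1n1p1:/dev/nvme2n1p1:/dev/md127p1 /data";
      ExecStop = "${pkgs.util-linux}/bin/umount /data";
    };
  };
}
```

Because this bypasses the generated mount and fsck units entirely, systemd never has to resolve the colon-joined string as a single device unit. Anything that depends on /data being mounted would have to be ordered after this service by hand.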
I'm guessing it's failing in this block, likely due to it not being able to check the RAID device. https://github.com/NixOS/nixpkgs/blob/34e4df55664c24df350f59adba8c7a042dece61e/nixos/modules/system/boot/stage-1-init.sh#L85-L113 I don't know why it can't just time out and then mount anyway though, that's confusing.
> I'm guessing it's failing in this block, likely due to it not being able to check the RAID device.
Hmm... Well this is happening in stage 2. In fact if I add the "nofail" option then my system boots up fine and I can log in before it even finishes timing out.
This might be because I'm mounting it as /data instead of a required root partition?
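A minimal sketch of that change, assuming the same fileSystems entry as before. Note that nofail only stops a failed mount from blocking the boot; going by the report above, the mount unit itself still times out:

```nix
{
  fileSystems."/data" = {
    device = "/dev/nvme1n1p1:/dev/nvme2n1p1:/dev/md127p1";
    fsType = "bcachefs";
    # Do not treat a failure to mount /data as a boot failure.
    options = [ "nofail" ];
  };
}
```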
I'm pretty sure the mount time scales with the amount of data bcachefs needs to read before it can mount the filesystem completely, as there's currently some sort of in-memory structure that needs to be built (this is planned to be addressed): https://www.patreon.com/posts/9293694
> On larger filesystems, bcachefs's mount times still are too slow - this is really only a stopgap measure until I implement persistent allocation information and a few other things. Fsck performance appears to be quite good compared to other filesystems, though. (Could use benchmarks if anyone wants to run them).
If you want to know what bcachefs is doing and if it's stuck somewhere, you should set options = ["verbose"];
https://github.com/koverstreet/bcachefs/issues/318#issuecomment-932264367
@firestack - Thanks, but I'm pretty sure it is not the raid device or bcachefs taking too long. This is specifically systemd being unable to find the device.
I can run sudo systemctl start data.mount after logging in and it will still time out with the same Timed out waiting for device /dev/nvme1n1p1:/dev/nvme2n1p1:/dev/md127p1 error. On the other hand, I can run sudo mount -t bcachefs /dev/nvme1n1p1:/dev/nvme2n1p1:/dev/md127p1 /data and it will mount in 2 seconds.
It looks like we do something special in stage-1 by manually splitting the devices to get them to work. I'm not sure we do that in stage-2 or when creating the mount service or wherever this is happening.
> It looks like we do something special in stage-1 by manually splitting the devices to get them to work
AFAIK, this is not the case. The only splitting done is to check each device in waitDevice separately, there is nothing altered beyond that one function.
> AFAIK, this is not the case. The only splitting done is to check each device in waitDevice separately, there is nothing altered beyond that one function.
It does look like the root filesystem is mounted by the script instead of through systemd though, which could very well be why it works for people that use it for their root filesystem.
Unfortunately I have no way to test this myself other than trying to install a new copy of NixOS on it, but it's the only thing that I can see that's actually different from what everyone else here is doing. I'm going to try and see if there's a way to create a systemd.mounts entry that will search for each device individually instead of all at once. At least until systemd fixes the bug on their side.
I made a PR (#175548) that fixes the mount.bcachefs.sh script, which should allow you to mount using the UUID instead of the devices themselves. I am now able to mount a non-root filesystem by setting fsType = "bcachefs.sh" like so:
"/data" = {
  device = "64adc9ee-d89d-4a2c-bb4d-6e22a7ab5219";
  fsType = "bcachefs.sh";
  options = [ "verbose" "nofail" "noatime" "x-systemd.device-timeout=10s" ];
};
> Oddly I am actually still running into the original issue even on unstable. If I add the following:
> fileSystems."/data" = { device = "/dev/nvme1n1p1:/dev/nvme2n1p1:/dev/md127p1"; fsType = "bcachefs"; };
> Then systemd times out with Timed out waiting for device /dev/nvme1n1p1:/dev/nvme2n1p1:/dev/md127p1 and then fails to boot.
same here. it sounds like we're waiting on https://github.com/systemd/systemd/issues/8234 ?
i see the bcachefs.sh workaround has been abandoned.
> i see the bcachefs.sh workaround has been abandoned.
Sort of. I did not mean to close the PR altogether, but I forgot I had it still opened after 9 months when I deleted the repo.
The issue with that PR was that I couldn't guarantee it would work with it mounted as the root in stage-1 and nobody tested it since then to confirm what changes would be required to make it work. I can confirm the workaround does still work when mounting with systemd though if you want to add an override in your system to get it working.
I'd like to see if it's possible to get the mounting tool that's written in Rust to work.
This issue has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/how-do-i-mount-multiple-bcachefs-devices-on-boot/37463/6
This issue has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/how-can-i-install-specifically-util-linux-from-unstable/38637/2
Describe the bug
The fsck systemd unit fails on multi-device bcachefs filesystems with Timed out waiting for device /dev/sda1:/dev/sdb1:/dev/sdc1, which in turn fails the mount unit that depends on it. The issue, as I understand it, is that bcachefs mount expects the partitions to be separated by colons, while fsck needs them separated by whitespace.
To Reproduce
Steps to reproduce the behavior:
bcachefs format /dev/sd[ab]1
"/mnt" = { device = "/dev/sda1:/dev/sdb1"; fsType = "bcachefs"; };
Expected behavior
Units not failing.
Metadata
nix-env (Nix) 2.3pre6895_84de821
"nixos-20.03pre200231.7827d3f4497"