NixOS / nix

Nix, the purely functional package manager
https://nixos.org/
GNU Lesser General Public License v2.1

Rebooting always unmount Nix volume in macOS Monterey 12.5 #6839

Open fwten opened 2 years ago

fwten commented 2 years ago

Describe the bug

The Nix Store volume is not mounted automatically after rebooting in macOS Monterey 12.5 (Intel).

Steps To Reproduce

  1. Restart macOS Monterey 12.5.
  2. Disk Utility shows that the Nix Store APFS volume is not mounted.
  3. /nix path is not available.

Expected behavior

The Nix Store APFS volume should be automatically mounted.

nix-env --version output

nix-env (Nix) 2.10.3

Additional context

Things were working fine under Monterey 12.4, and this issue popped up after upgrading to Monterey 12.5.

I tried uninstalling and re-installing Nix, but that did not seem to fix the issue.

To get nix working again, I would have to mount the Nix Store APFS volume in Disk Utility manually and run sudo launchctl kickstart -k system/org.nixos.activate-system.

/etc/fstab and /etc/synthetic.conf seemed fine:

$ cat /etc/fstab 
#
# Warning - this file should only be modified with vifs(8)
#
# Failure to do so is unsupported and may be destructive.
#
UUID=D6FA2E1D-00EC-4C24-236B-44D26FC3E8A2 /nix apfs rw,noauto,nobrowse,suid,owners
$ cat /etc/synthetic.conf 
nix
run private/var/run
$ diskutil info disk3s7
   Device Identifier:         disk3s7
   Device Node:               /dev/disk3s7
   Whole:                     No
   Part of Whole:             disk3

   Volume Name:               Nix Store
   Mounted:                   Yes
   Mount Point:               /nix

   Partition Type:            41504653-0000-11AA-AA11-00306543ECAC
   File System Personality:   APFS
   Type (Bundle):             apfs
   Name (User Visible):       APFS
   Owners:                    Enabled

   OS Can Be Installed:       Yes
   Booter Disk:               disk3s2
   Recovery Disk:             disk3s3
   Media Type:                Generic
   Protocol:                  USB
   SMART Status:              Not Supported
   Volume UUID:               D6FA2E1D-00EC-4C24-236B-44D26FC3E8A2
   Disk / Partition UUID:     D6FA2E1D-00EC-4C24-236B-44D26FC3E8A2

   Disk Size:                 2.0 TB (2000189177856 Bytes) (exactly 3906619488 512-Byte-Units)
   Device Block Size:         4096 Bytes

   Container Total Space:     2.0 TB (2000189177856 Bytes) (exactly 3906619488 512-Byte-Units)
   Container Free Space:      1.9 TB (1851595505664 Bytes) (exactly 3616397472 512-Byte-Units)
   Allocation Block Size:     4096 Bytes

   Media OS Use Only:         No
   Media Read-Only:           No
   Volume Read-Only:          No

   Device Location:           External
   Removable Media:           Fixed

   Solid State:               Yes

   This disk is an APFS Volume.  APFS Information:
   APFS Container:            disk3
   APFS Physical Store:       disk2s2
   Fusion Drive:              No
   Encrypted:                 No
   FileVault:                 No
   Sealed:                    No
   Locked:                    No
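
A quick way to confirm that /etc/fstab and the volume agree is to compare the two UUIDs mechanically rather than by eye. The following is a minimal sketch under that assumption; the function names `fstab_nix_uuid` and `volume_uuid_from_info` are made up for illustration, and the parsing relies only on the layout of the output shown above.

```shell
#!/bin/sh
# Print the UUID that an fstab-style file assigns to the /nix mount point.
# Comment lines (starting with '#') are skipped.
# $1: path to the fstab file (defaults to /etc/fstab)
fstab_nix_uuid() {
  awk '!/^[ \t]*#/ && $2 == "/nix" { sub(/^UUID=/, "", $1); print $1 }' "${1:-/etc/fstab}"
}

# Print the "Volume UUID" value from `diskutil info` output read on stdin.
# The "Disk / Partition UUID" line is deliberately not matched.
volume_uuid_from_info() {
  awk -F': *' '/Volume UUID:/ { print $2 }'
}

# Example (macOS only, volume identifier taken from the report above):
#   [ "$(fstab_nix_uuid)" = "$(diskutil info disk3s7 | volume_uuid_from_info)" ] \
#     && echo "fstab and volume UUID match"
```

In this report both values are D6FA2E1D-00EC-4C24-236B-44D26FC3E8A2, so a mismatch does not appear to be the cause here.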

In case this is relevant, I'm also running this Monterey from an external drive with the following structure:

[image: screenshot of the external drive's volume structure]

abathur commented 2 years ago

Thanks for the good report. Gets a lot of basics out of the way. Three (groups of) questions to start:

  1. When you say you uninstalled and reinstalled, did you DIY, or follow the instructions from the manual?

    (I suspect you followed them, since you didn't report any reinstall issues, but just in case...)

  2. Does /Library/LaunchDaemons/org.nixos.darwin-store.plist exist? Does it refer to the same volume UUID in your earlier output? Does it mention /usr/bin/security? If you reboot and run sudo launchctl kickstart -k system/org.nixos.darwin-store.plist, does it mount the volume?

  3. Can you elaborate on the statement below (lay out exactly what happened and what the symptoms were)?

    Things were working fine under Monterey 12.4, and this issue popped up after upgrading to Monterey 12.5.

    There's an ongoing known issue (#3616) that has been causing nix not to appear on your PATH after a macOS update (because the shell hook isn't being run). A little more information may help clarify whether this is a sign that updates are breaking things in a novel way (and may help others who'll presumably be running into the same issue find the thread easier).

fwten commented 2 years ago

Hi @abathur, thank you for looking into this! I will try to clear up the questions you raised here, please do let me know if you need more information:

  1. I followed the instructions from the manual and there were no issues at all here :)

  2. The /Library/LaunchDaemons/org.nixos.darwin-store.plist exists and refers to the same UUID indeed. No mention of /usr/bin/security though. I tried running sudo launchctl kickstart -k system/org.nixos.darwin-store.plist, but this did not mount the volume. Interestingly, running the commands inside it manually in a terminal worked: /usr/sbin/diskutil mount -mountPoint /nix D6FA2E1D-00EC-4C24-236B-44D26FC3E8A2. Here's the content of the file just in case:

    $ cat /Library/LaunchDaemons/org.nixos.darwin-store.plist
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
    <plist version="1.0">
    <dict>
    <key>RunAtLoad</key>
    <true/>
    <key>Label</key>
    <string>org.nixos.darwin-store</string>
    <key>ProgramArguments</key>
    <array>
    <string>/usr/sbin/diskutil</string>
    <string>mount</string>
    <string>-mountPoint</string>
    <string>/nix</string>
    <string>D6FA2E1D-00EC-4C24-236B-44D26FC3E8A2</string>
    </array>
    </dict>
    </plist>
  3. Alright, I'll try to elaborate on this a little bit more as requested:

    Can you elaborate on the statement below (lay out exactly what happened and what the symptoms were)?

What happened and the symptoms: Right after the upgrade to 12.5 finished (after several automated reboots as well), I logged into my account and saw that my terminal shell and prompt were different. I thought I might have messed up the shell config file so I tried to look into it, but then noticed some of my cli tools were missing as well.

Troubleshooting and investigations: That's how I realized my /nix was gone, because the shell and the missing tools were all installed using nix. I already had some (very) brief experience uninstalling and re-installing Nix before this, and it was very straightforward and pleasant, so I figured I could do the same here instead of potentially messing up the system even further. The re-installation went well and Nix worked again (albeit as a fresh install), but the same issue resurfaced after I restarted the system again. Since it clearly worked after a re-installation (and before a reboot), I studied the installation script and tried to narrow down potential solutions, and that's how I ended up with the information described above. Just to be clear, apart from not being mounted automatically, the Nix volume itself is seemingly fine. I just had to mount it and launch org.nixos.darwin-store.plist manually; all the packages and setup survived as far as I could tell.

Other remarks: By the way, since you mentioned /usr/bin/security, I recalled having some issues as I was studying the installation script, in particular with these two commands:

$ sudo /usr/sbin/diskutil apfs unlockVolume disk3s7 -verify -stdinpassphrase -user D6FA2E1D-00EC-4C24-236B-44D26FC3E8A2

<stuck here running/loading/waiting after inputting user password ...>

and

$ sudo security find-generic-password -s D6FA2E1D-00EC-4C24-236B-44D26FC3E8A2 -w

security: SecKeychainSearchCopyNext: The specified item could not be found in the keychain.

I'm not sure how relevant this is but just thought I should mention it. Since I managed to get my setup running again just by mounting /nix manually and running sudo launchctl kickstart -k system/org.nixos.activate-system, I decided to stop investigating further.

I also came across these similar issues (#3616 etc) during my investigations, but I thought it wasn't applicable here because my whole Nix volume was missing, hence I didn't expect something that sources from /nix would work at all. Nevertheless, I still gave it a try but it didn't solve anything as far as I could tell.
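
One detail worth checking in the kickstart attempts above: launchctl addresses a loaded daemon by domain and label, not by plist file name, so the target for the plist quoted in this comment would be system/org.nixos.darwin-store (without the .plist suffix). The sketch below derives that target from the file. The helper names are hypothetical, and the parsing is a text-level shortcut that assumes the plist is laid out like the one shown, not a real plist parser.

```shell
#!/bin/sh
# Print the Label string from a launchd plist laid out like the one quoted
# above: the <string> on the line right after <key>Label</key>.
plist_label() {
  awk '
    found { gsub(/^[ \t]*<string>|<\/string>[ \t]*$/, ""); print; exit }
    /<key>Label<\/key>/ { found = 1 }
  ' "$1"
}

# Build the system-domain service target for a LaunchDaemon plist.
service_target() {
  printf 'system/%s\n' "$(plist_label "$1")"
}

# Example (macOS only):
#   sudo launchctl kickstart -k \
#     "$(service_target /Library/LaunchDaemons/org.nixos.darwin-store.plist)"
```

If kickstart was given a target that launchd does not know, it would fail without ever running diskutil, which might explain why running the diskutil command by hand still worked.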

abathur commented 2 years ago

The /usr/bin/security shouldn't be present in your case (it's just added when you have FileVault enabled on the volume). Just making sure the mounter sounds appropriate for your drive.

Unfortunately, the exact circumstances aren't something I think I've heard of. Two thoughts:

  1. You could open Console.app (this will be easier if you can close everything else on the system) after you reboot, try running the kickstart command again, and see if you can find anything in the console that might indicate why it's failing. (You can also do this with the log command if you happen to be familiar with its predicates or prefer searching around in the output with something other than Console.app's interface...)
  2. This is reminding me a little bit of #4640, though the context there was different (just cloud instances and local VMs as far as I know). It might be worth picking through some of the early information-gathering steps in there to see if anything leaps out.
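
For the log-command route mentioned in the first suggestion, a rough sketch follows. It builds a predicate that matches messages mentioning the store daemon or diskutil and passes it to `log show`. The search terms and function names are assumptions chosen for illustration; the relevant messages may well use different wording.

```shell
#!/bin/sh
# Build a predicate that matches log messages containing any of the given
# strings, joined with OR.
log_predicate() {
  pred=""
  for term in "$@"; do
    clause="eventMessage CONTAINS \"$term\""
    if [ -z "$pred" ]; then
      pred="$clause"
    else
      pred="$pred OR $clause"
    fi
  done
  printf '%s\n' "$pred"
}

# Show recent matching entries (macOS only; defined here but not called).
# $1: how far back to look, e.g. 10m or 1h (defaults to 10m)
show_store_logs() {
  log show --last "${1:-10m}" --info \
    --predicate "$(log_predicate org.nixos.darwin-store diskutil)"
}
```

Running `show_store_logs 15m` shortly after a reboot and a kickstart attempt should narrow the output to a manageable size.
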
pejaab commented 1 year ago

I'm not sure whether this helps, but since I ran into the same issues, I'll share my observations and how I fixed it temporarily.

Basically the output of

sudo launchctl print system/org.nixos.darwin-store

shows error code 1. Error code 1 means operation not permitted. This error only showed up after a reboot; manually triggering the start of the daemon never showed this error and would mount /nix fine. I figured this error was connected to the fact that the FileVault-encrypted volume needed to be unlocked and that somewhere it was lacking the permission. I cannot tell where, though. After disabling FileVault and changing system/org.nixos.darwin-store to just mount the /nix volume instead of needing to unlock it, the unmounting issue is gone, and /nix is mounted fine after every reboot.

I don't necessarily recommend disabling FileVault, but for a temporary solution on a Mac that never leaves home, I can live with it for the time being.
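
To compare the daemon's state right after boot with its state after a manual start, the exit code can be pulled out of the print output. This is a small sketch that assumes the output contains a line of the form `last exit code = N`; launchctl's man page warns that this output is not a stable interface, so the pattern may need adjusting. As far as I know the value is the exit status of the program launchd ran (diskutil or the unlock script), so a 1 may simply mean "the command failed" rather than a specific permission error.

```shell
#!/bin/sh
# Extract the value of the "last exit code" line from `launchctl print`
# output read on stdin. Prints nothing if no such line is present.
last_exit_code() {
  sed -n 's/^[[:space:]]*last exit code = //p'
}

# Example (macOS only):
#   sudo launchctl print system/org.nixos.darwin-store | last_exit_code
```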

abathur commented 1 year ago

@pejaab You might be able to disambiguate by looking for a credential with the same UUID in keychain.

Here's how it's formatted/named/described when it's added:

https://github.com/NixOS/nix/blob/a88ae62bc0e404b7f87876bfd0a74afbac4d517d/scripts/create-darwin-volume.sh#L708-L710

If this credential goes missing for some reason, or if there is one but it doesn't correspond to your volume's UUID, the darwin-store daemon won't be able to unlock it.

Technically macOS itself can unlock this on mount--but it does it too late to prevent subtle race condition failures if your system needs to run executables or restore apps/window contents that are on the Nix Store volume.

The installer sets the volume up not to use the macOS built-in automounting to make sure problems like this have to get promptly and directly addressed instead of just causing people flaky hard-to-troubleshoot boot problems that might result in data loss and make them miserable for months or years.
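
The credential lookup described above can be scripted with the same `security find-generic-password -s <UUID>` call used earlier in the thread. The sketch below only reports whether an entry exists and never prints the password. The function name is made up, and whether sudo is needed to see the entry is an assumption that depends on which keychain holds it.

```shell
#!/bin/sh
# Succeed if the keychain search finds a generic password whose service
# name is the given volume UUID. Output is discarded so that no secret is
# printed.
has_volume_credential() {
  security find-generic-password -s "$1" >/dev/null 2>&1
}

# Example (macOS only, UUID taken from the darwin-store plist):
#   if has_volume_credential "$uuid"; then
#     echo "credential present"
#   else
#     echo "credential missing or stored under a different UUID"
#   fi
```

For an unencrypted volume, such as the original poster's, no entry is expected.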

pejaab commented 1 year ago

@abathur thanks for the explanation. From the discussion above I was more or less able to extrapolate how this should work. The corresponding credential was persisted correctly in an entry in the Keychain App. The behaviour persisted even after full removal of nix and new installation (I ended up with two entries in Keychain with the corresponding UUID of the newly added volume and the old one).

When manually starting the system/org.nixos.darwin-store daemon, the volume would mount successfully, but not so on startup. As mentioned, for the time being it's fine for me in this case to not have this volume encrypted. I thought this might give you some further evidence in figuring out the root cause here or help the next person stumbling into this issue...

abathur commented 1 year ago

I'm a bit stumped on why it would be failing on boot and not when you manually launch the daemon (and why your case differs from OP's on this point).

I agree that it does smell like there's some sort of permission difference.

Does your device happen to be an org device that may have an MDM profile? (I'm not sure why an org profile might appear to be restricting root more than your user, so I don't really think this would explain it--but we do have a small number of known problems associated with profile-enforced permissions/restrictions.)

pejaab commented 1 year ago

No it's not an org device and doesn't have an MDM profile.

If there is anything you feel is worth trying, I'm happy to help in case that gets anyone further. But as I said, you don't need to spend time on this for my sake.

nixos-discourse commented 1 year ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/upgrading-to-macos-sonoma/33580/5

Endle commented 1 year ago

I ran into a similar problem after I moved my /nix to an external SSD, following p3 and p4

As @pejaab pointed out in https://github.com/NixOS/nix/issues/6839#issuecomment-1450826401, I can manually execute sudo launchctl load -w /Library/LaunchDaemons/org.nixos.darwin-store.plist or /usr/sbin/diskutil mount -mountPoint /nix A5671.., but darwin-store.plist itself fails when booting

I guess this is related to a race condition with external storage: when darwin-store is invoked, the external USB drive is not ready yet, which causes the failure.
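
If the race-condition guess is right, one possible workaround is to retry the mount for a while instead of attempting it once. The helper below is a generic sketch of that idea, not something the Nix installer does; the name `retry` and the attempt and delay values in the example are arbitrary.

```shell
#!/bin/sh
# Run a command until it succeeds, up to $1 attempts, sleeping $2 seconds
# between attempts. Returns 0 on success, otherwise the exit status of the
# last failed attempt.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while :; do
    "$@" && return 0
    status=$?
    [ "$n" -ge "$attempts" ] && return "$status"
    n=$((n + 1))
    sleep "$delay"
  done
}

# Example (macOS only, UUID taken from the darwin-store plist):
#   retry 30 2 /usr/sbin/diskutil mount -mountPoint /nix "$uuid"
```

Using this at boot would mean pointing the daemon's ProgramArguments at a wrapper script, which is a change to the installed plist and would need to be reapplied after a reinstall.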

emileindik commented 3 months ago

I am facing the same issue after doing a macOS upgrade to Sonoma. /etc/bashrc was overwritten, so I re-added this block to the file

# Nix
if [ -e '/nix/var/nix/profiles/default/etc/profile.d/nix-daemon.sh' ]; then
  source '/nix/var/nix/profiles/default/etc/profile.d/nix-daemon.sh'
fi
# End Nix

Now my remaining issue is the nix volume not mounting on startup (I'm not using external hard drives). To work around this, I first manually mount the nix volume in Disk Utility, then run

$ sudo launchctl load /Library/LaunchDaemons/org.nixos.darwin-store.plist

I am using FileVault and would like to continue using it. The volume UUID agrees with the UUID in org.nixos.darwin-store.plist.

Attaching both .plist files: org.nixos.darwin-store.txt, org.nixos.nix-daemon.txt

By the way, I've seen some threads suggesting to run sudo launchctl load /Library/LaunchDaemons/org.nixos.nix-daemon.plist and others sudo launchctl load /Library/LaunchDaemons/org.nixos.darwin-store.plist. How should I think about the difference between these?

abathur commented 3 months ago

@emileindik I'm skeptical that this is caused (or solely caused) by the update because we don't have the volume of corroborating reports we'd expect if macOS updates were systematically overwriting /etc/bashrc or messing up the volume-mounting service (in contrast with updates overwriting /etc/zshrc which is extensively corroborated by reports).

I queried in the Nix on macOS room on Matrix and no one has corroborated seeing this on any Sonoma update so far (but I'll update if someone does).

Attaching both .plist files...

I don't see anything wrong with these at a glance.

How should I think about the difference between these?

They are separate services. darwin-store is only responsible for mounting the store at boot, and nix-daemon is only responsible for running the nix daemon. The latter only makes sense once the volume is mounted.
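
The ordering described above (mount first, daemon second) can be written down as a small recovery script. This is a sketch under several assumptions: the daemon's label is taken to be org.nixos.nix-daemon based on the plist file name, the environment variable `NIX_RECOVER_RUN` is invented here to switch between printing and executing, and a FileVault-encrypted volume may need to be unlocked before the mount step works.

```shell
#!/bin/sh
# Print the command, or execute it when NIX_RECOVER_RUN=1 is set.
run_or_print() {
  if [ "${NIX_RECOVER_RUN:-0}" = 1 ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Manual recovery in dependency order: the store volume must be mounted
# before restarting the daemon makes sense.
# $1: volume UUID, as found in org.nixos.darwin-store.plist
recover_nix() {
  run_or_print /usr/sbin/diskutil mount -mountPoint /nix "$1" || return 1
  run_or_print launchctl kickstart -k system/org.nixos.nix-daemon
}

# Example: preview the steps, then run them as root.
#   recover_nix "$uuid"
#   sudo env NIX_RECOVER_RUN=1 sh -c '. ./recover.sh; recover_nix "$0"' "$uuid"
```

Without the environment variable set, the function only prints the two commands, which makes it safe to try before running anything as root.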

I'm not sure what's going on, so I'll just think out loud a bit...

emileindik commented 3 months ago

Thanks for the quick response @abathur. I updated from Sonoma 14.6 to a developer beta version (14.6.x I think?). That borked my /etc/bashrc. I've since reverted back to 14.6 and this time /etc/bashrc stayed intact. Telling, perhaps.

I will report back if I uninstall/reinstall nix and still see issue.

Many thanks!

emileindik commented 2 months ago

Update: I finally got it working after reading the last post in this thread: https://discourse.nixos.org/t/macos-upgrade-breakage/50691/7. I had to turn both of these on.

[image: Screenshot 2024-09-08 at 8 59 37 PM]