NixOS / nix

Nix, the purely functional package manager
https://nixos.org/
GNU Lesser General Public License v2.1

macOS installer: upcoming UID clash on macOS 15 Sequoia #10892

Open abathur opened 3 weeks ago

abathur commented 3 weeks ago

There are some reports percolating about the upcoming macOS Sequoia 15 (I guess from people trying the beta out) now using at least one UID in the range we've been using:

For context, history on our previous change and ID range selection is in:

I'm on the road this week and am not sure how much time I'll have to drive a PR if people want it quickly, but it shouldn't be too hard for anyone to sort out if we identify a good range, and we could probably make a new migration script along the lines of what I added in #4532 to migrate these existing users. (That said, I'm not sure if we should rush to actually merge, in case details keep shifting ahead of actual release?)

If you're playing with macOS 15 Sequoia, we need to figure out what parts of the ID space are open:

Hopefully we can just move up to 360 or so.
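One way to survey which parts of the ID space are open, offered as a rough sketch rather than anything the installer does: the hypothetical helper `free_ranges` below reads `name uid` pairs on stdin and prints the unused runs inside a given range. The demo input is only the handful of Sequoia entries reported later in this thread, not a full user list.

```sh
#!/bin/sh
# free_ranges LOW HIGH (hypothetical helper, not part of Nix or macOS):
# read "name uid" pairs on stdin and print each unused run in LOW..HIGH.
free_ranges() {
  awk -v low="$1" -v high="$2" '
    function print_run(a, b) { printf "%d-%d (%d free)\n", a, b, b - a + 1 }
    $2 ~ /^[0-9]+$/ { used[$2] = 1 }   # skip headers and negative IDs
    END {
      start = ""
      for (id = low; id <= high; id++) {
        if (!(id in used)) {
          if (start == "") start = id          # a free run begins here
        } else if (start != "") {
          print_run(start, id - 1)             # a used ID closes the run
          start = ""
        }
      }
      if (start != "") print_run(start, high)  # run reaching the upper bound
    }
  '
}

# Demo with the Sequoia entries quoted in this thread.
free_ranges 300 400 <<'EOF'
_aonsensed 300
_modelmanagerd 301
_reportsystemmemory 302
_swtransparencyd 303
_naturallanguaged 304
_oahd 441
EOF
# prints: 305-400 (96 free)
```

On a real machine the input could come from `dscl . -list /Users UniqueID | free_ranges 200 400`, and the same function should work for groups with `dscl . -list /Groups PrimaryGroupID`.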

fbettag commented 3 weeks ago

cat /etc/passwd on sequoia shows

_aonsensed:*:300:300:Always On Sense Daemon:/var/db/aonsensed:/usr/bin/false
_modelmanagerd:*:301:301:Model Manager:/var/db/modelmanagerd:/usr/bin/false
_reportsystemmemory:*:302:302:ReportSystemMemory:/var/empty:/usr/bin/false
_swtransparencyd:*:303:303:Software Transparency Services:/var/db/swtransparencyd:/usr/bin/false
_naturallanguaged:*:304:304:Natural Language Services:/var/db/com.apple.naturallanguaged:/usr/bin/false
_oahd:*:441:441:OAH Daemon:/var/empty:/usr/bin/false

ryanbooker commented 3 weeks ago

Perhaps count down from 400? That's what I've done locally to fix the immediate issue. FYI, everything works with an arbitrary range, e.g. 3001–3032.

abathur commented 3 weeks ago

No personal objection to counting down as a strategy, but:

roberth commented 3 weeks ago

Also affects nix-darwin

stepbrobd commented 3 weeks ago

Question 1: Would it be possible to work around this without reinstalling Nix on macOS systems, using the ids option (exposed but not in the docs, and changing the setting doesn't change anything)?

Q1 RFC: In this file, nixbld = 300; is set but this is not used anywhere. Perhaps we can add an idempotent shell script to add/remove nixbld group and nixbld* users on every rebuild?

Question 2: For NixOS systems, if I'm understanding the docs correctly, nixbld* users are not needed to perform builds when auto-allocate-uids or cgroups is enabled. Is there anything equivalent to these on macOS systems?

abathur commented 3 weeks ago

@michaelvanstraten noted in https://github.com/NixOS/nix/issues/6153#issuecomment-2162748398 that you can get unblocked on new installs for testing with:

To repeat @ikuz's solution again for a quick fix on macOS 15 install/reinstall with:

NIX_FIRST_BUILD_UID="305" sh <(curl -L https://nixos.org/nix/install)

emilazy commented 2 weeks ago

Some thoughts on where to reassign the IDs:

We need to be below (or equal to?) UID 400, and Apple has now used up to UID 304. Clearly we should expect they might keep adding users to the low end of this range occasionally. However, running right up against the 400 limit doesn’t seem safe to me either; /etc/group on my Sonoma machine contains groups from 395 to 400, so it seems like Apple considers the upper end of the system range to be its for the taking as well. The natural place to go, then, would be in the middle of the range.

We default to 32 build users currently. The main reason you might want more is that it limits the number of concurrent build jobs, and the number of those you might want to run is proportional to the cores/threads on your machine. The highest number of cores on a currently-shipping Mac is the 24‐core M2 Ultra; they used to ship Xeons with 48 threads (24 cores × 2 threads/core), and there are rumours of a 32‐core M3 Ultra and even a 64‐core M3 Ultimate. We can’t fit more than 96 users at the absolute maximum as of Sequoia, and that would obviously be risky, so let’s say we want to plan to have space for around 64 to 80 build users.

I suggest we start at 331 (if we want to keep the last digit matching the build user number) or 330 (if we don’t). That gives us enough margin for Apple to add ~26 new users before we run into problems again, 38 empty spaces above the top UID for the current default of 32 users, and just enough space to squeeze in 64 users before hitting the 395 ID that Apple has already used for a group.

The other good candidate would be 321/320; that reduces the margin on the low end to ~16 new Apple users, but increases the margin on the upper end, to (if using 320) just barely allow squeezing in 80 users in the available range. Personally I feel like the release of an 80‐core Mac would make me scared for 96 to 128‐core Macs that we have no realistic way of adding enough users for with our current approach anyway, and we have precedent that Apple is happy to add users on the low end of the range, so I lean towards 331 to give us more margin on that side. But I’m ambivalent if people have a strong preference for more margin to add and think that Apple will continue adding system users at a restrained enough pace compared to core count inflation. 331 spreads our users around the middle of the available range for 32 users, 321 does the same for 64 users.
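The margins quoted for the two candidates can be re-derived with a little arithmetic. This is a sketch only: the function name `margins` and the three constants are illustrative, with the values taken from this comment (Apple at 304 on the low end, a documented ceiling of 400, and a group already at 395).

```sh
#!/bin/sh
# Assumed bounds, taken from the discussion above (not read from any system).
APPLE_LOW_MAX=304    # highest low-end UID Apple uses on Sequoia
RANGE_MAX=400        # top of the documented role-account range
APPLE_HIGH_MIN=395   # lowest high-end GID seen in /etc/group on Sonoma

# margins START USERS (hypothetical helper): headroom around a candidate range.
margins() {
  start=$1
  users=$2
  last=$((start + users - 1))                  # UID of the last build user
  below=$((start - APPLE_LOW_MAX - 1))         # IDs Apple can add before clashing
  above=$((RANGE_MAX - last))                  # free IDs above the last build user
  fit_before_high=$((APPLE_HIGH_MIN - start))  # users that fit below 395
  fit_through_max=$((RANGE_MAX - start + 1))   # users that fit up to and including 400
  echo "start=$start last=$last below=$below above=$above fit_before_high=$fit_before_high fit_through_max=$fit_through_max"
}

margins 331 32
# prints: start=331 last=362 below=26 above=38 fit_before_high=64 fit_through_max=70
margins 321 32
# prints: start=321 last=352 below=16 above=48 fit_before_high=74 fit_through_max=80
```

For 331 this reproduces the ~26, 38, and 64 figures above; for 321 it gives the ~16 low-end margin and room for 80 users if the range is filled all the way to 400.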

This would also be a good opportunity to change the group ID; the current default of 30000 has the unfortunate effect of making the group show up in System Settings, unlike system groups. I would suggest using a related ID of 330 or 320 depending on the choice, and perhaps renaming it to _nixbld for consistency with the user names and other system groups (unless there’s any reason not to?).

Since we have to coordinate this with two installers and nix-darwin and get a migration plan in place before the release of Sequoia, I hope we can commit to a set of IDs ASAP to enable this to go smoothly.

emilazy commented 2 weeks ago

Incidentally I note that _oahd has UID 441 which makes me wonder if Apple has secretly expanded the system user range without updating the meagre documentation of it? But I remember it being a pain to test and reproduce the issues that led us to use the system UID range in the first place, so it’d need careful verification of the bounds if we wanted to see if we could go beyond 400.

emilazy commented 2 weeks ago

Some more investigation:

It seems like groups with GID < 500 don’t show up in System Settings. This may imply that the system UID range has also expanded, but I don’t know a convenient way to check, and anyway considering that Apple is using the middle of that range with 441 it might be awkward real estate to occupy; who knows what values Apple might pick in future. (There are also groups with higher GIDs that don’t show up in System Settings (e.g. com.apple.sharepoint.group.1/“<name>’s Public Folder”) that have dsAttrTypeNative:IsHidden: 1 in dscl(1) and Directory Utility, but setting that for the nixbld group didn’t seem to help.)

The maximum in‐use UID < 400 went from 297 to 304 in one version. I don’t know what the historical growth rate is like, but I’d definitely be more comfortable with 331 than 321 given that. If the growth continued at the rate of Sequoia (which seems unlikely, but still), we’d have to think of a new idea in 4 OS releases rather than 3. In general it seems like we’re on borrowed time here and I’m not really sure what the long‐term solution is. We may have to migrate now with the expectation of migrating again later.
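The "4 OS releases rather than 3" estimate follows from a ceiling division of the low-end margin by the per-release growth. A minimal sketch, assuming growth stays at the single observed step of 297 to 304; the helper name `releases_until_clash` is made up for illustration.

```sh
#!/bin/sh
# releases_until_clash START CUR_MAX PREV_MAX (hypothetical helper):
# number of OS releases until Apple's low-end UIDs would reach START,
# if each release adds as many UIDs as the last one did.
releases_until_clash() {
  start=$1
  cur_max=$2
  prev_max=$3
  margin=$((start - cur_max - 1))   # free IDs between Apple's top UID and START
  growth=$((cur_max - prev_max))    # UIDs added in the most recent release
  echo $(( (margin + growth - 1) / growth ))   # ceiling division
}

releases_until_clash 331 304 297   # prints: 4
releases_until_clash 321 304 297   # prints: 3
```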

If we could verify that UIDs < 500 now work fine, I suppose one solution would be to start at, say, 360 now, and hope Apple don’t eat up the lower 400s range if we start having machines that want 64 users. But I don’t remember how to test that.

abathur commented 2 weeks ago

> Incidentally I note that _oahd has UID 441 which makes me wonder if Apple has secretly expanded the system user range without updating the meagre documentation of it?

Does this mean you looked at the usage info for sysadminctl and it still says 200-400?

> But I remember it being a pain to test and reproduce the issues that led us to use the system UID range in the first place, so it’d need careful verification of the bounds if we wanted to see if we could go beyond 400.

Yeah. The main manifestation I recall was the system giving an obtuse adrenaline-inducing error message and booting into recovery mode during system updates that require a full reboot cycle. Looking back over the thread, it looks like that got less-scary on the next point release. The other was the build users showing up in a user list.

If we wanted to try moving outside of the current range, I suspect a workable protocol would be installing into the new range on a sub-sequoia version and then running the sequoia update and seeing if it blows up. (I don't currently have a spare mac that's eligible for this update, so someone else will have to drive...)

It can't save us from needing to fix this UID issue in the short run, but one way to address the long-run problem would be to figure out if we can get the various issues with auto-allocate-uids sorted out in order to make it the default. (The detsys folks tried defaulting to this in their installer and ran into trouble that compelled them to revert it on both macOS and Linux.)

After I hit post here I'll start drafting a feedback/radar report for Apple. Once I do, I'll also email their devrel about it, and post the FB number here in case anyone wants to refer to it from their own FB report. I'm not terribly optimistic about that helping (for example, I never got a response on the reports I opened about the big sur issue in 2021), but I guess there's an outside chance they'll improve their updater to relocate any UID/GID they trample to another valid ID.

(That would leave people with Weird installs--they might fail to fully clean up when they follow the uninstall instructions for example--but I think it would at least not break every existing macOS install made since early 2022 or whenever the multi-user default was released...)


For searchability, here's one real manifestation of this on an existing install (from https://github.com/NixOS/nix/issues/10912):

...
these 13 derivations will be built:
  /nix/store/jcrd05mlpsw8wmixwd133pv3q3xbm18w-nerdfonts-3.2.1.drv
  ...
error: the user '_nixbld1' in the group 'nixbld' does not exist
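To see which build users an upgrade has clobbered, something along these lines might help. It is only a sketch: `missing_build_users` is a hypothetical helper that reads `name uid` pairs on stdin, and the demo input is invented to mimic an install where `_nixbld1` to `_nixbld4` (UIDs 301 to 304) were replaced.

```sh
#!/bin/sh
# missing_build_users COUNT (hypothetical helper): read "name uid" pairs on
# stdin and print every expected _nixbldN (N = 1..COUNT) that is absent.
missing_build_users() {
  awk -v count="$1" '
    { seen[$1] = 1 }                    # remember every user name present
    END {
      for (n = 1; n <= count; n++) {
        name = "_nixbld" n
        if (!(name in seen)) print name
      }
    }
  '
}

# Demo with made-up input: Apple's new users now hold 301-304.
missing_build_users 6 <<'EOF'
_modelmanagerd 301
_reportsystemmemory 302
_swtransparencyd 303
_naturallanguaged 304
_nixbld5 305
_nixbld6 306
EOF
```

On macOS the input could be piped in from `dscl . -list /Users UniqueID`, with COUNT set to the number of build users the install was configured with (32 by default).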

emilazy commented 2 weeks ago

> Does this mean you looked at the usage info for sysadminctl and it still says 200-400?

On Sonoma, which already has that _oahd user, yes; so if that’s within the system UID range and there’s not something weird going on with that user specifically and groups in the 400 to 500 range (perfectly possible! OAH is the internal name for Rosetta 2, as I understand it, so I wouldn’t be surprised if there are strange things going on there), then they expanded it without updating what passes for the “documentation”. I haven’t tried Sequoia yet, so I can’t comment on what the command says there.

> Yeah. The main manifestation I recall was the system giving an obtuse adrenaline-inducing error message and booting into recovery mode during system updates that require a full reboot cycle. Looking back over the thread, it looks like that got less-scary on the next point release. The other was the build users showing up in a user list.

I think filling up the visible user list with 32 random daemon users is scary and off‐putting to users (especially in the absence of an official upstream uninstaller), so even if the more fundamental issues might be solved now I’d be reluctant to settle on that unless we can find another way to hide them.

> It can't save us from needing to fix this UID issue in the short run, but one way to address the long-run problem would be to figure out if we can get the various issues with auto-allocate-uids sorted out in order to make it the default. (The detsys folks tried defaulting to this in their installer and ran into trouble that compelled them to revert it on both macOS and Linux.)

Yes, I would love this. If we can commit in the interim to e.g. UIDs starting at 331 and a GID of 330, hopefully that would give us enough runway to make something workable out of auto-allocate-uids. I remember hearing that the problems were worse on macOS than Linux, though (e.g. https://github.com/DeterminateSystems/nix-installer/issues/521, https://github.com/DeterminateSystems/nix-installer/issues/580#issuecomment-1680951223 – I guess the lack of user namespaces really makes it tricky).

abathur commented 2 weeks ago

Ok, I've reported this in FB13917314 and emailed the devrel about it. For reference, report is roughly:

macOS 15 Sequoia beta installer clobbering existing role users with UIDs 301-304

We're getting reports (example: https://github.com/NixOS/nix/issues/10912) that the Sequoia update is clobbering existing build users for the Nix package manager, causing later errors such as:

error: the user '_nixbld1' in the group 'nixbld' does not exist

Users who have taken the update report seeing new users in this range in /etc/passwd:

_aonsensed:*:300:300:Always On Sense Daemon:/var/db/aonsensed:/usr/bin/false
_modelmanagerd:*:301:301:Model Manager:/var/db/modelmanagerd:/usr/bin/false
_reportsystemmemory:*:302:302:ReportSystemMemory:/var/empty:/usr/bin/false
_swtransparencyd:*:303:303:Software Transparency Services:/var/db/swtransparencyd:/usr/bin/false
_naturallanguaged:*:304:304:Natural Language Services:/var/db/com.apple.naturallanguaged:/usr/bin/false
_oahd:*:441:441:OAH Daemon:/var/empty:/usr/bin/false

A few years ago, the Nix installer used UIDs from 30001-30032 by default. The issue I reported in FB8997501 started causing trouble when users with these UIDs were present, so in response we took the hint from the usage note in sysadminctl ("Role accounts require name starting with _ and UID in 200-400 range") and migrated our build user UID defaults to 301-332.

The current behavior of the beta installer will break all existing multi-user Nix installs on macOS made in the last few years, confusing a lot of users in the process.

I can imagine at least two improvements that would help us out, here:

  • If these UIDs don't need to be hardcoded on your end, avoid clobbering existing role users and select a UID that doesn't clash.
  • If these UIDs need to be hardcoded, relocate any existing users to new unoccupied UIDs in the role user range.

emilazy commented 2 weeks ago

I can confirm that Sequoia’s sysadminctl says the same thing, so if there’s been any change it remains undocumented.