NixOS / nix

Nix, the purely functional package manager
https://nixos.org/
GNU Lesser General Public License v2.1
11.47k stars 1.44k forks source link

install-darwin: fix _nixbld uids for macOS sequoia #10919

Open abathur opened 2 weeks ago

abathur commented 2 weeks ago

Motivation

Starting in macOS 15 Sequoia, macOS daemon UIDs are encroaching on our default UIDs of 301-332. This commit relocates our range up to avoid clashing with the current UIDs of 301-304 and buy us a little time while still leaving headroom for people installing more than 32 users.

It also adopts GID 331, since @emilazy pointed out that this will keep our build group from showing up in the Users & Groups interface. (See https://github.com/NixOS/nix/pull/10919#issuecomment-2203507850)

Leaving as a draft for now since:

Context

Priorities and Process

Add :+1: to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

emilazy commented 2 weeks ago

Is there any chance we’ll be able to automate this transition on upgrade? Getting the word out to every single user that they need to run a script seems painful. And if we’re going to get everyone to run a migration script, is there any chance we can move the builder group to GID 330 at the same time so it stops showing up in System Settings?

I can try and test this in a VM someday soon.

abathur commented 2 weeks ago

Is there any chance we’ll be able to automate this transition on upgrade? Getting the word out to every single user that they need to run a script seems painful.

That's a Nix-nix question, I guess. I'm not personally aware of any mechanism for running a stateful script when people update Nix and don't see one on nix upgrade-nix --help (not that this would help people not using the new CLI or who only update their nix via nixos/hm/nix-darwin). We'd probably need to build some user migration logic into the daemon (and it would still only help you if people updated to the new Nix and ran that daemon before they updated to Sequoia).

if we’re going to get everyone to run a migration script, is there any chance we can move the builder group to GID 330 at the same time so it stops showing up in System Settings?

I've wondered why people were mentioning the GID, since we haven't needed to change it previously. (I assumed people were just conflating the 30000 GID we set with the 301 I gather the detsys installer sets?)

I'm less sure about migrating it. Inevitably we can change it, but I don't know if doing so will "break" the existing groups. (If it did, we might have to rework the script a little to just remove the users + group and re-add them all?)

caldwell commented 2 weeks ago

Is there a reason that every nix install needs the same UIDs for these users? Furthermore, is there a reason they need to be consecutive? Does the main nix builder code deal with the UIDs directly or look them up from the names?

It appears from the user reported error in #10912 that it uses the names, in which case the installer would be free to pick whatever UIDs happen to be free on a given OS install when creating them… That would be a simpler solution for the install/upgrade script and it would work everywhere. Trying to pick a empty span is only ever guaranteed to work on a fresh OS install—there's always going to be somebody out there who's got a user in that range created by hand or by some other software.

As far as upgrading goes, perhaps the script should be more nixos/puppet-like (declarative/idempotent) and create the users only if they don't exist.

It would also be super convenient if the code that got the error in #10912 could call the upgrade script and fix it right there, but I don't know if it has the correct permissions at that point in time. It could at least mention your upgrade script or prompt the user to re-install.

abathur commented 2 weeks ago

Is there a reason that every nix install needs the same UIDs for these users? Furthermore, is there a reason they need to be consecutive? Does the main nix builder code deal with the UIDs directly or look them up from the names?

To the best of my understanding (as someone not very familiar with the codebase for Nix itself), the answer to all of these is: not really.

It appears from the user reported error in #10912 that it uses the names, in which case the installer would be free to pick whatever UIDs happen to be free on a given OS install when creating them… That would be a simpler solution for the install/upgrade script and it would work everywhere. Trying to pick a empty span is only ever guaranteed to work on a fresh OS install—there's always going to be somebody out there who's got a user in that range created by hand or by some other software.

Indeed. But reworking this means modifying how we install users on all platforms, having to wrestle with edge cases that the current installer's process is too ~dumb to have to worry about (like what to do if the UIDs we want are taken by old nixbld users that may not match the nixbuildN user we were planning to put at that UID, what to do if we run out of valid role UIDs on macOS before placing the requested number of users, etc.), and having to go test it on a broad spectrum of systems to confirm the change.

While this refactor might save us from having to relocate to a new range a few years in the future, it would not fix the underlying problem here--this macOS update's installer currently clobbers our existing users to take the UIDs for its own daemons. This could of course happen to any UID we use on any update to any existing install on macOS--either they'll have to stop doing this without relocating our users, or Nix itself needs to get smart enough to detect the situation and recover from it or suggest remediation steps to the user.

As far as upgrading goes, perhaps the script should be more nixos/puppet-like (declarative/idempotent) and create the users only if they don't exist.

Not sure if you mean this in the context of the migration script, or the installer. If the latter, I broadly agree--but full idempotence is tricky to reach and maintain, and a lot of idempotence-focused work can lead to minimal benefit if/when one or two things block full idempotence (i.e., we do the work and testing, but the installer will still break or bail somewhere and we still have to tell frustrated users to go manually uninstall and reinstall).

I had thoughts and laid down some patterns for getting us here a few years back, but at this point I imagine this is more likely to come from working on the NixOS org's fork of the detsys installer (directly or by contributing to the upstream detsys installer itself). That said, I'll note that macOS eminent-domaining our UIDs and clobbering the users in the process is also causing trouble for their installer. (IIRC is breaks their ability to do an uninstall, for example.)

It would also be super convenient if the code that got the error in #10912 could call the upgrade script and fix it right there, but I don't know if it has the correct permissions at that point in time. It could at least mention your upgrade script or prompt the user to re-install.

I agree, but I think that's a nix-nix question outside of the scope of this PR (and I imagine it would be better if that took the form of a more general user fixup routine instead of having to figure out how to suggest macos version-specific cleanup to only the right users).

I'll also note that--unless Apple changes the updater to be a bit more polite--the cake is mostly-baked here. Even if someone opened a PR to support this in Nix today and there was a release cut by the end of the week, some fraction of Nix's macOS users will not be using that Nix release when they take the Sequoia update (whether that's a beta this summer or the official release this fall).

emilazy commented 1 day ago

I’m sorry that I didn’t yet get around to testing the migration; I will try to do so soon.

@abathur How do you feel about trying to land the UID (and preferably GID) changes for new installs only – which has to be done regardless – and we can worry about migration when it becomes clearer if Sequoia is going to implement any kind of migration itself?

abathur commented 1 day ago

No worries :)

I'm not opposed to changing the GID (whether here or in a separate PR to ensure both are easy to revert from GH without unrelated regressions), but I'm conservative about fiddling with these and do need some convincing:

If there is an issue pointing me that way is fine, but if not could you open one and document what you're seeing there (ideally w/ screenshots)?

emilazy commented 1 day ago
emily@yuyuko ~> sudo dseditgroup -o create -r 'test group for emily 1' -i 3000 emilytest1
emily@yuyuko ~> sudo dseditgroup -o create -r 'test group for emily 2' -i 360 emilytest2
Screenshot of “test group for emily 1” showing up in System Settings

(Sonoma 14.5)

I fiddled with a bunch of combinations along these lines and a bunch of different IDs but the results were pretty clear; as I mentioned in https://github.com/NixOS/nix/issues/10892#issuecomment-2169253635 the threshold for groups seems to be 500 for whatever reason, but of course picking one that matches the UID we’ll always take up seems the most conservative choice and Apple do use GID 395–400 and 441 as of Sonoma.

I don’t know if I can prove the negative re: the group ID potentially causing problems, but I can’t personally foresee any problems that we wouldn’t already get with UIDs. I’m open to trying to find the time to test stuff in a VM if you have some proposals for ways to test things, though. From reading the old discussion from when we moved the UIDs, I get the impression we were just too preoccupied with those more pressing issues to think about whether there might be any side‐effects of having a GID outside the system range too.

abathur commented 1 day ago

I don’t know if I can prove the negative re: the group ID potentially causing problems, but I can’t personally foresee any problems that we wouldn’t already get with UIDs. I’m open to trying to find the time to test stuff in a VM if you have some proposals for ways to test things, though. From reading the old discussion from when we moved the UIDs, I get the impression we were just too preoccupied with those more pressing issues to think about whether there might be any side‐effects of having a GID outside the system range too.

To clarify, I don't mean that I won't PR the change without meeting that standard of proof--I think your comment here reasonably demonstrates the issue and shows that a lower UID addresses it. I just meant that one reason I'm treading cautiously is that I can't just ~transfer confidence from looking at the other installers frontrunning us on this for days/weeks/months with issues/PRs to document the problem+fix and a lack of subsequent reports I could take as supporting evidence.

emilazy commented 1 day ago

Oh yeah I totally understand the conservatism here, don’t worry. I just figure if we’re on Apple’s wild ride for the time being anyway we might as well improve the UX, especially if we do end up having to make everyone do a manual migration. I guess if I find the time to test the Sequoia migration I can make a group in the system range before the upgrade and see what happens to it?

I think the DetSys installer has some kind of A/B testing roll‐out stuff. I don’t know how quickly we could get data on potential GID problems with that though.

emilazy commented 14 hours ago

Reposting this inline since it’s hidden in the commit comments currently:

Maybe we should make the group name _nixbld too for consistency with the users and the other system groups in the range while we’re at it? Probably doesn’t matter that much, but I seem to recall we renamed the users to get them hidden in some way so it might not hurt to follow suit with the group.