Open drichardson opened 2 years ago
Was able to get past install error after running this:
$ sudo dscl . -rm /Groups/nixbld
$ for x in $(dscl . -list /Users|grep nix); do sudo dscl . -rm /Users/$x; done
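Note that the loop above removes every account whose name contains nix. A narrower sketch, assuming the build users follow the _nixbld<N> naming shown in the installer output later in this thread; both function names are hypothetical, and the commands are only printed, not executed:

```shell
# Hypothetical helper: keep only names that look like Nix build users
# (_nixbld1, _nixbld2, ...) from a list of account names on stdin.
filter_nix_build_users() {
  grep -E '^_nixbld[0-9]+$' || true
}

# Dry run: print the removal commands instead of executing them,
# so the list can be reviewed before anything is deleted.
print_removal_commands() {
  filter_nix_build_users | while read -r user; do
    echo "sudo dscl . -rm /Users/$user"
  done
}
```

On macOS the input would come from `dscl . -list /Users | print_removal_commands`; an unrelated account such as phoenix would be skipped.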
Not sure this is actionable without more context on the pre-install state (i.e., how/why the build users already existed but the group didn't).
It was a new machine. Only thing I can think of is that I tried to run the single user install (see steps to repro). But other than that, not sure.
I'm not sure either. We're obviously out in ~weird territory...
I'll show my math on why this doesn't add up, and then ask a few increasingly paranoid questions:
~~> Setting up the build group nixbld
---- sudo execution ------------------------------------------------------------
I am executing:
$ sudo /usr/sbin/dseditgroup -o create -r Nix build group for nix-daemon -i 30000 nixbld
Create the Nix build group, nixbld
Created: Yes
You can see in the underlying code here that it would say the group "exists" if it was already there, and say it was "created" if not: https://github.com/NixOS/nix/blob/4d67ecbbb2a00b22b1b23073f5853bcb5b100b75/scripts/install-multi-user.sh#L448-L471
~~> Setting up the build user _nixbld1
Exists: Yes
Hidden: Yes
Home Directory: /var/empty
Note: Nix build user 1
Logins Disabled: Yes
Member of nixbld: Yes
PrimaryGroupID: 30000
Likewise, it'll say whether the user already exists or is created: https://github.com/NixOS/nix/blob/4d67ecbbb2a00b22b1b23073f5853bcb5b100b75/scripts/install-multi-user.sh#L484-L503
https://github.com/NixOS/nix/blob/master/scripts/install-nix-from-closure.sh handles both the option parsing and the single-user install process. If you invoked with --no-daemon, you presumably hit this: https://github.com/NixOS/nix/blob/4d67ecbbb2a00b22b1b23073f5853bcb5b100b75/scripts/install-nix-from-closure.sh#L61-L65
It would abort your install before making any changes. You can confirm in the broader source (and, in any case, the single-user install creates no groups or users).
So:
First off: you're a legend for thinking about this so much.
To answer your questions:
- Do you still have the initial install in your scrollback?
Unfortunately no. I do have some bash history I will include though, starting with the first attempt to do a single user install.
- Is this your personal device? (More to the point: is it enrolled in an MDM or otherwise managed by an institution in some way that might weird our user/group assumptions?)
It's a work computer, but I purchased it myself from and does not have MDM or any other kind of profile provisioning on it. Also, nobody else at my company experienced this issue, just me (and there are several other nix users).
- Was it already set up when you started using it, or did you go through the first-time setup yourself?
I bought it new, unwrapped it, and set it up myself.
- Either during first-time setup or after, did you happen to use the Migration Assistant? (It's been long enough since I last ran through setup that I don't recall if it uses those words; it'll say something about transferring data...)
Yes. But it didn't work, so I (attempted) to wipe by doing a new install. I did have nix installed on my previous computer.
This seems really sus, I think you found the problem. I wonder if there's any way for me to check if I had a partial migration. I bet I didn't actually wipe it like I thought I did.
- Did you use any other kind of restore-from-backup process on it?
Nope.
- If the answer to either 4 or 5 is yes: did the previous system have Nix installed? If so:
Yes.
- is this device still available (in whatever condition it was before migration)?
No, I wiped it.
- what version of macOS does/did it have installed?
Latest. I updated it right before I sent it to another colleague, so 12.x.x. (not sure exactly).
- what's your best guess as to when Nix was first installed on it?
Sometime after Dec 13.
SUS
$ ls /Library/SystemMigration/History
Migration-FCBA4AEA-A53F-4B53-A0E5-15635D75611F
I think you found the problem @abathur. Users/groups brought over from a migration.
I actually forgot about the migration because (as I mentioned) I tried (and obviously failed) to wipe it.
And look at this section from /Library/SystemMigration/History/Migration-FCBA4AEA-A53F-4B53-A0E5-15635D75611F/Request (which is a binary plist):
220 => "groupCreation"
221 => {
"$class" => <CFKeyedArchiverUID 0x6000015ddec0 [0x22075c000]>{value = 36}
"NS.objects" => [
0 => <CFKeyedArchiverUID 0x6000015df840 [0x22075c000]>{value = 222}
1 => <CFKeyedArchiverUID 0x6000015df860 [0x22075c000]>{value = 223}
2 => <CFKeyedArchiverUID 0x6000015df880 [0x22075c000]>{value = 224}
]
}
222 => "nixbld"
223 => "com.apple.sharepoint.group.2"
224 => "com.apple.sharepoint.group.1"
225 => {
"$class" => <CFKeyedArchiverUID 0x6000015de020 [0x22075c000]>{value = 27}
"NS.keys" => [
0 => <CFKeyedArchiverUID 0x6000015df8a0 [0x22075c000]>{value = 226}
]
"NS.objects" => [
0 => <CFKeyedArchiverUID 0x6000015de120 [0x22075c000]>{value = 33}
]
}
QED.
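The excerpt above looks like a plutil -p dump (an assumption; the thread does not say which tool produced it). A minimal sketch for checking a migration request for Nix accounts, with a hypothetical function name:

```shell
# Hypothetical helper: succeed if a plist dump read from stdin names the
# nixbld group or a _nixbld<N> build user as a quoted string.
dump_mentions_nix_accounts() {
  grep -Eq '"(nixbld|_nixbld[0-9]+)"'
}

# Assumed usage on macOS (path layout as shown in this thread):
#   for req in /Library/SystemMigration/History/Migration-*/Request; do
#     plutil -p "$req" | dump_mentions_nix_accounts && echo "found in $req"
#   done
```

A match only shows that the migration request referenced these accounts; it does not by itself prove they were created by the migration.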
Fields Medal for @abathur
Other than detecting this and recovering, nothing to do for this issue I guess. Feel free to close and thanks for your awesome investigation!
Ha! I'm glad that seems like the culprit. We would've been deep in red yarn territory if none of these panned out.
I think it can stay open (and I don't have the power to close it, anyways). If you can find a way to phrase the migration assistant into the title it may help this thread be a better light-house for anyone else seeing the same.
As far as fixing this later goes:
If you can find a way to phrase the migration assistant into the title it may help this thread be a better light-house for anyone else seeing the same.
Done.
I'm assisting a user who appears to be hitting this issue (including the SystemMigration reference).
Even though dscl shows that the users and groups exist (if we create them), or don't exist (if we delete them); and dsmemberutil checkmembership shows the users to be members of the groups when they should, getgrnam() appears not to be including any list of users as associated with the group.
- Do we have an actual resolution/workaround/mechanism to fix this?
- Is there any concrete/specific investigation I can perform?
I'm not sure what to tell you, but I happened to notice this in my inbox right after you sent it, so I want to note that there are some poorly-understood quirks here with respect to user/group relations in macOS. You can see an example of this in https://github.com/NixOS/nix/pull/4532#issuecomment-775274318 and my 2 immediately-following comments.
I suspect the thing that'll get you on the road again is trying to follow these uninstall instructions before reinstalling: https://nixos.org/manual/nix/stable/installation/installing-binary.html#macos
That said, if you have a little bit of timeline wiggle here it would be nice to collect some information on the user/group setup on this device. (I don't personally work with users/groups much in macOS, but I've asked in chat to see if anyone has specific ideas...)
The user who was experiencing this is no longer in the impacted state: It was fixed by looping over the build accounts, running dscl . append /Groups/nixbld GroupMembership _nixbld$i.
I'm guessing that this added nixbld as a supplemental group, in addition to being a primary group by virtue of the GIDs matching. Why this was necessary is a very open question.
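The loop described above can be sketched as a dry run. Assumptions: the function name is hypothetical, the command is written with the -append spelling from the dscl man page, and the number of build users is a parameter because this thread does not state it (32 is only an assumed default):

```shell
# Print one dscl append command per build user, _nixbld1 .. _nixbld<count>.
# Nothing is executed here; the output is meant to be reviewed first.
print_membership_repairs() {
  count="${1:-32}"   # assumed default, not taken from this thread
  i=1
  while [ "$i" -le "$count" ]; do
    echo "sudo dscl . -append /Groups/nixbld GroupMembership _nixbld$i"
    i=$((i + 1))
  done
}
```

After checking the printed commands, they could be run with `print_membership_repairs 32 | sh`.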
I did get a dump of the /Users and /Groups plists earlier, when this was still happening, and have them on hand to query.
Glad your user is sorted. :)
Also promising that they're related, since GroupMembership was involved yet again. If there's nothing sensitive in them, can you drop them in a code block, perhaps within a <details> tag, or even just attach a file/log containing them?
Okay, got a new dump, and comparing them, the difference is clear as day :)
Only after running the relevant dscl append commands does the nixbld group have a dsAttrTypeStandard:GroupMembership key at all. Just having matching GIDs doesn't suffice; a user needs to be explicitly listed in a GroupMembership array for the getgrnam() call in UserLock::findFreeUser() to return it.
btw, it's worth explicitly calling out that dsAttrTypeStandard:GroupMembers is populated in both the before and after cases; it's only dsAttrTypeStandard:GroupMembership that was unpopulated in the faulty state. This explains why many of the OS's tools were claiming that the group membership was already correct.
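A rough way to spot the faulty state described above, assuming the input is the text printed by dscl . -read /Groups/nixbld (the function name is hypothetical; the pattern accepts attribute names with or without the dsAttrTypeStandard: prefix, since the exact output form is an assumption):

```shell
# Hypothetical check: read a group record dump on stdin and report which
# of the two membership attributes are present in it.
report_membership_attrs() {
  input=$(cat)
  for key in GroupMembers GroupMembership; do
    if printf '%s\n' "$input" | grep -Eq "^(dsAttrTypeStandard:)?$key:"; then
      echo "$key: present"
    else
      echo "$key: missing"
    fi
  done
}
```

Assumed usage: `dscl . -read /Groups/nixbld | report_membership_attrs`. A result of GroupMembers present but GroupMembership missing would match the faulty state reported in this thread.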
Sorry for the slow response--I had this mostly-written in a tab but then discovered some plagiarism and had my day/week/month upended...
Thanks for the update! I'm glad that we seem to have a culprit. (But broadly frustrated that there's so much lurking complexity here...)
Some thoughts on potential next steps:
One thing we should try to keep in-frame is whether this might be a byproduct of migrating versions of macOS before some version. (Maybe this is a requirement they added at some point, and added to their tooling for new users, but migration is able to smuggle group/account setups unchanged from before this requirement existed.)
@drichardson @charles-dyfis-net do you happen to know what macOS versions the old/migrated systems were running?
We might be holding the user/groups tooling wrong, or there might be bugs/omissions in the macOS migration routine and tooling. We could maybe open a feedback? My record with getting useful responses to feedbacks is not great. Don't feel obliged, but let me know the FB number if you have or happen to open one?
@drichardson @charles-dyfis-net On the off chance either of you opened a Feedback, can you give me the FB number? (Not expecting you to open one if you haven't, but I'll reference it if I get a chance to follow up w/ them.)
We could probably update the installer to either try to narrowly detect and repair dsAttrTypeStandard:GroupMembership, or we could add these users and groups to the list of things the macOS installer can "cure" by completely removing and replacing them.
(This is probably the easiest way to fix issues like this without having to really understand them, but it would also make repeat installs significantly slower and might keep us from learning enough about the causes to just fix them before they break on users?)
- @drichardson @charles-dyfis-net do you happen to know what macOS versions the old/migrated systems were running?
No I don't remember the exact version, but can hazard some guesses.
I was migrating from an almost brand new M1 machine to another new M1 machine with almost identical specs (the new one just had more RAM). The "old" machine was almost certainly up to date (I update regularly). Based on https://en.wikipedia.org/wiki/MacOS_Monterey it looks like that would have been 12.1, 12.2, or 12.2.1 (unlikely, since it was released the same day I reported this issue).
I don't remember what the "new" machine had on it, but I started using it almost as soon as it arrived, so assuming Apple gave me a recently built computer (which I imagine they did since I had to wait a while for it) it also was probably running 12.1 or 12.2.
2. @drichardson @charles-dyfis-net On the off chance either of you opened a Feedback, can you give me the FB number? (Not expecting you to open one if you haven't, but I'll reference it if I get a chance to follow up w/ them.)
I did not. I'm not sure what "a Feedback" is (but I'm guessing some nixOS thing).
@drichardson drat; I guess we won't age out of it then. Thanks for narrowing it down :)
By feedback I just mean a report in the Apple Feedback Assistant.
The most recent system it was observed on was an M1 Mac received within the last two weeks. I don't have the precise version number at hand.
Going through my old emails, my prior Apple Support engagements don't appear to have transcripts, so at least from the emailed receipts I don't have enough information to pin down which of them corresponded with this issue (I reported it to them once after it happened on then-recent M1-based personal hardware some time last year, which AFAIK nothing ever came of). I don't believe I've ever used the Feedback Assistant.
Insofar as this issue is pretty easy to identify by querying a dumped group plist, I'd imagine we could (1) patch the installer to identify and repair it (as a bare minimum, to ensure that reinstalling does fix the issue); and (2) possibly add some pre-startup logic to the nix-daemon launchd service.
Describe the bug
The installation script failed.
Reporting here per the instructions at the bottom of the output.
Steps To Reproduce
NOTE: It's highly likely (2) was an aborted Migration.
Fails with:
Expected behavior
Installation should succeed.
nix-env --version output
N/A because nix is not installed yet.
Additional context
Full log