DeterminateSystems / nix-installer

Install Nix and flakes with the fast and reliable Determinate Nix Installer, with over 7 million installs.
https://determinate.systems
GNU Lesser General Public License v2.1

[bug] Steam Deck crashed mid installation and can't be uninstalled #235

Closed DanSM-5 closed 1 year ago

DanSM-5 commented 1 year ago

Description

I tried running the nix installer and, after entering Y at the prompt (Proceed? (y/N):), the Steam Deck rebooted, which looks like some type of crash mid-installation.

I wasn't fast enough to see at which point it rebooted or whether it displayed an error message.

I followed this link and used the command

curl -L https://install.determinate.systems/nix | sh -s -- install steam-deck

At the moment it seems like nix was only partially installed, as I can see nix, nix-env and nix-shell available in my session. However, I cannot uninstall.

The command:

/nix/nix-installer uninstall

fails because the file /nix/nix-installer does not exist.

I tried using curl as follows:

curl -L https://install.determinate.systems/nix | sh -s -- uninstall

But it fails with the following error message:

❯ curl -L https://install.determinate.systems/nix | sh -s -- uninstall
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 15950  100 15950    0     0  54908      0 --:--:-- --:--:-- --:--:-- 54908
info: downloading installer https://install.determinate.systems/nix/tag/v0.2.0/nix-installer-x86_64-linux
`nix-installer` needs to run as `root`, attempting to escalate now via `sudo`...
Error:
   0: Reading receipt
   1: No such file or directory (os error 2)

Location:
   src/cli/subcommand/uninstall.rs:126

Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.

If I try to install again I get the following error:

❯ curl -L https://install.determinate.systems/nix | sh -s -- install steam-deck
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 15950  100 15950    0     0  56418      0 --:--:-- --:--:-- --:--:-- 56418
info: downloading installer https://install.determinate.systems/nix/tag/v0.2.0/nix-installer-x86_64-linux
`nix-installer` needs to run as `root`, attempting to escalate now via `sudo`...
Error:
   0: Planner error
   1: Error executing action
   2: Path exists `/etc/systemd/system/nix-directory.service`

Location:
   /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/convert/mod.rs:726

Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.

I would like to uninstall and try again. However, I'm not very familiar with the nix ecosystem and I don't know how to proceed from here. Could I get some help with this?

Information

SteamOS: 3.3.3 Kernel: 5.13.0-valve21.3.1-neptune

Additional

I saw https://github.com/DeterminateSystems/nix-installer/issues/145 but I'm unsure if it is related to my issue.

Pablo1107 commented 1 year ago

Happens to me, you can try:

curl -L https://install.determinate.systems/nix | sh -s -- uninstall

DanSM-5 commented 1 year ago

@Pablo1107 You can see in the description that I've tried that exact command, but I get an error:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 15950  100 15950    0     0  54908      0 --:--:-- --:--:-- --:--:-- 54908
info: downloading installer https://install.determinate.systems/nix/tag/v0.2.0/nix-installer-x86_64-linux
`nix-installer` needs to run as `root`, attempting to escalate now via `sudo`...
Error:
   0: Reading receipt
   1: No such file or directory (os error 2)

Location:
   src/cli/subcommand/uninstall.rs:126

Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.

Adding RUST_BACKTRACE (either 1 or full) shows <empty backtrace>.

Hoverbear commented 1 year ago

Hi @DanSM-5 ! Thanks for trying it out! I'm sorry you had an issue.

The Deck shouldn't reboot mid-installation. That's very strange. I'd like to learn more about that. Can you let me know more about the state of your deck? Does it have things installed that are uncommon? Is it set up in a possibly unexpected way?

Regarding the failed install you can't uninstall: I suspect that is because the installer failed before it set up the systemd unit to mount the directory it needs.

The Steam Deck has a read-only root (which can change between two different root partitions between boots), so we create a /nix by setting steamos-readonly to false for a brief time. We link that to /home/nix.

Here are some things we can look at:

Was steamos-readonly status disabled at the time of installing? (We do it automatically, but don't do much smart detection of whether it's already disabled -- we should...)
Could you see if /home/nix exists?
Is there a /home/nix/receipt.json?
Is there a /nix folder?
Does systemctl status nix-daemon.service show an existing unit?
Does systemctl status nix.mount show an existing unit?
If you cat /etc/passwd do you see entries for nixbld** users?

It's possible the Deck rebooted early enough that none of these things happened.

If a /home/nix/receipt.json exists, we can probably still uninstall. You'd need to bind mount /home/nix to /nix and then run the uninstall process as you described! Alternatively, I can help you determine the steps to manually uninstall.

It'd be something like:

file /home/nix/receipt.json
steamos-readonly disable
mkdir -p /nix
mount --bind /home/nix /nix
/nix/nix-installer uninstall

To manually uninstall you'd need to remove all the nixbld** users, the nixbld group, /etc/systemd/system/nix-directory.service, /etc/systemd/system/nix.mount, /etc/systemd/system/ensure-symlinked-units-resolve.service, and /home/nix.
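
A minimal sketch of a check for the leftovers listed above, assuming bash. The function name report_nix_leftovers and its optional root-prefix argument are hypothetical, added only so the check can be tried against a scratch directory; the paths are the ones named in this comment.

```bash
# Report which leftovers from a partial Steam Deck install are present.
# The paths come from the comment above; the optional prefix argument is a
# hypothetical convenience for trying this against a scratch directory.
report_nix_leftovers() {
    local root="${1:-}"
    local path
    for path in \
        /home/nix/receipt.json \
        /home/nix \
        /etc/systemd/system/nix-directory.service \
        /etc/systemd/system/nix.mount \
        /etc/systemd/system/ensure-symlinked-units-resolve.service
    do
        if [ -e "${root}${path}" ]; then
            echo "present: ${path}"
        else
            echo "missing: ${path}"
        fi
    done
}
```

Calling report_nix_leftovers with no argument checks the real system. If it reports /home/nix/receipt.json as present, the bind-mount route is worth trying first; otherwise the manual removal applies.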

Hoverbear commented 1 year ago

In the /etc/systemd/system/ensure-symlinked-units-resolve.service file we run /usr/bin/systemctl restart --no-block sockets.target timers.target multi-user.target, which I suspect could cause a "reboot-like experience" if your display manager restarted for some reason.

Any chance you might be able to confirm whether the device really did a hard power cycle, or whether it might have been a black screen while the display manager crashed and presented you with what appeared to be a fresh boot?
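
One way to tell a real power cycle from a display manager restart is to compare the kernel's boot time with the time the installer was started. A minimal sketch, assuming bash on Linux with /proc/uptime available; both function names are hypothetical.

```bash
# Succeed if the machine booted after a given moment.
# Arguments: uptime in seconds, current epoch time, epoch time of the event.
booted_after() {
    local uptime_s="$1" now="$2" event="$3"
    local boot=$(( now - uptime_s ))
    [ "$boot" -gt "$event" ]
}

# Print the whole seconds of uptime from /proc/uptime (Linux-specific).
uptime_seconds() {
    local up rest
    read -r up rest < /proc/uptime
    echo "${up%%.*}"
}
```

For example, with install_epoch holding the approximate start time of the install as an epoch timestamp, booted_after "$(uptime_seconds)" "$(date +%s)" "$install_epoch" succeeds only if the kernel booted after that moment. An uptime longer than the time since the install means the device did not power cycle.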

Hoverbear commented 1 year ago

I can confirm that I can reproduce this now. Thanks!

Hoverbear commented 1 year ago

So I can reproduce this with the main branch as well as the latest release. Testing with #237 on my deck seems to resolve it, but frankly I'm not sure why.

I did manage to confirm that this is not a device restart -- an install over ssh (as I usually test) still progresses as the display manager restarts and completes without issue... The problem is if the display manager restarts and closes the terminal running the installer.

DanSM-5 commented 1 year ago

@Hoverbear Here is more information:

The Deck shouldn't reboot mid-installation. That's very strange. I'd like to learn more about that. Can you let me know more about the state of your deck? Does it have things installed that are uncommon? Is it set up in a possibly unexpected way?

Currently I only have 2 modifications in the steam deck

Was steamos-readonly status disabled at the time of installing? (We do it automatically, but don't do much smart detection of whether it's already disabled -- we should...)

No, I didn't disable read-only status. My main motivation for trying a package manager other than pacman is to avoid disabling this.

Could you see if /home/nix exists?

Yes, /home/nix exists. It has 1 file and 2 directories.

❯ ls -a /home/nix
.  ..  .reginfo  store  var

Is there a /home/nix/receipt.json?

No, receipt.json does not exist.

Is there a /nix folder?

Yes, there is a /nix folder.

❯ ls -a /nix
.  ..  .reginfo  store  var

Does systemctl status nix-daemon.service show an existing unit?

No, it fails with the following output.

❯ systemctl status nix-daemon.service
Unit nix-daemon.service could not be found.

Does systemctl status nix.mount show an existing unit?

Yes, it shows it.

❯ systemctl status nix.mount
● nix.mount - Mount `/home/nix` on `/nix`
     Loaded: loaded (/proc/self/mountinfo; static)
     Active: active (mounted) since Fri 2023-02-03 00:31:07 EST; 3 days ago
      Until: Fri 2023-02-03 00:31:07 EST; 3 days ago
      Where: /nix
       What: /dev/nvme0n1p8
      Tasks: 0 (limit: 17715)
     Memory: 0B
        CPU: 3ms
     CGroup: /system.slice/nix.mount

If you cat /etc/passwd do you see entries for nixbld** users?

Yes, there are 32 entries

❯ cat /etc/passwd | grep nix
nixbld10:x:30010:30000:"Nix build user":/var/empty:/sbin/nologin
nixbld1:x:30001:30000:"Nix build user":/var/empty:/sbin/nologin
nixbld6:x:30006:30000:"Nix build user":/var/empty:/sbin/nologin
nixbld0:x:30000:30000:"Nix build user":/var/empty:/sbin/nologin
nixbld3:x:30003:30000:"Nix build user":/var/empty:/sbin/nologin
nixbld5:x:30005:30000:"Nix build user":/var/empty:/sbin/nologin
nixbld8:x:30008:30000:"Nix build user":/var/empty:/sbin/nologin
nixbld2:x:30002:30000:"Nix build user":/var/empty:/sbin/nologin
nixbld4:x:30004:30000:"Nix build user":/var/empty:/sbin/nologin
nixbld12:x:30012:30000:"Nix build user":/var/empty:/sbin/nologin
nixbld15:x:30015:30000:"Nix build user":/var/empty:/sbin/nologin
nixbld13:x:30013:30000:"Nix build user":/var/empty:/sbin/nologin
nixbld7:x:30007:30000:"Nix build user":/var/empty:/sbin/nologin
nixbld17:x:30017:30000:"Nix build user":/var/empty:/sbin/nologin
nixbld20:x:30020:30000:"Nix build user":/var/empty:/sbin/nologin
nixbld11:x:30011:30000:"Nix build user":/var/empty:/sbin/nologin
nixbld23:x:30023:30000:"Nix build user":/var/empty:/sbin/nologin
nixbld18:x:30018:30000:"Nix build user":/var/empty:/sbin/nologin
nixbld16:x:30016:30000:"Nix build user":/var/empty:/sbin/nologin
nixbld9:x:30009:30000:"Nix build user":/var/empty:/sbin/nologin
nixbld19:x:30019:30000:"Nix build user":/var/empty:/sbin/nologin
nixbld22:x:30022:30000:"Nix build user":/var/empty:/sbin/nologin
nixbld14:x:30014:30000:"Nix build user":/var/empty:/sbin/nologin
nixbld30:x:30030:30000:"Nix build user":/var/empty:/sbin/nologin
nixbld26:x:30026:30000:"Nix build user":/var/empty:/sbin/nologin
nixbld24:x:30024:30000:"Nix build user":/var/empty:/sbin/nologin
nixbld21:x:30021:30000:"Nix build user":/var/empty:/sbin/nologin
nixbld25:x:30025:30000:"Nix build user":/var/empty:/sbin/nologin
nixbld29:x:30029:30000:"Nix build user":/var/empty:/sbin/nologin
nixbld27:x:30027:30000:"Nix build user":/var/empty:/sbin/nologin
nixbld28:x:30028:30000:"Nix build user":/var/empty:/sbin/nologin
nixbld31:x:30031:30000:"Nix build user":/var/empty:/sbin/nologin

If a /home/nix/receipt.json exists, we can probably still uninstall. You'd need to bind mount /home/nix to /nix and then run the uninstall process as you described! Alternatively, I can help you determine the steps to manually uninstall.

I think we can rule this out, as the receipt.json file does not exist.

To manually uninstall you'd need to remove all the nixbld** users, the nixbld group, /etc/systemd/system/nix-directory.service, /etc/systemd/system/nix.mount, /etc/systemd/system/ensure-symlinked-units-resolve.service, and /home/nix.

Ok, to be honest it's been a while since I've messed with users and groups, so I'd like to confirm the following.

Any chance you might be able to confirm whether the device really did a hard power cycle, or whether it might have been a black screen while the display manager crashed and presented you with what appeared to be a fresh boot?

In this case I can't be sure, because I was looking at the output in the article while it was running. I can confirm I saw a black screen followed by the Steam Deck boot animation, and then it started in game mode. From the information you mentioned, it seems likely that it was a display manager crash. Running uptime shows up 16 days. I'd be confident in saying it was only the display manager but let me know if any other information can be useful.

So I can reproduce this with the main branch as well as the latest release. Testing with https://github.com/DeterminateSystems/nix-installer/pull/237 on my deck seems to resolve it, but frankly I'm not sure why.

I did manage to confirm that this is not a device restart -- an install over ssh (as I usually test) still progresses as the display manager restarts and completes without issue... The problem is if the display manager restarts and closes the terminal running the installer.

Yes, that's likely the issue.

With all that said, just help me confirm the steps for manually uninstalling (see my questions above) and I will proceed.

If this is going to be fixed with #237, would you like me to wait until it is merged and try again? Or do you recommend using ssh for now?

Hoverbear commented 1 year ago

It's a real shame the receipt.json wasn't created. :(

Ok, to be honest it's been a while since I've messed with users and groups, so I'd like to confirm the following.

No problem! You want userdel --remove nixbld1, also groupdel nixbld. For the files, yes, you can just rm -r them. If you'd like I can sit down and make and test a little bash script for you, but this should be sufficient:

userdel --remove nixbld0
userdel --remove nixbld1
userdel --remove nixbld2
userdel --remove nixbld3
userdel --remove nixbld4
userdel --remove nixbld5
userdel --remove nixbld6
userdel --remove nixbld7
userdel --remove nixbld8
userdel --remove nixbld9
userdel --remove nixbld10
...
groupdel nixbld
rm -rf /home/nix
rm /etc/systemd/system/ensure-symlinked-units-resolve.service
rm /etc/systemd/system/nix-directory.service
rm /etc/systemd/system/nix.mount
rm /etc/nix.conf
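
The repeated userdel lines can also be generated from the passwd database instead of being typed out. A minimal sketch, assuming bash with awk and the account naming shown earlier in the thread (nixbld followed by digits). The function names are hypothetical, and this version only prints the commands so they can be reviewed before being run as root.

```bash
# List nixbld build users found in a passwd-format file (default /etc/passwd).
list_nixbld_users() {
    local passwd_file="${1:-/etc/passwd}"
    # Field 1 is the login name; keep names that are "nixbld" plus digits.
    awk -F: '$1 ~ /^nixbld[0-9]+$/ { print $1 }' "$passwd_file"
}

# Print (do not run) the removal commands, one per build user,
# followed by the group removal.
print_nixbld_removal() {
    local user
    list_nixbld_users "${1:-/etc/passwd}" | while read -r user; do
        echo "userdel --remove ${user}"
    done
    echo "groupdel nixbld"
}
```

Running print_nixbld_removal with no argument reads /etc/passwd. Printing rather than executing keeps the step reviewable, since deleting users is hard to undo.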

I should have a test build of #237 for you today or tomorrow. We had some CI issues yesterday, so it didn't end up on our S3 bucket, sadly. I'll post here.

Running uptime shows up 16 days. I'd be confident in saying it was only the display manager but let me know if any other information can be useful.

Awesome, so I think that was the issue. So I was able to reproduce that on my Deck (only on real hardware though), and my patch I posted does solve it for me, hope it solves it for you!

DanSM-5 commented 1 year ago

I've proceeded with the manual uninstall, but there are 2 things:

Hoverbear commented 1 year ago

Yup sorry! /etc/nix/nix.conf is right.

You can absolutely remove /nix. If it gives you a permission or read error you may need to do steamos-readonly disable then remove it, then steamos-readonly enable.
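
A minimal sketch of that disable, remove, re-enable sequence, assuming bash. The wrapper name with_writable_root and the READONLY_CMD override are hypothetical; the override exists only so the sequence can be dry-run with a stand-in command instead of the real steamos-readonly.

```bash
# Run a command with the SteamOS root filesystem temporarily writable,
# then re-enable read-only mode even if the command fails.
# READONLY_CMD is a hypothetical override used for dry runs.
with_writable_root() {
    local ro_cmd="${READONLY_CMD:-steamos-readonly}"
    local status=0
    "$ro_cmd" disable || return 1
    "$@" || status=$?
    "$ro_cmd" enable || return 1
    # Pass the wrapped command's exit status back to the caller.
    return "$status"
}
```

Run as root, this would be used as with_writable_root rm -rf /nix. Note that the sketch always re-enables read-only mode afterwards, so it assumes read-only was enabled to begin with.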

DanSM-5 commented 1 year ago

I've completed the manual uninstall process. Let me know when the new build is ready.

jacobranson commented 1 year ago

I can also test this new build. Was facing the same issue a few weeks ago.

Hoverbear commented 1 year ago

We landed the fix on main so you can test it with:

curl --proto '=https' --tlsv1.2 -sSf -L https://install.determinate.systems/nix/branch/main | sh -s -- install steam-deck

We'll likely cut a proper release with this fix soon.

DanSM-5 commented 1 year ago

I want to mention 2 things:

Hoverbear commented 1 year ago

Thanks @DanSM-5 for testing! :) Glad to know it's fixed. I'll cut a new release as soon as I can.

Hoverbear commented 1 year ago

Should be fixed mainline now!