NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License

Incus cannot create new instances #333293

Closed schuelermine closed 2 months ago

schuelermine commented 2 months ago

Describe the bug

~$ incus launch images:ubuntu/24.04 sample
Launching sample
Error: Failed instance creation: Failed to run: /nix/store/i9dkdkh6ndlvdqg94rxdfy7qp8bp09jq-incus-lts-6.0.1/bin/incusd forkstart sample /var/lib/incus/containers /run/incus/sample/lxc.conf: exit status 1
~ [1]$ incus info --show-log sample
Name: sample
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2024/08/08 22:26 CEST
Last Used: 2024/08/08 22:26 CEST

Log:

lxc sample 20240808202646.963 ERROR    idmap_utils - ../src/lxc/idmap_utils.c:lxc_map_ids:245 - newuidmap failed to write mapping "newuidmap: write to uid_map failed: Invalid argument": newuidmap 1496505 0 1000000 1000000000 0 1000000 65536
lxc sample 20240808202646.963 ERROR    start - ../src/lxc/start.c:lxc_spawn:1795 - Failed to set up id mapping.
lxc sample 20240808202646.963 ERROR    lxccontainer - ../src/lxc/lxccontainer.c:wait_on_daemonized_start:837 - Received container state "ABORTING" instead of "RUNNING"
lxc sample 20240808202646.964 ERROR    start - ../src/lxc/start.c:__lxc_start:2114 - Failed to spawn container "sample"
lxc sample 20240808202646.964 WARN     start - ../src/lxc/start.c:lxc_abort:1037 - No such process - Failed to send SIGKILL via pidfd 17 for process 1496505
lxc 20240808202646.999 ERROR    af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20240808202646.999 ERROR    commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_init_pid"

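One plausible reading of the failing call, not confirmed anywhere in this thread: the arguments after the PID in `newuidmap 1496505 0 1000000 1000000000 0 1000000 65536` are (namespace start, host start, count) triples, and both triples begin at container ID 0 and host ID 1000000. user_namespaces(7) states that the ranges on different uid_map lines must not overlap, and the kernel rejects an overlapping write with EINVAL ("Invalid argument"). The minimal sketch below is a hypothetical helper, not part of Incus or LXC. It splits such an argument list into triples and reports which pairs overlap on either side.

```python
# Hypothetical helper (not part of Incus/LXC): inspect the mapping triples that
# newuidmap receives after the PID and report pairs that overlap.

def parse_newuidmap_args(args):
    """Split 'PID ns host count [ns host count ...]' into (pid, [(ns, host, count), ...])."""
    pid, rest = int(args[0]), [int(x) for x in args[1:]]
    if len(rest) % 3 != 0:
        raise ValueError("mapping arguments must come in groups of three")
    triples = [tuple(rest[i:i + 3]) for i in range(0, len(rest), 3)]
    return pid, triples


def ranges_overlap(start_a, count_a, start_b, count_b):
    """True if the half-open ranges [start, start + count) intersect."""
    return start_a < start_b + count_b and start_b < start_a + count_a


def find_overlaps(triples):
    """Return index pairs whose namespace-side or host-side ranges intersect."""
    conflicts = []
    for i in range(len(triples)):
        for j in range(i + 1, len(triples)):
            ns_i, host_i, n_i = triples[i]
            ns_j, host_j, n_j = triples[j]
            if ranges_overlap(ns_i, n_i, ns_j, n_j) or ranges_overlap(host_i, n_i, host_j, n_j):
                conflicts.append((i, j))
    return conflicts


if __name__ == "__main__":
    # Arguments copied from the logged newuidmap call (command name dropped).
    logged = "1496505 0 1000000 1000000000 0 1000000 65536".split()
    pid, triples = parse_newuidmap_args(logged)
    print(pid, triples)
    print(find_overlaps(triples))  # the two logged triples collide
```

With a single triple, such as only `0 1000000 65536`, `find_overlaps` returns an empty list. That is consistent with the observation later in the thread that instances migrated from LXD keep running while newly created ones fail.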
~$ incus start sample
Error: Failed to run: /nix/store/i9dkdkh6ndlvdqg94rxdfy7qp8bp09jq-incus-lts-6.0.1/bin/incusd forkstart sample /var/lib/incus/containers /run/incus/sample/lxc.conf: exit status 1
Try `incus info --show-log sample` for more info
~ [1]$ incus info --show-log sample
Name: sample
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2024/08/08 22:26 CEST
Last Used: 2024/08/08 22:27 CEST

Log:

lxc sample 20240808202700.167 ERROR    idmap_utils - ../src/lxc/idmap_utils.c:lxc_map_ids:245 - newuidmap failed to write mapping "newuidmap: write to uid_map failed: Invalid argument": newuidmap 1496647 0 1000000 1000000000 0 1000000 65536
lxc sample 20240808202700.167 ERROR    start - ../src/lxc/start.c:lxc_spawn:1795 - Failed to set up id mapping.
lxc sample 20240808202700.167 ERROR    lxccontainer - ../src/lxc/lxccontainer.c:wait_on_daemonized_start:837 - Received container state "ABORTING" instead of "RUNNING"
lxc sample 20240808202700.167 ERROR    start - ../src/lxc/start.c:__lxc_start:2114 - Failed to spawn container "sample"
lxc sample 20240808202700.167 WARN     start - ../src/lxc/start.c:lxc_abort:1037 - No such process - Failed to send SIGKILL via pidfd 17 for process 1496647
lxc 20240808202700.205 ERROR    af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20240808202700.205 ERROR    commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_init_pid"

~$ incus delete sample
~$ 

Notify maintainers

@aanderse @adamcstephens @jnsgruk @megheaiulian @mkg20001

Metadata



adamcstephens commented 2 months ago

Please share the relevant configuration, including subuid/subgid configurations. A basic Incus server works out of the box, so this is likely due to something you've configured.

schuelermine commented 2 months ago

I’m building from the flake at https://github.com/schuelermine/configuration on commit a7f16e1 with hostname nailbox-on-buggeryyacht. Incus-specific configuration is in https://github.com/schuelermine/configuration/blob/a7f16e1067bac44aa7e636b894f07d7ef093f939/nixosModules/configuration.nix on line 11, lines 44-57, and lines 133-173.

schuelermine commented 2 months ago

I set up this Incus install by enabling Incus in the NixOS configuration, running lxd-to-incus, and then, in the configuration, disabling the LXD installation and copying my preseed from LXD to Incus.

schuelermine commented 2 months ago

Instances that I transferred with lxd-to-incus are running fine. I can run programs in them, SSH into them, and move files to them.

adamcstephens commented 2 months ago

Did you manually configure /etc/subuid or /etc/subgid? Either way, what's in those files?

schuelermine commented 2 months ago

No, not that I remember.

/etc/subuid is

root:1000000:1000000000
anselmschueler:100000:65536

/etc/subgid is

root:1000000:1000000000
anselmschueler:100000:65536
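The files above use the subuid(5)/subgid(5) format, with one `owner:start:count` entry per line. The sketch below is hypothetical and not a tool anyone in the thread used. It parses that format and flags ranges belonging to the same owner that overlap. The extra `root:1000000:65536` line in the usage example is an illustrative assumption of how a second container manager could declare its own range for root. It is not taken from the reporter's files, which show no overlap.

```python
from collections import defaultdict

# Hypothetical checker for subuid/subgid-style text ("owner:start:count" per line).


def parse_subid(text):
    """Parse subuid/subgid text into {owner: [(start, count), ...]}, skipping blank lines."""
    ranges = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        owner, start, count = line.split(":")
        ranges[owner].append((int(start), int(count)))
    return dict(ranges)


def overlapping_entries(ranges):
    """Return {owner: [(range_a, range_b), ...]} for same-owner ranges that intersect."""
    result = {}
    for owner, entries in ranges.items():
        pairs = []
        for i in range(len(entries)):
            for j in range(i + 1, len(entries)):
                (s1, c1), (s2, c2) = entries[i], entries[j]
                if s1 < s2 + c2 and s2 < s1 + c1:
                    pairs.append((entries[i], entries[j]))
        if pairs:
            result[owner] = pairs
    return result


if __name__ == "__main__":
    reported = "root:1000000:1000000000\nanselmschueler:100000:65536\n"
    print(overlapping_entries(parse_subid(reported)))  # no overlaps in the reported file
    # Hypothetical duplicate root range, for illustration only.
    print(overlapping_entries(parse_subid(reported + "root:1000000:65536\n")))
```

In this sketch, two overlapping root ranges only become a problem once something combines them into a single mapping request. That would match the two colliding triples in the logged `newuidmap` call, but the thread does not confirm this mechanism.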

Chaostheorie commented 2 months ago

Does the error occur with privileged containers? This error type sounds like something related to your user namespace configuration.

schuelermine commented 2 months ago

The error does not occur with privileged containers.

schuelermine commented 2 months ago

I have an idea of what might be causing this: I’m running from a disk that was installed from a different computer. I wiped /nix and /etc and reinstalled with the appropriate hardware configuration, but I reset stateVersion, thinking it’d be fine since I had wiped /etc. Maybe that’s related?

adamcstephens commented 2 months ago

I don’t think the incus or lxd modules have any stateVersion conditions, and they shouldn’t store anything in /etc.

Clearly some state is on your system, or you have an incompatibility with something that has changed in Incus. I know there have been some changes to the user mapping code, but I’m not sure how to tell you to proceed with troubleshooting.

You could try starting a support thread in https://discuss.linuxcontainers.org/

schuelermine commented 2 months ago

I have created a thread: https://discuss.linuxcontainers.org/t/after-migrating-from-lxd-to-incus-i-can-t-create-new-unprivileged-containers/21334?u=anselmschueler

schuelermine commented 2 months ago

A restart of the service fixed it. I don’t know what the problem was.

mkg20001 commented 2 months ago

I had a similar issue. It was resolved by removing virtualisation.lxd.enable = true, since lxd adds its own subuid range.