coreos / ignition

First boot installer and configuration tool
https://coreos.github.io/ignition/
Apache License 2.0

systemd-userdbd.service and systemd-userdbd.socket services fail to start when running from RAM #1855

Closed robbycuenot closed 2 months ago

robbycuenot commented 2 months ago

Bug

I am booting fedora-coreos-39.20240104.3.0-live.x86_64.iso via PXE with the following setup:

flowchart TB
    Machine[Machine<br>MAC: 00:00:00:00:00:00]
    UDM[UDM Pro]
    Synology[Synology NAS]
    subgraph DHCP_Options["DHCP Options"]
        direction LR
        IP["DHCP 50: IP of Machine"]
        TFTP_IP["DHCP 66: IP of TFTP Server"]
        Filename["DHCP 67: Filename (shim.efi)"]
    end
    subgraph TFTP_Requests["TFTP Requests"]
        direction TB
        shim["shim.efi"]
        grubx64["grubx64.efi"]
        grubcfg["grub.cfg"]
        kernel["Kernel (vmlinuz)"]
        initrd["Initial Ramdisk<br>(initrd.img)"]
        rootfs["Root Filesystem<br>(rootfs.img)"]
        shim --> grubx64 --> grubcfg --> kernel --> initrd --> rootfs
    end
    subgraph Ignition["Ignition"]
        direction LR
        disks["Format Drive, Mount /var and swap"]
        ssh["Place SSH Public Key"]
        scripts["Download Scripts"]
    end
    Machine <--> DHCP_Options <--> UDM
    Machine <--> TFTP_Requests <--> Synology[Synology NAS]
    Machine <--> Ignition <--> Synology[Synology NAS]

This is a UEFI setup with Secure Boot enabled. It boots without issue and I've been using it for a while, but I started receiving this error after configuring the /var mount. The goal is to have the machine reprovision from scratch on every reboot, mounting /var and swap on a disk only to relieve memory pressure and inode exhaustion from the loop devices.

After logging in via SSH, I receive this notice:

[systemd]
Failed Units: 2
  systemd-userdbd.service
  systemd-userdbd.socket

Operating System Version

fedora-coreos-39.20240104.3.0-live.x86_64.iso

Ignition Version

3.4.0

Environment

Dell R930 (x86_64), Windows Server 2022, Hyper-V Gen2 VM, 128GB VHDX, 16GB memory, 8 virtual cores

Expected Behavior

The services start correctly

Actual Behavior

The services fail to start when booting with Ignition, but run without issue if started after Ignition has completed.

Reproduction Steps

Boot with the following ignition file:

{
  "ignition": {
    "version": "3.4.0"
  },
  "passwd": {
    "users": [
      {
        "groups": [
          "sudo"
        ],
        "name": "systemUser",
        "sshAuthorizedKeys": [
          "ssh-ed25519 AAAA..."
        ]
      }
    ]
  },
  "storage": {
    "disks": [
      {
        "device": "/dev/sda",
        "partitions": [
          {
            "label": "var",
            "number": 1,
            "sizeMiB": 65536,
            "startMiB": 0,
            "typeGuid": "4F68BCE3-E8CD-4DB1-96E7-FBCAF984B709",
            "wipePartitionEntry": true
          },
          {
            "label": "swap",
            "number": 2,
            "sizeMiB": 0,
            "startMiB": 0,
            "typeGuid": "0657FD6D-A4AB-43C4-84E5-0933C84B4F4F"
          }
        ],
        "wipeTable": true
      }
    ],
    "files": [
      {
        "path": "/var/home/systemUser/examplescript.sh",
        "contents": {
          "source": "http://tftpserveripaddress/scripts/examplescript.sh"
        },
        "mode": 493
      }
    ],
    "filesystems": [
      {
        "device": "/dev/disk/by-partlabel/var",
        "format": "ext4",
        "label": "var",
        "path": "/var",
        "wipeFilesystem": true
      },
      {
        "device": "/dev/disk/by-partlabel/swap",
        "format": "swap",
        "wipeFilesystem": true
      }
    ]
  },
  "systemd": {
    "units": [
      {
        "contents": "# Generated by Butane\n[Unit]\nRequires=systemd-fsck@dev-disk-by\\x2dpartlabel-var.service\nAfter=systemd-fsck@dev-disk-by\\x2dpartlabel-var.service\n\n[Mount]\nWhere=/var\nWhat=/dev/disk/by-partlabel/var\nType=ext4\n\n[Install]\nRequiredBy=local-fs.target",
        "enabled": true,
        "name": "var.mount"
      },
      {
        "contents": "# Generated by Butane\n[Swap]\nWhat=/dev/disk/by-partlabel/swap\n\n[Install]\nRequiredBy=swap.target",
        "enabled": true,
        "name": "dev-disk-by\\x2dpartlabel-swap.swap"
      }
    ]
  }
}
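
Two details in the config above are easy to misread: the file `mode` is a decimal number, and the unit names use systemd's path escaping. The following is a minimal Python sketch that checks both. The helper `systemd_escape_path` is a hypothetical name and only approximates `systemd-escape --path` for simple absolute paths (it does not collapse duplicate slashes or reject `..` components).

```python
import string

# Characters systemd leaves unescaped in unit names ("/" is mapped to "-").
_SAFE = set(string.ascii_letters + string.digits + ":_.")


def systemd_escape_path(path: str) -> str:
    """Approximate `systemd-escape --path` for a simple absolute path.

    Hypothetical helper for illustration, not part of systemd or Ignition.
    """
    trimmed = path.strip("/")
    if not trimmed:
        return "-"  # the root directory is escaped as a single dash
    out = []
    for i, byte in enumerate(trimmed.encode("utf-8")):
        ch = chr(byte)
        if ch == "/":
            out.append("-")
        elif ch in _SAFE and not (i == 0 and ch == "."):
            out.append(ch)
        else:
            # Everything else, including "-", becomes a \xNN escape.
            out.append(f"\\x{byte:02x}")
    return "".join(out)


# "mode": 493 in the JSON is decimal; as octal it is the familiar 0755.
print(oct(493))

# Unit names derived from the paths used in the config.
print(systemd_escape_path("/var") + ".mount")
print(systemd_escape_path("/dev/disk/by-partlabel/swap") + ".swap")
print("systemd-fsck@" + systemd_escape_path("/dev/disk/by-partlabel/var") + ".service")
```

The printed names should match `var.mount`, the swap unit name, and the `systemd-fsck@` instance referenced in the `Requires=`/`After=` lines above. Inside the JSON the backslash is doubled (`\\x2d`) only because of JSON string escaping.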

Other Information

systemctl logs:

systemUser@localhost:~$ sudo systemctl status systemd-userdbd.service
× systemd-userdbd.service - User Database Manager
     Loaded: loaded (/usr/lib/systemd/system/systemd-userdbd.service; indirect; preset: disabled)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf
     Active: failed (Result: exit-code) since Wed 2024-04-17 15:02:50 UTC; 6min ago
TriggeredBy: × systemd-userdbd.socket
       Docs: man:systemd-userdbd.service(8)
    Process: 1256 ExecStart=/usr/lib/systemd/systemd-userdbd (code=exited, status=226/NAMESPACE)
   Main PID: 1256 (code=exited, status=226/NAMESPACE)
        CPU: 11ms

Apr 17 15:02:50 localhost.localdomain systemd[1]: Starting systemd-userdbd.service - User Database Manager...
Apr 17 15:02:50 localhost.localdomain (-userdbd)[1256]: systemd-userdbd.service: Failed to set up mount namespacing: /run/systemd/mount-rootfs/dev: Read-only file system
Apr 17 15:02:50 localhost.localdomain (-userdbd)[1256]: systemd-userdbd.service: Failed at step NAMESPACE spawning /usr/lib/systemd/systemd-userdbd: Read-only file system
Apr 17 15:02:50 localhost.localdomain systemd[1]: systemd-userdbd.service: Main process exited, code=exited, status=226/NAMESPACE
Apr 17 15:02:50 localhost.localdomain systemd[1]: systemd-userdbd.service: Failed with result 'exit-code'.
Apr 17 15:02:50 localhost.localdomain systemd[1]: Failed to start systemd-userdbd.service - User Database Manager.
Apr 17 15:02:50 localhost.localdomain systemd[1]: systemd-userdbd.service: Start request repeated too quickly.
Apr 17 15:02:50 localhost.localdomain systemd[1]: systemd-userdbd.service: Failed with result 'exit-code'.
Apr 17 15:02:50 localhost.localdomain systemd[1]: Failed to start systemd-userdbd.service - User Database Manager.
systemUser@localhost:~$ sudo systemctl status systemd-userdbd.socket
× systemd-userdbd.socket - User Database Manager Socket
     Loaded: loaded (/usr/lib/systemd/system/systemd-userdbd.socket; enabled; preset: enabled)
     Active: failed (Result: service-start-limit-hit) since Wed 2024-04-17 15:02:50 UTC; 6min ago
   Duration: 1.021s
   Triggers: ● systemd-userdbd.service
       Docs: man:systemd-userdbd.service(8)
     Listen: /run/systemd/userdb/io.systemd.Multiplexer (Stream)
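
For anyone triaging similar output, the key field is the `status=226/NAMESPACE` on the `Process:` line, which together with the journal message points at mount namespace setup rather than at systemd-userdbd itself. A minimal Python sketch for pulling that field out of saved `systemctl status` text; `parse_exit_status` is a hypothetical helper name and the regex only covers the format shown above.

```python
import re

# Matches e.g. "(code=exited, status=226/NAMESPACE)" as printed by systemctl status.
STATUS_RE = re.compile(
    r"\(code=(?P<code>\w+), status=(?P<num>\d+)(?:/(?P<name>\w+))?\)"
)


def parse_exit_status(status_text: str):
    """Return (code, numeric status, symbolic name) or None if not found.

    Hypothetical helper for illustration only.
    """
    match = STATUS_RE.search(status_text)
    if match is None:
        return None
    return match.group("code"), int(match.group("num")), match.group("name")


# Line copied from the status output in this report.
sample = (
    "    Process: 1256 ExecStart=/usr/lib/systemd/systemd-userdbd "
    "(code=exited, status=226/NAMESPACE)"
)
print(parse_exit_status(sample))
```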

jlebon commented 2 months ago

First, I commend you on your mermaid skills. :)

That said, I think this is probably a dupe of https://github.com/coreos/fedora-coreos-tracker/issues/1296. According to the discussions there, the issue is fixed in the next stream. Can you try the next ISO and see if you still hit the issue?

robbycuenot commented 2 months ago

> First, I commend you on your mermaid skills. :)
>
> That said, I think this is probably a dupe of coreos/fedora-coreos-tracker#1296. According to the discussions there, the issue is fixed in the next stream. Can you try the next ISO and see if you still hit the issue?

ChatGPT did the leg-work on the mermaid ;) Thanks, I'll give this a shot!

robbycuenot commented 2 months ago

That fixed it! Closing this