coreos / fedora-coreos-tracker

Issue tracker for Fedora CoreOS
https://fedoraproject.org/coreos/
260 stars 60 forks source link

Failed to write database /etc/udev/hwdb.bin: Read-only file system #822

Open travier opened 3 years ago

travier commented 3 years ago

Describe the bug

Upon reboot from latest update, I have the following failing unit: systemd-hwdb-update.service. I unfortunately did not capture the systemctl status output right away and tried restarting the unit first. I think this is not the first time I have encountered this one but I don't remember more details:

# systemctl restart systemd-hwdb-update.service
# systemctl status systemd-hwdb-update.service 
● systemd-hwdb-update.service - Rebuild Hardware Database
     Loaded: loaded (/usr/lib/systemd/system/systemd-hwdb-update.service; static)
     Active: failed (Result: exit-code) since Fri 2021-05-07 00:07:11 UTC; 12h ago
  Condition: start condition failed at Fri 2021-05-07 12:10:02 UTC; 4s ago
             ├─ ConditionNeedsUpdate=/etc was not met
             └─ ConditionDirectoryNotEmpty=|/etc/udev/hwdb.d was not met
       Docs: man:hwdb(7)
             man:systemd-hwdb(8)
    Process: 591 ExecStart=/usr/bin/systemd-hwdb update (code=exited, status=1/FAILURE)
   Main PID: 591 (code=exited, status=1/FAILURE)
        CPU: 330ms

May 07 00:07:11 fedora-coreos systemd[1]: Starting Rebuild Hardware Database...
May 07 00:07:11 fedora-coreos systemd-hwdb[591]: Failed to write database /etc/udev/hwdb.bin: Read-only file system
May 07 00:07:11 fedora-coreos systemd[1]: systemd-hwdb-update.service: Main process exited, code=exited, status=1/FAILURE
May 07 00:07:11 fedora-coreos systemd[1]: systemd-hwdb-update.service: Failed with result 'exit-code'.
May 07 00:07:11 fedora-coreos systemd[1]: Failed to start Rebuild Hardware Database.
May 07 12:10:02 fedora-coreos systemd[1]: Condition check resulted in Rebuild Hardware Database being skipped.

Reproduction steps Unknown

Expected behavior No unit failure

Actual behavior Unit failure

System details

QEMU/KVM (Digital Ocean droplet):

# rpm-ostree status
State: idle
AutomaticUpdatesDriver: Zincati
  DriverState: active; periodically polling for updates (last checked Fri 2021-05-07 12:09:29 UTC)
Deployments:
● ostree://fedora:fedora/x86_64/coreos/stable
                   Version: 33.20210426.3.0 (2021-05-03T22:49:38Z)
                BaseCommit: 49ec34ca76b71283d317025a54fe815422299ced4d035369599539e2665d5818
              GPGSignature: Valid signature by 963A2BEB02009608FE67EA4249FD77499570FF31
           LayeredPackages: htop tmux tree vim zsh

  ostree://fedora:fedora/x86_64/coreos/stable
                   Version: 33.20210412.3.0 (2021-04-27T18:27:38Z)
                BaseCommit: 8563aca63f27acdc80c7104486bbc50435dfa47eff28dc7e9209f2496b166076
              GPGSignature: Valid signature by 963A2BEB02009608FE67EA4249FD77499570FF31
           LayeredPackages: htop tmux tree vim zsh

Ignition config

This happened after first boot. Initial config was very minimal, I don't think it's related.

Additional information

# ostree admin config-diff | grep bin

This could be a race condition. Or maybe this file does not need to be in /etc at all on FCOS.

jlebon commented 3 years ago

Yeah, this looks like a race between systemd-hwdb-update.service and ostree-remount.service. Does the output of journalctl -u systemd-hwdb-update.service -u ostree-remount.service show that it ran before ostree-remount.service?

This would be fixed by https://github.com/ostreedev/ostree/issues/2115.

travier commented 3 years ago

Indeed, from logs, this happened twice recently:

Apr 02 10:48:25 fedora-coreos systemd[1]: Starting Rebuild Hardware Database...
Apr 02 10:48:25 fedora-coreos systemd[1]: Starting OSTree Remount OS/ Bind Mounts...
Apr 02 10:48:26 fedora-coreos systemd-hwdb[590]: Failed to write database /etc/udev/hwdb.bin: Read-only file system
Apr 02 10:48:26 fedora-coreos ostree-remount[596]: Remounted ro: /sysroot
Apr 02 10:48:26 fedora-coreos ostree-remount[596]: Remounted rw: /etc
Apr 02 10:48:26 fedora-coreos systemd[1]: Finished OSTree Remount OS/ Bind Mounts.
Apr 02 10:48:26 fedora-coreos systemd[1]: systemd-hwdb-update.service: Main process exited, code=exited, status=1/FAILURE
Apr 02 10:48:26 fedora-coreos systemd[1]: systemd-hwdb-update.service: Failed with result 'exit-code'.
Apr 02 10:48:26 fedora-coreos systemd[1]: Failed to start Rebuild Hardware Database.
Apr 02 10:53:25 fedora-coreos systemd[1]: Condition check resulted in Rebuild Hardware Database being skipped.
Apr 02 10:54:42 fedora-coreos systemd[1]: Condition check resulted in Rebuild Hardware Database being skipped.
-- Reboot --
Apr 02 22:43:36 fedora-coreos systemd[1]: Condition check resulted in Rebuild Hardware Database being skipped.
Apr 02 22:43:36 fedora-coreos systemd[1]: Starting OSTree Remount OS/ Bind Mounts...
Apr 02 22:43:36 fedora-coreos ostree-remount[596]: Remounted ro: /sysroot
Apr 02 22:43:36 fedora-coreos ostree-remount[596]: Remounted rw: /etc
Apr 02 22:43:36 fedora-coreos systemd[1]: Finished OSTree Remount OS/ Bind Mounts.
-- Reboot --
Apr 06 23:16:04 fedora-coreos systemd[1]: Starting OSTree Remount OS/ Bind Mounts...
Apr 06 23:16:05 fedora-coreos ostree-remount[597]: Remounted ro: /sysroot
Apr 06 23:16:05 fedora-coreos ostree-remount[597]: Remounted rw: /etc
Apr 06 23:16:05 fedora-coreos systemd[1]: Finished OSTree Remount OS/ Bind Mounts.
-- Reboot --
Apr 16 00:40:22 fedora-coreos systemd[1]: Starting Rebuild Hardware Database...
Apr 16 00:40:22 fedora-coreos systemd[1]: Starting OSTree Remount OS/ Bind Mounts...
Apr 16 00:40:23 fedora-coreos ostree-remount[598]: Remounted ro: /sysroot
Apr 16 00:40:23 fedora-coreos ostree-remount[598]: Remounted rw: /etc
Apr 16 00:40:23 fedora-coreos systemd[1]: Finished OSTree Remount OS/ Bind Mounts.
Apr 16 00:40:23 fedora-coreos systemd[1]: Finished Rebuild Hardware Database.
Apr 22 11:02:47 fedora-coreos systemd[1]: systemd-hwdb-update.service: Succeeded.
Apr 22 11:02:47 fedora-coreos systemd[1]: Stopped Rebuild Hardware Database.
-- Reboot --
Apr 22 11:02:56 fedora-coreos systemd[1]: Condition check resulted in Rebuild Hardware Database being skipped.
Apr 22 11:02:56 fedora-coreos systemd[1]: Starting OSTree Remount OS/ Bind Mounts...
Apr 22 11:02:56 fedora-coreos ostree-remount[596]: Remounted ro: /sysroot
Apr 22 11:02:56 fedora-coreos ostree-remount[596]: Remounted rw: /etc
Apr 22 11:02:56 fedora-coreos systemd[1]: Finished OSTree Remount OS/ Bind Mounts.
-- Reboot --
Apr 30 01:36:15 fedora-coreos systemd[1]: Starting Rebuild Hardware Database...
Apr 30 01:36:15 fedora-coreos systemd[1]: Starting OSTree Remount OS/ Bind Mounts...
Apr 30 01:36:15 fedora-coreos ostree-remount[597]: Remounted ro: /sysroot
Apr 30 01:36:15 fedora-coreos ostree-remount[597]: Remounted rw: /etc
Apr 30 01:36:15 fedora-coreos systemd[1]: Finished OSTree Remount OS/ Bind Mounts.
Apr 30 01:36:16 fedora-coreos systemd[1]: Finished Rebuild Hardware Database.
May 03 07:58:43 fedora-coreos systemd[1]: systemd-hwdb-update.service: Succeeded.
May 03 07:58:43 fedora-coreos systemd[1]: Stopped Rebuild Hardware Database.
-- Reboot --
May 03 07:58:53 fedora-coreos systemd[1]: Condition check resulted in Rebuild Hardware Database being skipped.
May 03 07:58:53 fedora-coreos systemd[1]: Starting OSTree Remount OS/ Bind Mounts...
May 03 07:58:54 fedora-coreos ostree-remount[594]: Remounted ro: /sysroot
May 03 07:58:54 fedora-coreos ostree-remount[594]: Remounted rw: /etc
May 03 07:58:54 fedora-coreos systemd[1]: Finished OSTree Remount OS/ Bind Mounts.
-- Reboot --
May 07 00:07:11 fedora-coreos systemd[1]: Starting Rebuild Hardware Database...
May 07 00:07:11 fedora-coreos systemd[1]: Starting OSTree Remount OS/ Bind Mounts...
May 07 00:07:11 fedora-coreos systemd-hwdb[591]: Failed to write database /etc/udev/hwdb.bin: Read-only file system
May 07 00:07:11 fedora-coreos systemd[1]: systemd-hwdb-update.service: Main process exited, code=exited, status=1/FAILURE
May 07 00:07:11 fedora-coreos systemd[1]: systemd-hwdb-update.service: Failed with result 'exit-code'.
May 07 00:07:11 fedora-coreos systemd[1]: Failed to start Rebuild Hardware Database.
May 07 00:07:12 fedora-coreos ostree-remount[597]: Remounted ro: /sysroot
May 07 00:07:12 fedora-coreos ostree-remount[597]: Remounted rw: /etc
May 07 00:07:12 fedora-coreos systemd[1]: Finished OSTree Remount OS/ Bind Mounts.
May 07 12:10:02 fedora-coreos systemd[1]: Condition check resulted in Rebuild Hardware Database being skipped
dustymabe commented 2 years ago

I'm going to close this out since no one has reported it lately. If anyone has seen this lately or thinks it should remain open, please re-open.

lucab commented 2 years ago

I think we actually fixed this long time ago in https://github.com/ostreedev/ostree/releases/tag/v2021.6.

nikhiljha commented 1 year ago

I just ran into this on Fedora CoreOS 36.20221001.3.0 (lightly edited to remove irrelevant loglines and replace hostname):

[core@x ~]$ systemctl status systemd-hwdb-update.service
× systemd-hwdb-update.service - Rebuild Hardware Database
     Loaded: loaded (/usr/lib/systemd/system/systemd-hwdb-update.service; static)
     Active: failed (Result: exit-code) since Tue 2022-10-25 03:57:06 UTC; 3h 8min ago
       Docs: man:hwdb(7)
             man:systemd-hwdb(8)
    Process: 1098 ExecStart=systemd-hwdb update (code=exited, status=1/FAILURE)
   Main PID: 1098 (code=exited, status=1/FAILURE)
        CPU: 391ms

Oct 25 03:57:06 x systemd[1]: Starting systemd-hwdb-update.service - Rebuild Hardware Database...
Oct 25 03:57:06 x systemd-hwdb[1098]: mmap: Permission denied
Oct 25 03:57:06 x systemd-hwdb[1098]: Failed to write database /etc/udev/hwdb.bin: Permission denied
Oct 25 03:57:06 x systemd[1]: systemd-hwdb-update.service: Main process exited, code=exited, status=1/FAILURE
Oct 25 03:57:06 x systemd[1]: systemd-hwdb-update.service: Failed with result 'exit-code'.
Oct 25 03:57:06 x systemd[1]: Failed to start systemd-hwdb-update.service - Rebuild Hardware Database.
[core@x ~]$ journalctl -u systemd-hwdb-update.service -u ostree-remount.service
... snip repeated lines from above ...
Oct 25 03:57:09 x systemd[1]: Starting ostree-remount.service - OSTree Remount OS/ Bind Mounts...
Oct 25 03:57:09 x systemd[1]: Finished ostree-remount.service - OSTree Remount OS/ Bind Mounts.

And yeah, ostree-remount happened after :\

lucab commented 1 year ago

@nikhiljha I'm not sure you are hitting the same bug, in your case it's a "Permission denied". Looking at the logs, it looks like an mmap() is getting an EPERM at some point. Do you maybe have some kind of local SELinux modifications? Can you reproduce this on a pristine/minimal machine?

nikhiljha commented 1 year ago

@lucab I'll give a shot at trying to reproduce this later today, but all I did was PXE boot with a (pretty basic?) ignition config and then run sudo reboot. Here's my ignition config if it's relevant.

Ignition Config (Click to Expand) # vim: set ft=yaml: variant: fcos version: 1.4.0 kernel_arguments: should_exist: - "bond=bond0:eno1np0,eno2np1:mode=802.3ad,miimon=100" - "ip=:::255.255.255.0::bond0:none:1.1.1.1" systemd: units: - name: ocf-rpm-ostree-install.service enabled: true contents: | [Unit] Description=Install OCF Kubernetes packages Wants=network-online.target After=network-online.target # We run before `zincati.service` to avoid conflicting rpm-ostree # transactions. Before=zincati.service ConditionPathExists=!/var/lib/%N.stamp [Service] Type=oneshot RemainAfterExit=yes # `--allow-inactive` ensures that rpm-ostree does not return an error # if the package is already installed. This is useful if the package is # added to the root image in a future Fedora CoreOS release as it will # prevent the service from failing. ExecStart=/usr/bin/rpm-ostree install --apply-live --allow-inactive vim setroubleshoot kubelet kubeadm kubectl cri-o git ExecStart=/bin/touch /var/lib/%N.stamp [Install] WantedBy=multi-user.target - name: ocf-start-kubernetes.target enabled: true contents: | [Unit] Description=Start OCF kubernetes Wants=ocf-rpm-ostree-install.service After=ocf-rpm-ostree-install.service Wants=crio.service Wants=kubelet.service [Install] WantedBy=multi-user.target storage: files: # vim as default EDITOR - path: /etc/profile.d/zz-default-editor.sh overwrite: true contents: inline: | export EDITOR=vim # CRI-O - path: /etc/dnf/modules.d/cri-o.module mode: 0644 overwrite: true contents: inline: | [cri-o] name=cri-o stream=1.25 profiles= state=enabled # YUM repository for kubeadm, kubelet and kubectl # TODO: Enable gpgcheck once dnf bug is fixed. - path: /etc/yum.repos.d/kubernetes.repo mode: 0644 overwrite: true contents: inline: | [kubernetes] name=Kubernetes baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64 enabled=1 gpgcheck=0 repo_gpgcheck=0 gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg # configuring automatic loading of br_netfilter on startup - path: /etc/modules-load.d/br_netfilter.conf mode: 0644 overwrite: true contents: inline: br_netfilter # setting kernel parameters required by kubelet - path: /etc/sysctl.d/kubernetes.conf mode: 0644 overwrite: true contents: inline: | net.bridge.bridge-nf-call-iptables=1 net.ipv4.ip_forward=1 # hostname - path: /etc/hostname mode: 0644 contents: inline: x passwd: users: - name: ssh_authorized_keys: -