fedora-iot / iot-distro

Issue tracking for the Fedora IoT Edition

Writing symlinks under /etc/systemd/system in Fedora IoT 39 apparently doesn't survive the reboot. #14

Open miabbott opened 1 year ago

miabbott commented 1 year ago

(Migrated from https://github.com/coreos/fedora-coreos-tracker/issues/1615 -> https://pagure.io/fedora-iot/issue/53 -> to here)

Originally reported by @gabrieleturchi

Describe the bug

Apparently, any symlink created or removed under /etc/systemd/system (for example by "systemctl disable ModemManager" or "systemctl set-default graphical.target"; I haven't checked other /etc folders) does not persist across a reboot. Creating a regular file (like a new service unit) works. I'm pretty sure this worked under Fedora IoT 38.

My current workaround is a service that forces the right target, plus predefined service configuration to enable my new services.

Of course, I installed xfce before running “systemctl set-default graphical.target”.

GT

Reproduction steps

  1. Install graphical environment
  2. systemctl set-default graphical.target

Expected behavior

Graphical environment running at boot

Actual behavior

multi-user text environment running at boot

System details

Raspberry Pi 3

miabbott commented 1 year ago

I have not tested the installation of the graphical environment, but I did test removing and adding symlinks in a F39 IoT VM on x86_64:

[core@localhost ~]$ rpm-ostree status
State: idle
Deployments:
● fedora-iot:fedora/stable/x86_64/iot
                  Version: 39.20231116.0 (2023-11-16T16:19:42Z)
                   Commit: caf769dfb42e2db90ee99cf378bea9bcc92382c72c415fe72d81cddd52390c01
             GPGSignature: Valid signature by E8F23996F23218640CB44CBE75CF5AC418B8E74C

  fedora-iot:fedora/devel/x86_64/iot
                  Version: 39.20231026.0 (2023-10-26T12:27:40Z)
                   Commit: 0599c27fe88ed2aaeb8144c7b604aaa69e31a94cbc384c894e29b27a077bdb6a
             GPGSignature: Valid signature by E8F23996F23218640CB44CBE75CF5AC418B8E74C
[core@localhost ~]$ systemctl status ModemManager.service 
● ModemManager.service - Modem Manager
     Loaded: loaded (/usr/lib/systemd/system/ModemManager.service; enabled; preset: enabled)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf
     Active: active (running) since Fri 2023-11-17 14:40:29 EST; 32s ago
   Main PID: 797 (ModemManager)
      Tasks: 4 (limit: 2208)
     Memory: 6.4M
        CPU: 41ms
     CGroup: /system.slice/ModemManager.service
             └─797 /usr/sbin/ModemManager

Nov 17 14:40:29 localhost systemd[1]: Starting ModemManager.service - Modem Manager...
Nov 17 14:40:29 localhost ModemManager[797]: <info>  ModemManager (version 1.20.6-3.fc39) starting in system bus...
Nov 17 14:40:29 localhost systemd[1]: Started ModemManager.service - Modem Manager.
Nov 17 14:40:31 localhost.localdomain ModemManager[797]: <info>  [base-manager] couldn't check support for device '/sys/devices/pci0000:00/0000:00:02.0/0000:01:00.0': not supported by any plugin
[core@localhost ~]$ sudo systemctl disable ModemManager.service --now
[sudo] password for core: 
Removed "/etc/systemd/system/dbus-org.freedesktop.ModemManager1.service".
Removed "/etc/systemd/system/multi-user.target.wants/ModemManager.service".
[core@localhost ~]$ ls -l /etc/systemd/system/
total 28
drwxr-xr-x. 1 root root  58 Nov 17 14:09 boot-complete.target.requires
drwxr-xr-x. 1 root root  48 Nov 17 14:09 cryptsetup.target.wants
lrwxrwxrwx. 1 root root  37 Nov 17 14:09 ctrl-alt-del.target -> /usr/lib/systemd/system/reboot.target
lrwxrwxrwx. 1 root root  41 Nov 17 14:09 dbus-org.fedoraproject.FirewallD1.service -> /usr/lib/systemd/system/firewalld.service
lrwxrwxrwx. 1 root root  45 Nov 17 14:09 dbus-org.freedesktop.home1.service -> /usr/lib/systemd/system/systemd-homed.service
lrwxrwxrwx. 1 root root  57 Nov 17 14:09 dbus-org.freedesktop.nm-dispatcher.service -> /usr/lib/systemd/system/NetworkManager-dispatcher.service
lrwxrwxrwx. 1 root root  44 Nov 17 14:09 dbus-org.freedesktop.oom1.service -> /usr/lib/systemd/system/systemd-oomd.service
lrwxrwxrwx. 1 root root  48 Nov 17 14:09 dbus-org.freedesktop.resolve1.service -> /usr/lib/systemd/system/systemd-resolved.service
lrwxrwxrwx. 1 root root  43 Nov 17 14:09 dbus.service -> /usr/lib/systemd/system/dbus-broker.service
drwxr-xr-x. 1 root root  28 Nov 17 14:09 default.target.wants
drwxr-xr-x. 1 root root  36 Nov 17 14:09 getty.target.wants
drwxr-xr-x. 1 root root  98 Nov 17 14:09 greenboot-healthcheck.service.requires
drwxr-xr-x. 1 root root  44 Nov 17 14:09 local-fs.target.wants
drwxr-xr-x. 1 root root 888 Nov 17 14:41 multi-user.target.wants
drwxr-xr-x. 1 root root  68 Nov 17 14:09 network-online.target.wants
drwxr-xr-x. 1 root root  70 Nov 17 14:09 ostree-finalize-staged.service.requires
drwxr-xr-x. 1 root root  66 Nov 17 14:09 reboot.target.wants
drwxr-xr-x. 1 root root  54 Nov 17 14:09 redboot.target.requires
drwxr-xr-x. 1 root root  54 Nov 17 14:09 redboot.target.wants
drwxr-xr-x. 1 root root 154 Nov 17 14:09 sockets.target.wants
drwxr-xr-x. 1 root root 302 Nov 17 14:09 sysinit.target.wants
drwxr-xr-x. 1 root root  62 Nov 17 14:09 system-update.target.wants
drwxr-xr-x. 1 root root  60 Nov 17 14:09 systemd-homed.service.wants
drwxr-xr-x. 1 root root  58 Nov 17 14:09 systemd-journald.service.wants
drwxr-xr-x. 1 root root  38 Nov 17 14:09 timers.target.wants
[core@localhost ~]$ sudo ln -s /etc/passwd /etc/foobar
[core@localhost ~]$ ls -l /etc/foobar 
lrwxrwxrwx. 1 root root 11 Nov 17 14:42 /etc/foobar -> /etc/passwd
[core@localhost ~]$ sudo systemctl reboot

Broadcast message from root@localhost on pts/1 (Fri 2023-11-17 14:42:23 EST):

The system will reboot now!

Connection to 192.168.122.49 closed by remote host.
Connection to 192.168.122.49 closed.

...

$ sshq -l core 192.168.122.49
Warning: Permanently added '192.168.122.49' (ED25519) to the list of known hosts.
Script '01_update_platforms_check.sh' FAILURE (exit code '1'). Continuing...
Boot Status is GREEN - Health Check SUCCESS
Last login: Fri Nov 17 14:40:37 2023 from 192.168.122.1
[core@localhost ~]$ sudo journalctl --list-boot
[sudo] password for core: 
Sorry, try again.
[sudo] password for core: 
IDX BOOT ID                          FIRST ENTRY                 LAST ENTRY                 
 -7 ee60e4892f5c45ee99260776fc1ed598 Mon 2023-09-11 15:14:31 EDT Tue 2023-09-12 08:50:29 EDT
 -6 2c93c96efa5f4049ab656a56fba413c2 Mon 2023-09-18 14:40:17 EDT Mon 2023-09-18 15:13:05 EDT
 -5 4ef9fb3545214900a5400498f1417fec Tue 2023-10-03 16:00:21 EDT Tue 2023-10-03 16:05:58 EDT
 -4 c8b395fcc3a24f558d7ddff88475b885 Fri 2023-10-27 09:07:12 EDT Fri 2023-10-27 09:20:55 EDT
 -3 25173f4cdde64fe195ede0992a07e1e4 Fri 2023-10-27 09:21:11 EDT Fri 2023-10-27 15:44:48 EDT
 -2 3450521b10ee47fe8f935f3fe3301bbb Fri 2023-11-17 13:29:20 EST Fri 2023-11-17 14:40:09 EST
 -1 4644ace61bef4825ae6b99eed0ab9edb Fri 2023-11-17 14:40:27 EST Fri 2023-11-17 14:42:24 EST
  0 9d6d0f6ab850490c9057a9c83bd90ec2 Fri 2023-11-17 14:42:38 EST Fri 2023-11-17 14:43:36 EST
[core@localhost ~]$ ls -l /etc/systemd/system/
total 28
drwxr-xr-x. 1 root root  58 Nov 17 14:09 boot-complete.target.requires
drwxr-xr-x. 1 root root  48 Nov 17 14:09 cryptsetup.target.wants
lrwxrwxrwx. 1 root root  37 Nov 17 14:09 ctrl-alt-del.target -> /usr/lib/systemd/system/reboot.target
lrwxrwxrwx. 1 root root  41 Nov 17 14:09 dbus-org.fedoraproject.FirewallD1.service -> /usr/lib/systemd/system/firewalld.service
lrwxrwxrwx. 1 root root  45 Nov 17 14:09 dbus-org.freedesktop.home1.service -> /usr/lib/systemd/system/systemd-homed.service
lrwxrwxrwx. 1 root root  57 Nov 17 14:09 dbus-org.freedesktop.nm-dispatcher.service -> /usr/lib/systemd/system/NetworkManager-dispatcher.service
lrwxrwxrwx. 1 root root  44 Nov 17 14:09 dbus-org.freedesktop.oom1.service -> /usr/lib/systemd/system/systemd-oomd.service
lrwxrwxrwx. 1 root root  48 Nov 17 14:09 dbus-org.freedesktop.resolve1.service -> /usr/lib/systemd/system/systemd-resolved.service
lrwxrwxrwx. 1 root root  43 Nov 17 14:09 dbus.service -> /usr/lib/systemd/system/dbus-broker.service
drwxr-xr-x. 1 root root  28 Nov 17 14:09 default.target.wants
drwxr-xr-x. 1 root root  36 Nov 17 14:09 getty.target.wants
drwxr-xr-x. 1 root root  98 Nov 17 14:09 greenboot-healthcheck.service.requires
drwxr-xr-x. 1 root root  44 Nov 17 14:09 local-fs.target.wants
drwxr-xr-x. 1 root root 888 Nov 17 14:41 multi-user.target.wants
drwxr-xr-x. 1 root root  68 Nov 17 14:09 network-online.target.wants
drwxr-xr-x. 1 root root  70 Nov 17 14:09 ostree-finalize-staged.service.requires
drwxr-xr-x. 1 root root  66 Nov 17 14:09 reboot.target.wants
drwxr-xr-x. 1 root root  54 Nov 17 14:09 redboot.target.requires
drwxr-xr-x. 1 root root  54 Nov 17 14:09 redboot.target.wants
drwxr-xr-x. 1 root root 154 Nov 17 14:09 sockets.target.wants
drwxr-xr-x. 1 root root 302 Nov 17 14:09 sysinit.target.wants
drwxr-xr-x. 1 root root  62 Nov 17 14:09 system-update.target.wants
drwxr-xr-x. 1 root root  60 Nov 17 14:09 systemd-homed.service.wants
drwxr-xr-x. 1 root root  58 Nov 17 14:09 systemd-journald.service.wants
drwxr-xr-x. 1 root root  38 Nov 17 14:09 timers.target.wants
[core@localhost ~]$ ls -l /etc/foobar 
lrwxrwxrwx. 1 root root 11 Nov 17 14:42 /etc/foobar -> /etc/passwd
[core@localhost ~]$ rpm-ostree status
State: idle
Deployments:
● fedora-iot:fedora/stable/x86_64/iot
                  Version: 39.20231116.0 (2023-11-16T16:19:42Z)
                   Commit: caf769dfb42e2db90ee99cf378bea9bcc92382c72c415fe72d81cddd52390c01
             GPGSignature: Valid signature by E8F23996F23218640CB44CBE75CF5AC418B8E74C

  fedora-iot:fedora/devel/x86_64/iot
                  Version: 39.20231026.0 (2023-10-26T12:27:40Z)
                   Commit: 0599c27fe88ed2aaeb8144c7b604aaa69e31a94cbc384c894e29b27a077bdb6a
             GPGSignature: Valid signature by E8F23996F23218640CB44CBE75CF5AC418B8E74C

[core@localhost ~]$ systemctl status ModemManager.service 
○ ModemManager.service - Modem Manager
     Loaded: loaded (/usr/lib/systemd/system/ModemManager.service; disabled; preset: enabled)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf
     Active: inactive (dead)

pcdubs commented 1 year ago

We test enabling/disabling services in openqa:

https://openqa.fedoraproject.org/tests/2270264

(test details - https://pagure.io/fedora-qa/os-autoinst-distri-fedora/blob/main/f/tests/base_service_manipulation.pm )

gabrieleturchi commented 1 year ago

Tomorrow I plan to repeat the process step by step, checking if and when the symlinking starts to fail. But could there be a reason for this strange behaviour, where files are kept but changes to symlinks are discarded?

gabrieleturchi commented 1 year ago

Mhhh... Some quick tests: "normal" symlinks (created directly with "ln -s") appear to work, both in /etc and in /etc/systemd/system. But symlinks created by systemd, as with "systemctl set-default graphical.target", disappear between reboots.
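A minimal sketch of how such a before/after comparison could be scripted, for anyone repeating the test. The function name `snapshot_links` and the snapshot path under /var/tmp are assumptions made for illustration; nothing in this thread prescribes them.

```shell
#!/bin/sh
# Sketch: list every symlink below a directory as "path -> target" so that
# a snapshot taken before a reboot can be diffed against one taken after.

snapshot_links() {
    # $1 is the directory to scan, e.g. /etc/systemd/system.
    # Output is sorted so two snapshots diff cleanly.
    find "$1" -type l | sort | while IFS= read -r link; do
        printf '%s -> %s\n' "$link" "$(readlink "$link")"
    done
}

# Usage before the reboot (snapshot path is a hypothetical choice):
#   snapshot_links /etc/systemd/system > /var/tmp/links.before
# Usage after the reboot:
#   snapshot_links /etc/systemd/system | diff /var/tmp/links.before -
```

Any line that `diff` reports is a symlink that was added, removed, or retargeted across the reboot, which shows whether links made by `ln -s` and links made by `systemctl` behave differently.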

miabbott commented 1 year ago

I tried the systemctl set-default graphical.target and it worked here. I don't have any desktop environment installed, so that is still a big caveat.

[core@localhost ~]$ sudo systemctl set-default graphical.target
[sudo] password for core: 
Created symlink /etc/systemd/system/default.target → /usr/lib/systemd/system/graphical.target.
[core@localhost ~]$ ls -l /etc/systemd/system/default.target
lrwxrwxrwx. 1 root root 40 Nov 17 16:59 /etc/systemd/system/default.target -> /usr/lib/systemd/system/graphical.target
[core@localhost ~]$ sudo systemctl reboot

Broadcast message from root@localhost on pts/1 (Fri 2023-11-17 17:00:05 EST):

The system will reboot now!

[core@localhost ~]$ Connection to 192.168.122.49 closed by remote host.
Connection to 192.168.122.49 closed.
[miabbott@toolbox (container) ~/workspaces/fedora/fedoraproject.org (iot_issue)]$ sshq -l core 192.168.122.49
Warning: Permanently added '192.168.122.49' (ED25519) to the list of known hosts.
GREENBOOT is currently performing the provided checks.
Log in again in a minute or check /run/motd.d/boot-status to know the result of the checks
Last login: Fri Nov 17 16:58:49 2023 from 192.168.122.1
[core@localhost ~]$ ls -l /etc/systemd/system/default.target
lrwxrwxrwx. 1 root root 40 Nov 17 16:59 /etc/systemd/system/default.target -> /usr/lib/systemd/system/graphical.target
[core@localhost ~]$ rpm-ostree status
State: idle
Deployments:
● fedora-iot:fedora/stable/x86_64/iot
                  Version: 39.20231116.0 (2023-11-16T16:19:42Z)
                   Commit: caf769dfb42e2db90ee99cf378bea9bcc92382c72c415fe72d81cddd52390c01
             GPGSignature: Valid signature by E8F23996F23218640CB44CBE75CF5AC418B8E74C

  fedora-iot:fedora/devel/x86_64/iot
                  Version: 39.20231026.0 (2023-10-26T12:27:40Z)
                   Commit: 0599c27fe88ed2aaeb8144c7b604aaa69e31a94cbc384c894e29b27a077bdb6a
             GPGSignature: Valid signature by E8F23996F23218640CB44CBE75CF5AC418B8E74C

gabrieleturchi commented 1 year ago

Ehm... I have a problem...

?????

# rpm-ostree status
State: idle
Deployments:
● fedora-iot:fedora/stable/aarch64/iot
                  Version: 39.20231103.1 (2023-11-03T18:17:43Z)
                   Commit: cc8d419f72d84ac24d0a95e235c0cdf72844f73d9d6f42a41fcddf23dfb34f7d
             GPGSignature: Valid signature by 6A51BBABBA3D5467B6171221809A8D7CEB10B464

miabbott commented 10 months ago

@gabrieleturchi Is this still a problem you are experiencing? If so, could you provide the contents of the journal after you disable ModemManager and then after the reboot?

wurstsemmel commented 9 months ago

Hi, I want to make you aware that in my case systemctl enable rpm-ostreed-automatic.timer is not persistent. Another user reported that none of the systemd services can get enabled persistently [1]. I did not try with any other service / timer. systemctl --user is fine.

I installed Fedora 39 IoT on Raspberry Pi 4B (aarch64) using arm-image-installer without making use of zezere [2].

I filed a bug against systemd which was recently re-assigned to zezere [3].

Maybe @miabbott can have a look at [3]? It would be great.

[1] https://discussion.fedoraproject.org/t/fedora-39-iot-edition-systemctl-enable-rpm-ostreed-automatic-timer-disabled-after-reboot/101194
[2] https://www.redhat.com/sysadmin/fedora-iot-raspberry-pi
[3] https://bugzilla.redhat.com/show_bug.cgi?id=2259451

miabbott commented 9 months ago

Thanks for the ping, @wurstsemmel

The comment from Zbigniew on the linked BZ is a good hint that gives us more to investigate:

> Jan 29 21:34:10 rpi4 zezere-ignition[1323]: WARNING  : files: createResultFile: Ignition has already run on this system. Unexpected behavior may occur. Ignition is not designed to run more than once per system.

> So it seems that the systems gets "reinitialized". I have no idea why.
> Can ignition reset systemctl enablement symlinks?
>
> I'll reassign this to zezere-ignition for comments.

So it may be that the zezere-ignition integration is breaking the enabling/disabling of systemd units/targets.

A good experiment would be to systemctl mask the zezere services and then see if systemctl disable/enable persists.

(Possibly related https://github.com/fedora-iot/zezere/issues/137; it's about spamming the journal, but I think the root cause is zezere keeps firing when it shouldn't.)
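To check whether a given boot shows the same symptom, the warning quoted from the Bugzilla comment can be searched for in a saved journal dump. This is only a sketch: the function name `count_ignition_reruns` is hypothetical, and it assumes the journal was first written to a text file (for example with `journalctl -b > boot.log`).

```shell
#!/bin/sh
# Sketch: count occurrences of the Ignition re-run warning in a text file
# that holds journal output.

count_ignition_reruns() {
    # $1 is the path to the saved journal text.
    # grep -c prints 0 but exits non-zero when nothing matches, hence `|| true`.
    grep -c 'Ignition has already run on this system' "$1" || true
}

# Usage:
#   journalctl -b > boot.log
#   count_ignition_reruns boot.log
```

A count above zero on a boot that is not the first one would suggest that Ignition is being triggered again, which is the behaviour the quoted comment points at.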

pcdubs commented 9 months ago

This seems to only affect the disk images produced in osbuild(?). Older disk images (F37 upgraded) work as expected, and installations from iso also work.

wurstsemmel commented 9 months ago

@paulwhalen I used the .iso in Gnome Boxes (x86_64) and I do not observe the issue. In contrast, using the .raw.xz on my Raspberry Pi 4B systemctl disable does not work. That seems to underline your assumption.

$ arm-image-installer --help

Usage: arm-image-installer <options>

    --image=IMAGE   - xz compressed image file name

It looks like I cannot use arm-image-installer to install the .iso in order to verify that the issue depends on which variant (.iso or .raw.xz) I use.

wurstsemmel commented 9 months ago

> A good experiment would be to systemctl mask the zezere services and then see if systemctl disable/enable persists.

@miabbott Thanks for the good idea.

Identify and stop / disable / mask the zezere unit files:

# systemctl list-unit-files --all | grep zezere
zezere_ignition.service                           static          -
zezere_ignition_banner.service                    static          -
zezere_ignition.timer                             disabled        enabled

# systemctl stop zezere_ignition.timer
# systemctl disable zezere_ignition.timer
# systemctl mask zezere_ignition.timer
Created symlink /etc/systemd/system/zezere_ignition.timer → /dev/null.

# systemctl stop zezere_ignition.service
# systemctl mask zezere_ignition.service
Created symlink /etc/systemd/system/zezere_ignition.service → /dev/null.

# systemctl stop zezere_ignition_banner.service
# systemctl mask zezere_ignition_banner.service
Created symlink /etc/systemd/system/zezere_ignition_banner.service → /dev/null.
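For anyone repeating this experiment, the stop and mask steps could be driven from the `list-unit-files` output rather than typed per unit. This is a rough sketch only; the helper name `zezere_units` is made up for illustration, and the loop in the usage comment has to run as root.

```shell
#!/bin/sh
# Sketch: extract zezere unit names from `systemctl list-unit-files` output.

zezere_units() {
    # Reads list-unit-files output on stdin and prints the first column
    # (the unit name) of every line whose unit name contains "zezere".
    awk '$1 ~ /zezere/ { print $1 }'
}

# Usage (as root; roughly the manual stop/mask steps in one loop):
#   systemctl list-unit-files --all | zezere_units | while IFS= read -r unit; do
#       systemctl stop "$unit"
#       systemctl mask "$unit"
#   done
```

Filtering on the first column avoids matching lines where "zezere" might only appear elsewhere in the output.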

Enable the timer:

# systemctl enable rpm-ostreed-automatic.timer 
Created symlink /etc/systemd/system/timers.target.wants/rpm-ostreed-automatic.timer → /usr/lib/systemd/system/rpm-ostreed-automatic.timer.

Reboot. Verify that zezere is not active during boot:

# journalctl -b > boot.log

I cannot find any entry for zezere in boot.log, suggesting that zezere was disabled successfully.

# systemctl is-enabled rpm-ostreed-automatic.timer 
disabled

Unfortunately the timer is disabled again after reboot although zezere seems to be successfully deactivated.

The hint that the issue is related to the type of image (.iso or .raw.xz) is more promising, but I do not know how to take it from here. Any idea?

achilleas-k commented 9 months ago

@runcom Is this possibly related to the ignition firstboot kernel options we add to IoT disk images in osbuild?

dougstanley commented 9 months ago

I believe I confirmed @achilleas-k's suspicion. I used rpm-ostree kargs --editor and removed the $ignition_firstboot option, and systemctl enable, etc. began working as expected after a reboot.

Also, like the others, I was using fedora-iot 39 aarch64 (on a raspberry pi 3) running on an sd card created with arm-image-installer.
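A small sketch for checking whether a system is affected before editing the kernel arguments. It rests on an assumption that this thread does not confirm: that the GRUB variable `$ignition_firstboot`, when set, expands to text containing `ignition.firstboot` on the running kernel command line. The function name `has_firstboot_karg` is hypothetical.

```shell
#!/bin/sh
# Sketch: report whether a kernel command line still carries the Ignition
# first-boot argument, either expanded (assumed to be "ignition.firstboot")
# or as the literal, unexpanded "$ignition_firstboot" token.

has_firstboot_karg() {
    # $1 is a kernel command line string. It is padded with spaces so that
    # only whole, space-separated arguments match.
    case " $1 " in
        *" ignition.firstboot "*|*" ignition.firstboot="*|*' $ignition_firstboot '*)
            return 0 ;;
    esac
    return 1
}

# Usage on a running system:
#   has_firstboot_karg "$(cat /proc/cmdline)" && echo "first-boot karg present"
# Usage against the configured kargs:
#   has_firstboot_karg "$(rpm-ostree kargs)" && echo "first-boot karg configured"
```

If either check succeeds on a system that has already completed its first boot, that matches the situation described in this comment.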

achilleas-k commented 9 months ago

Right, good, so do they need to be removed after first boot somehow?

runcom commented 9 months ago

IoT is very likely mishandling https://github.com/fedora-iot/ignition-edge/blob/main/systemd/ignition-firstboot-complete.service and https://github.com/osbuild/osbuild/blob/main/stages/org.osbuild.grub2#L285 (note: I don't think IoT is using the files/services I linked, but that's the way to enable and then disable Ignition on first boot; if IoT isn't doing that, it's probably just missing them or they're wrong somehow)

wurstsemmel commented 9 months ago

Removing $ignition_firstboot with rpm-ostree kargs --editor also brings back the full functionality of systemctl enable for me! A big thanks to all of you!

BreiteSeite commented 7 months ago

FYI, I ran into the same issue.

I installed Fedora IoT 39 (freshly) on a raspberrypi 4.

I also used arm-image-installer with the --addkey parameter (because I have IPv6) and the documentation states:

> Please note, there is a known issue with IPv6 and provisioning with Zezere. If your network uses IPv6, please use the arm-image-installer to copy your ssh public key to the image.

I removed the kargs via

rpm-ostree kargs --delete-if-present='$ignition_firstboot'

If you're using ansible, you can use this in your playbook:

    # https://github.com/fedora-iot/iot-distro/issues/14
    - name: Fix Fedora bug where systemd loses its configuration after reboot
      ansible.builtin.shell: rpm-ostree kargs --delete-if-present='$ignition_firstboot' --unchanged-exit-77
      register: kargs_result
      changed_when: kargs_result.rc != 77
      failed_when: kargs_result.rc != 0 and kargs_result.rc != 77
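Outside Ansible, the same exit-status handling could be sketched in plain shell. The function name `kargs_result` is made up for illustration; the meaning of status 77 follows the `--unchanged-exit-77` flag used in the task above.

```shell
#!/bin/sh
# Sketch: map the exit status of
#   rpm-ostree kargs --delete-if-present='$ignition_firstboot' --unchanged-exit-77
# to a result word, mirroring changed_when / failed_when in the playbook task.

kargs_result() {
    # $1 is the exit status returned by the rpm-ostree command.
    case $1 in
        0)  echo changed ;;     # the kernel arguments were modified
        77) echo unchanged ;;   # nothing to delete, arguments left as they were
        *)  echo failed ;;      # any other status is treated as an error
    esac
}

# Usage (as root on the target system):
#   rpm-ostree kargs --delete-if-present='$ignition_firstboot' --unchanged-exit-77
#   kargs_result "$?"
```

As with the playbook, a reboot is still needed afterwards for the changed kernel arguments to take effect.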

What is the issue to follow to get notified once the root cause is resolved? Is it https://bugzilla.redhat.com/show_bug.cgi?id=2259451?

InternetOfTofu commented 2 months ago

I hit this too when I tried to install tailscale on one of my SBCs. Thanks to the community for already having a solution.

chrismaster commented 22 hours ago

Same on Fedora IoT 41. I wanted to disable zezere, but after every reboot the timer was enabled again...