coreos / fedora-coreos-tracker

Issue tracker for Fedora CoreOS
https://fedoraproject.org/coreos/

Surprising behavior trying to manually change kargs while an update is already queued #963

Open betermieux opened 3 years ago

betermieux commented 3 years ago

Describe the bug

I want to switch to cgroups v2 by removing systemd.unified_cgroup_hierarchy from the kernel arguments. While the rpm-ostree command executes successfully, the removal of the kernel argument is not propagated, and in grub I can see systemd.unified_cgroup_hierarchy=0 again. Any ideas where to look for details?

Reproduction steps

  1. sudo rpm-ostree kargs --delete=systemd.unified_cgroup_hierarchy --reboot

System details

lucab commented 3 years ago

Thanks for the report. Is this an old installed system? 34.20210821.1.1 should already come without that kernel argument.

Can you please post the output of:

betermieux commented 3 years ago

Well yes, it is a node I installed in July 2020, auto-updating on the next stream (the stream shouldn't matter, because I also have nodes on stable that show the same behaviour). I have also included the log output of rpm-ostreed after executing rpm-ostree kargs.

# cat /sysroot/.coreos-aleph-version.json
{
    "build": "32.20200629.3.0",
    "ref": "fedora/x86_64/coreos/stable",
    "ostree-commit": "6df95bdb2fe2d36e091d4d18e3844fa84ce4b80ea3bd0947db5d7a286ff41890",
    "imgid": "fedora-coreos-32.20200629.3.0-qemu.x86_64.qcow2"
}

# cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt1)/ostree/fedora-coreos-daacce5351564f0d1ffe9898a69968b7e62b73500667f39a3ec4042b84ce6bd6/vmlinuz-5.13.12-200.fc34.x86_64 mitigations=auto,nosmt systemd.unified_cgroup_hierarchy=0 console=tty0 console=ttyS0,115200n8 ostree=/ostree/boot.0/fedora-coreos/daacce5351564f0d1ffe9898a69968b7e62b73500667f39a3ec4042b84ce6bd6/0 ignition.platform.id=vmware

# sudo journalctl -u rpm-ostreed.service
-- Journal begins at Thu 2021-09-16 07:14:02 UTC, ends at Thu 2021-09-16 12:12:11 UTC. --
-- No entries --

# sudo rpm-ostree kargs --delete=systemd.unified_cgroup_hierarchy
Staging deployment... done
Kernel arguments updated.
Run "systemctl reboot" to start a reboot

# sudo journalctl -u rpm-ostreed.service
-- Journal begins at Thu 2021-09-16 07:14:02 UTC, ends at Thu 2021-09-16 12:14:07 UTC. --
Sep 16 12:13:54 docker1 systemd[1]: Starting rpm-ostree System Management Daemon...
Sep 16 12:13:54 docker1 rpm-ostree[324341]: Reading config file '/etc/rpm-ostreed.conf'
Sep 16 12:13:56 docker1 rpm-ostree[324341]: In idle state; will auto-exit in 60 seconds
Sep 16 12:13:56 docker1 systemd[1]: Started rpm-ostree System Management Daemon.
Sep 16 12:13:56 docker1 rpm-ostree[324341]: client(id:cli dbus:1.865 unit:session-4.scope uid:0) added; new total=1
Sep 16 12:13:57 docker1 rpm-ostree[324341]: Locked sysroot
Sep 16 12:13:57 docker1 rpm-ostree[324341]: Initiated txn KernelArgs for client(id:cli dbus:1.865 unit:session-4.scope uid:0): /org/projectatomic/rpmostree1/fedora_coreos
Sep 16 12:13:57 docker1 rpm-ostree[324341]: Process [pid: 324339 uid: 0 unit: session-4.scope] connected to transaction progress
Sep 16 12:13:58 docker1 rpm-ostree[324341]: note: Deploying commit 7ea8d089d135c027ff98e911dd43fc1886d223e6694f2cc21637645ee42eca37 which contains content in /var/lib that will be ignored.
Sep 16 12:13:59 docker1 rpm-ostree[324341]: Created new deployment /ostree/deploy/fedora-coreos/deploy/7ea8d089d135c027ff98e911dd43fc1886d223e6694f2cc21637645ee42eca37.11
Sep 16 12:13:59 docker1 rpm-ostree[324341]: sanitycheck(/usr/bin/true) successful
Sep 16 12:14:00 docker1 rpm-ostree[324341]: Txn KernelArgs on /org/projectatomic/rpmostree1/fedora_coreos successful
Sep 16 12:14:03 docker1 rpm-ostree[324341]: Unlocked sysroot
Sep 16 12:14:03 docker1 rpm-ostree[324341]: Process [pid: 324339 uid: 0 unit: session-4.scope] disconnected from transaction progress
Sep 16 12:14:04 docker1 rpm-ostree[324341]: client(id:cli dbus:1.865 unit:session-4.scope uid:0) vanished; remaining=0
Sep 16 12:14:04 docker1 rpm-ostree[324341]: In idle state; will auto-exit in 63 seconds

After rebooting, systemd.unified_cgroup_hierarchy=0 is still present in the grub menu and in /proc/cmdline.

betermieux commented 3 years ago

Maybe I found the culprit: I have specified a periodic update window later this week. The new kernel arguments are probably added to the staged version, which is not yet in use. Is there any chance to force an upgrade? A regular systemctl reboot just returns to 34.20210821.1.1.

# cat /etc/zincati/config.d/55-updates-strategy.toml
[updates]
strategy = "periodic"
[[updates.periodic.window]]
days = [ "Fri", "Sat" ]
start_time = "20:00"
length_minutes = 720

# rpm-ostree status
State: idle
AutomaticUpdatesDriver: Zincati
  DriverState: active; update staged: 34.20210904.1.0; reboot pending due to update strategy
Deployments:
  fedora:fedora/x86_64/coreos/next
                   Version: 34.20210904.1.0 (2021-09-07T00:13:14Z)
                    Commit: 7ea8d089d135c027ff98e911dd43fc1886d223e6694f2cc21637645ee42eca37
              GPGSignature: Valid signature by 8C5BA6990BDB26E19F2A1A801161AE6945719A39
                      Diff: 34 upgraded, 1 removed

* fedora:fedora/x86_64/coreos/next
                   Version: 34.20210821.1.1 (2021-08-24T03:31:02Z)
                    Commit: 55e40560b1a4008f8d4fd70eb73d65cc834ca03a985fb380e76c9ae2c2c459fa
              GPGSignature: Valid signature by 8C5BA6990BDB26E19F2A1A801161AE6945719A39

  fedora:fedora/x86_64/coreos/next
                   Version: 34.20210808.1.0 (2021-08-09T22:57:42Z)
                    Commit: 966e80e2789383d8403b46d77c449e1c858e7116e60a6c49da9f84e1c9a95f4d
              GPGSignature: Valid signature by 8C5BA6990BDB26E19F2A1A801161AE6945719A39

# rpm-ostree upgrade --bypass-driver -r
2 metadata, 0 content objects fetched; 788 B transferred in 1 seconds; 0 bytes content written
No upgrade available.

lucab commented 3 years ago

Not directly yet, but:

# echo 'updates.strategy = "immediate"' > /run/zincati/config.d/99-finalize-once-out-of-maintenance-window.toml
# systemctl restart zincati.service

This will perform an immediate update, once.
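The override only applies once because /run is a tmpfs that is emptied on reboot, so the periodic strategy from /etc takes effect again afterwards. A minimal sketch of the same drop-in in table form, matching the layout of the 55-updates-strategy.toml file shown earlier; `write_immediate_override` is a hypothetical helper name, and the optional directory argument is only an assumption that makes it easy to try outside /run:

```shell
# Hypothetical helper: write a one-shot Zincati override drop-in.
# $1 is the target directory; it defaults to the runtime config dir,
# which does not survive a reboot.
write_immediate_override() {
    dir="${1:-/run/zincati/config.d}"
    mkdir -p "$dir" || return 1
    # Table form of: updates.strategy = "immediate"
    printf '[updates]\nstrategy = "immediate"\n' \
        > "$dir/99-finalize-once-out-of-maintenance-window.toml"
}
```

After calling it as root without arguments, `systemctl restart zincati.service` is still needed for Zincati to pick up the new strategy.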

Though I'm still not clear about what happens to the new/updated kargs. Possibly they get applied to the staged update, which is however discarded by the manual reboot?

lucab commented 3 years ago

If the above guess is correct, a more reliable way to tweak kargs and immediately apply changes would be:

# systemctl stop zincati.service
# rpm-ostree cleanup -p
# rpm-ostree kargs --delete=systemd.unified_cgroup_hierarchy --reboot

Although if that is the case, we should look for a way to improve the UX of this.
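Until the tooling warns about this, the situation can be detected before touching kargs. A minimal sketch, assuming the text layout of rpm-ostree status shown earlier in this thread: the deployment that will boot next is listed first, and the booted one is prefixed with `*` (or `●` on UTF-8 terminals). `has_pending_deployment` is a hypothetical helper name, not part of rpm-ostree:

```shell
# Hypothetical helper: reads `rpm-ostree status` text on stdin.
# Returns 0 if the first listed deployment is NOT the booted one,
# i.e. a pending deployment would receive any kargs change.
has_pending_deployment() {
    awk '
        /^Deployments:/ { in_list = 1; next }
        in_list && NF > 0 {
            # First deployment entry: booted entries start with "*" or a bullet.
            pending = ($1 != "*" && $1 != "●")
            exit
        }
        END { exit(pending ? 0 : 1) }
    '
}
```

For example, `rpm-ostree status | has_pending_deployment && echo "a pending deployment exists; kargs changes will land there"` would have flagged the state shown above, where 34.20210904.1.0 is listed ahead of the booted 34.20210821.1.1.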

betermieux commented 3 years ago

OK, now it works. rpm-ostree kargs changed the kernel arguments of the staged update (so it probably worked all along). After letting zincati upgrade the node with the immediate strategy, the new kernel arguments are used. I wouldn't fix anything, but maybe you should output a warning if rpm-ostree kargs changes the kernel arguments of a deployment that is not yet active.

lucab commented 3 years ago

I suspect we can improve the UX of this in two directions:

betermieux commented 3 years ago

I agree with your first case, but I had tested rpm-ostree kargs --reboot, which will reboot the system immediately. To my knowledge, --bypass-driver is only used for rpm-ostree upgrade, not for rpm-ostree kargs. You will always have to point to the update driver if an update is pending.