Flesh out and document target UX

jlebon commented 4 years ago

I think it's a useful exercise to flesh out early on what the UX will look like. Let's discuss that here and then add something in the README?

Some bootstrapping questions:

how does one perform a manual upgrade?
how do automated systems perform upgrades?
how does this tie into rpm-ostree status on rpm-ostree-based systems?

We don't need to answer everything completely, but discussing these will make it easier to think about how bootupd fits in.

cgwalters commented 4 years ago

> how does one perform a manual upgrade?
> how do automated systems perform upgrades?

The questions presuppose that upgrades aren't on by default which...hasn't been decided I'd say. It might be that e.g. FCOS ships with bootupd on by default.

> how does this tie into rpm-ostree status on rpm-ostree-based systems?

Should it? Should rpm-ostree status include e.g. any status from dbxtool.service too? I like the idea of a "one pane of glass" but it also introduces some potential confusion if admins start to think they're actually linked.

cgwalters commented 4 years ago

I'm currently leaning towards having no updates by default and also documenting how one can use a container to orchestrate bootupd.

jlebon commented 4 years ago

WDYT about the discussions in https://github.com/coreos/fedora-coreos-tracker/issues/510#issuecomment-669331994? For EFI at least, it seems possible to make updates quite safe. In which case, it might be worthwhile to just always update to simplify the model and maintenance. (That doesn't help BIOS, of course.)

jlebon commented 4 years ago

> Should it? Should rpm-ostree status include e.g. any status from dbxtool.service too? I like the idea of a "one pane of glass" but it also introduces some potential confusion if admins start to think they're actually linked.

I think it would be really useful for rpm-ostree status -v to print the OSTree commit and version that the currently installed bootloader comes from. I could take a look at that, assuming that information is currently being stored in /boot by bootupd.

Edit: thinking more on this... I'm on the fence as well. I think it makes sense to have it in status -v if we implement "always update" via rpm-ostree calling out to bootupd.

cgwalters commented 4 years ago

OK so clearly we want things to be configurable. There's the question of the default, but I think what we basically want is to support:

Not enabling any updates by default (status quo)
Updating by default when ostree updates (systemctl enable bootupd-automatic.service e.g.) - basically when we boot into the new ostree we update the bootloader. Now clearly it would be (potentially) better to update the bootloader before rebooting, i.e. hook into the rpm-ostree process and scrape out the updates but...eh. The thing is one doesn't really need to reboot after updating the bootloader other than to test it.
Being scriptable by an external agent (machine-config-operator, gnome-software); we have bootupctl status --json and then one can use bootupctl update e.g.
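
As a rough sketch of what such an external agent could look like, the following wraps the two commands named above. It assumes only that `bootupctl status --json` prints a JSON document and that `bootupctl update` exits non-zero on failure. The JSON schema is not described in this thread, so the decision is delegated to a caller-supplied `should_update` predicate. The names `run_command` and `update_if_wanted` and the injectable `run` parameter are illustrative and not part of bootupd.

```python
import json
import subprocess
from typing import Any, Callable, Sequence

# A runner takes an argv list and returns the command's stdout as text.
Runner = Callable[[Sequence[str]], str]


def run_command(argv: Sequence[str]) -> str:
    """Run a command, raising CalledProcessError on a non-zero exit."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout


def update_if_wanted(
    should_update: Callable[[Any], bool],
    run: Runner = run_command,
) -> bool:
    """Query bootupd and apply an update if the policy asks for one.

    `should_update` receives the parsed output of `bootupctl status --json`.
    The schema is treated as opaque here (assumption: the output is valid
    JSON). Returns True if `bootupctl update` was invoked.
    """
    status = json.loads(run(["bootupctl", "status", "--json"]))
    if not should_update(status):
        return False
    run(["bootupctl", "update"])
    return True
```

Keeping the policy in a separate predicate mirrors the idea that the agent, not bootupd, decides when an update is applied; passing a fake `run` also lets the logic be exercised on a machine without bootupd installed.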

So for FCOS I think my vote would be off by default; it's trivial today to enable a systemd unit via Ignition/fcct, so if we ship bootupd-automatic.service that should be fine.
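
For illustration, here is a minimal sketch of the kind of Ignition config that would enable such a unit, built in Python so it can be printed as JSON. It assumes the Ignition 3.x `systemd.units` layout, and the unit name `bootupd-automatic.service` is the one proposed in this thread, which may not be shipped under that name. The helper `ignition_enabling_unit` is hypothetical.

```python
import json
from typing import Optional


def ignition_enabling_unit(name: str, contents: Optional[str] = None) -> dict:
    """Build a minimal Ignition (spec 3.x) config that enables one systemd unit.

    If `contents` is given, the unit file text is embedded too, which is
    needed when the OS image does not already ship the unit.
    """
    unit = {"name": name, "enabled": True}
    if contents is not None:
        unit["contents"] = contents
    return {
        "ignition": {"version": "3.0.0"},
        "systemd": {"units": [unit]},
    }


if __name__ == "__main__":
    # Unit name taken from the discussion; treat it as a placeholder.
    config = ignition_enabling_unit("bootupd-automatic.service")
    print(json.dumps(config, indent=2))
```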

Now a general concern here is that people running clusters will want to avoid the possibility of bricking multiple servers at once. For the MCO case that should already happen anyways if we enable bootupd-automatic.service as part of OKD/OCP - the MCO's role here might just be notifying/logging that the bootloader was updated?

That said, there is still the overall concern that ostree updates are transactional while bootloader updates aren't - some admins may want to schedule the latter separately and be prepared for recovery in the (unlikely but possible) event things go wrong.

lucab commented 4 years ago

Sorry for the late feedback, I also have some doubts on the UX, especially regarding auto-updates.

> Now clearly it would be (potentially) better to update the bootloader before rebooting, i.e. hook into the rpm-ostree process and scrape out the updates but...eh.

This would be a sweet-spot in terms of tackling updates cluster-wide, because otherwise a bootloader update requires two reboots: the first to have the ostree content available and the second to actually use the new bootloader.

> The thing is one doesn't really need to reboot after updating the bootloader other than to test it.

From ContainerLinux experience, this is a ticking bomb with a deferred explosion triggered by any reboot, which is better to avoid. It's problematic because an external unplanned event (kernel crash, power glitch, VM restart, etc.) may activate an update at the worst possible time, possibly compounding other troubles and making root-cause analysis way messier. The current rpm-ostree approach of locked finalization (i.e. with a final apply&reboot atomic action) is a better model.

jlebon commented 4 years ago

> > Now clearly it would be (potentially) better to update the bootloader before rebooting, i.e. hook into the rpm-ostree process and scrape out the updates but...eh.
>
> This would be a sweet-spot in terms of tackling updates cluster-wide, because otherwise a bootloader update requires two reboots: the first to have the ostree content available and the second to actually use the new bootloader.

Agreed. Another reason is that by doing it pre-reboot, you find out immediately if the bootloader update breaks your boot, so the rollout stops on the first machine instead of bricking your whole cluster.
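
To make that stop-on-first-failure behaviour concrete, here is a minimal sketch of a serial rollout. `update_and_verify` is a hypothetical callback standing in for whatever a real orchestrator would do (update the bootloader on one host, reboot it, confirm it came back); nothing here is an existing bootupd or MCO API.

```python
from typing import Callable, Iterable, List, NamedTuple, Optional


class RolloutResult(NamedTuple):
    updated: List[str]      # hosts updated and verified healthy
    failed: Optional[str]   # the host that stopped the rollout, if any
    untouched: List[str]    # hosts never attempted


def serial_rollout(
    hosts: Iterable[str],
    update_and_verify: Callable[[str], bool],
) -> RolloutResult:
    """Update hosts one at a time and halt at the first failure."""
    pending = list(hosts)
    updated: List[str] = []
    while pending:
        host = pending.pop(0)
        if not update_and_verify(host):
            # Stop here: every remaining host keeps its known-good bootloader.
            return RolloutResult(updated, host, pending)
        updated.append(host)
    return RolloutResult(updated, None, [])
```

With four hosts where the second one fails verification, only that host is affected and the last two are never touched.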

That said, I don't want to go back to https://github.com/coreos/rpm-ostree/pull/1882. I'd much prefer tighter integration between rpm-ostree and bootupd, which I think then meshes well with having it in status -v as mentioned above? The update policy itself could still live out-of-band though; e.g. rpm-ostree would just tell bootupd "hey I just deployed this pending commit, do what you will".

This does go counter though to the "offline background updates" story. But there is only one bootloader, so there can never really be "offline updates" in the same way (though see https://github.com/coreos/bootupd/issues/8#issuecomment-697412444).

jlebon commented 4 years ago

> That said, I don't want to go back to coreos/rpm-ostree#1882. I'd much prefer tighter integration between rpm-ostree and bootupd, which I think then meshes well with having it in status -v as mentioned above? The update policy itself could still live out-of-band though; e.g. rpm-ostree would just tell bootupd "hey I just deployed this pending commit, do what you will".

Hmm actually, maybe a more correct way to do this is to integrate at the finalization stage just like ostree-finalize-staged.service does. We can make the ostree API to hook into that more official and have bootupd fire during finalization? That way, we're also sure that we're updating /boot with the bits we're rebooting into. And it feels more "background updates"-ish than doing it at staging time because rebooting into a new update is a bit like permission to mutate state. It also better addresses @lucab's concerns re. locking in https://github.com/coreos/bootupd/issues/8#issuecomment-708375600.

cgwalters commented 4 years ago

In practice today I think two things are true:

We will ship bootloader updates rarely
Bootloader updates are unlikely to break things

Given this, a simple systemd unit like this:

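[Unit]
Description=Bootupd automatic update

[Service]
# Run the update once to completion rather than as a long-lived daemon.
Type=oneshot
ExecStart=/usr/bin/bootupctl update
# Stay "active" after the command exits, so later start requests in the same boot are no-ops.
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target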

is going to be fine for many people to start, or they could just do it manually.

Now, I do agree with the concerns above. I filed that as https://github.com/coreos/bootupd/issues/108
