Open jlebon opened 6 months ago
To start the conversation, we should probably list some of the advantages and disadvantages of this. Will add some tomorrow.
So I think there are two primary arguments for supporting an Anaconda flow:
And here are three arguments against:
The obvious one is that it breaks the cloud/bare metal symmetry we've upheld so far. That property has had drawbacks. For example:
But it also provides benefits:
`/var` on a separate volume).
I'm not sure I fully understood everything you wrote. My biggest concern is the metal live image (PXE boot), which lets me treat FCOS like a disposable container that is a newborn child at every reboot (one that can remember its previous life in `/var` ^^). What about Anaconda and the live OS? Do I need to worry about it?
There are no plans currently to stop supporting live PXE with persistent `/var`.
A metal disk image means we can pretty much guarantee that the disk will boot. It's built exactly like we build the rest of our cloud images and we can subject it to lots of CI that carries more assurances than CI at the Anaconda level.
One thing this definitely relates to is https://github.com/rhinstaller/anaconda/discussions/5197 - which would lead more closely to a world where even if anaconda is being used, if your kickstart is just the `bootc` verb, anaconda is doing very little other than being a live ISO running `podman` effectively, which runs the target container image, i.e. the `mkfs.xfs` that is used lives in that container, not anaconda. This type of stuff adds reproducibility.
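To make the "live ISO running `podman`" idea concrete, here is a minimal sketch that only builds the command line such a flow might run. It assumes the manual invocation described in the bootc documentation (privileged container, host PID namespace, `/dev` and the container storage bind-mounted); it is not what Anaconda does today. The helper name `bootc_install_argv`, the image reference and the target disk are hypothetical.

```python
import shlex


def bootc_install_argv(image: str, target_disk: str) -> list[str]:
    """Build (but do not run) a podman command that installs `image` to a disk.

    Assumption: the flags mirror the manual `bootc install to-disk`
    invocation from the bootc docs; verify them against your bootc version.
    """
    return [
        "podman", "run", "--rm", "--privileged", "--pid=host",
        "-v", "/var/lib/containers:/var/lib/containers",
        "-v", "/dev:/dev",
        "--security-opt", "label=type:unconfined_t",
        image,
        # Everything after the image runs inside the target container, so
        # the partitioning and mkfs tools come from that image, not the host.
        "bootc", "install", "to-disk", target_disk,
    ]


if __name__ == "__main__":
    # Hypothetical image and disk, for illustration only.
    argv = bootc_install_argv("quay.io/example/my-os:latest", "/dev/vda")
    print(shlex.join(argv))
```

The point of the sketch is the split of responsibilities: the host environment contributes only `podman`, while everything that touches the disk ships in the image being installed.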
The trickier balance is around enabling more complex anaconda features like LVM and RAID1 while still deferring as closely as possible to the container.
I'm going to reply to this comment here since I think it belongs better here and that thread overall is about more than just Anaconda.
> One topic threading through this is whether we aim to support the "dd raw disk image to metal" flow for bootc installs. I personally am trying hard to back away from that because my experience from the CoreOS side is that while it covers the 85% cases extremely well, it causes deep problems in the harder ones like iSCSI, multipath, etc.
Actually, thinking on this, neither iSCSI nor multipath are good examples. In both cases, whole block devices are involved, so there isn't really a mismatch with the image model. And on the configuration side, we're just using the same kargs and dracut code as in traditional systems. The only tricky bit is adapting our initramfs to not get in the way of existing functionality. Another way to say this is that both iSCSI and multipath are expected to be set up at installation time and require no involvement from e.g. Ignition.
RAID1, LUKS, and LVM are good examples that clash with the disk image model. Perhaps for RAID1 we could've done it at coreos-installer time to make the initramfs code simpler (though we already had the logic there for root reprovisioning, which was the bulk of complexity add). LUKS is an interesting case, because doing it from the initramfs means you can easily use it in all image-based platforms too (e.g. QEMU, OpenStack, but I've also seen people use NBDE in e.g. Azure).
While the root reprovisioning code in the initramfs is filesystem-based, (1) we benefit from that code being under our control, and (2) we benefit from starting from a known state. That said, one thing that'd help a lot I think is lifting all that stuff out of bash scripts and into e.g. `rdcore` (and on that topic, maybe we can try to share code with bootc's takeover install support).
All this to say that while it has its implementation warts, I think we've done quite well with the metal image overall and I'm not quite convinced it's time to move away from it. (I haven't talked at all about the UX here, which is a huge part of the story of course.)
Interested to hear what others think!
> Actually, thinking on this, neither iSCSI nor multipath are good examples. In both cases, whole block devices are involved, so there isn't really a mismatch with the image model
(disk image)
> The only tricky bit is adapting our initramfs to not get in the way of existing functionality.
Right...but these are related because if we do installation via filesystem layout and not disk images, then we don't have a complex initramfs.
> LUKS is an interesting case, because doing it from the initramfs means you can easily use it in all image-based platforms too (e.g. QEMU, OpenStack, but I've also seen people use NBDE in e.g. Azure).
The way I want to push this going forward is there are two paths:
> While the root reprovisioning code in the initramfs is filesystem-based, (1) we benefit from that code being under our control, and (2) we benefit from starting from a known state.
I think what `bootc install to-disk` does, for example, is also quite highly opinionated and under the OS writer's control - now `to-filesystem` adds a good bit more flexibility, but there's still a lot that is tested as a unit.
> I think we've done quite well with the metal image overall and I'm not quite convinced it's time to move away from it.
In practice, we're clearly not going to just drop it in the near future. It's a question of emphasis though, once there are new alternatives.
- Generating custom pre-built disk images from a container alongside desired storage layout; this can be set up for LUKS or LVM-and-LUKS etc. as desired without doing a complex dance in the initramfs
In the disk image case, LUKS requires re-provisioning in all cases, either when using `null-cypher` or current Ignition-style re-provisioning.
- takeover installs where we treat the entire booted OS as just an initramfs; instead of carrying provisioning tools in the initramfs, one can use the full power of anything you have in a custom container image. That path can clearly support LUKS in the cloud equally well, albeit with an extra reboot or at least an extra userspace-only restart. But I think that's really fine because people doing that stuff are going for longer-lived instances.
We can boot the system fully in RAM with a karg that I don't remember. Maybe the path forward is to move Ignition to using this mode for the first boot when doing complex filesystem re-provisioning instead of having all the logic in the initramfs. This would let us keep a "fast-path" for non-complex-storage Ignition configs while doing more complex storage setups not in the initramfs.
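The "fast path" split suggested here could be sketched roughly as follows. This is only an illustration of the decision, not existing Ignition behavior: the function names and the `"ram-boot"` / `"initramfs"` labels are hypothetical, and the check assumes the convention that a `storage.filesystems` entry labelled `root` is what requests root reprovisioning.

```python
def needs_root_reprovisioning(config: dict) -> bool:
    """Return True if the (already parsed) Ignition config redefines the root fs.

    Assumption: a filesystem entry with label "root" is the signal; other
    complex-storage triggers are ignored in this sketch.
    """
    storage = config.get("storage", {})
    return any(fs.get("label") == "root" for fs in storage.get("filesystems", []))


def choose_first_boot_path(config: dict) -> str:
    """Pick a hypothetical first-boot strategy for the given config."""
    if needs_root_reprovisioning(config):
        # Complex storage: run the OS from RAM and re-lay out the disk there.
        return "ram-boot"
    # Simple configs keep today's behavior and stay in the initramfs.
    return "initramfs"


if __name__ == "__main__":
    simple = {"storage": {"files": [{"path": "/etc/hostname"}]}}
    complex_cfg = {
        "storage": {
            "filesystems": [
                {"device": "/dev/mapper/root", "format": "xfs", "label": "root"}
            ]
        }
    }
    print(choose_first_boot_path(simple))
    print(choose_first_boot_path(complex_cfg))
```

A real implementation would presumably also have to consider RAID and LUKS devices that sit underneath the root filesystem; the sketch only shows where the branch would go.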
> In the disk image case, LUKS requires re-provisioning in all cases, either when using `null-cypher` or current Ignition-style re-provisioning.
No, one can do online incremental re-encryption. That's not "re-provisioning" - it's a cheap operation that doesn't require moving the OS to RAM.
The reason we stopped doing that for the original RHCOS case is that it's a magical special case and we wanted to support generalized other partitioning too (including e.g. LUKS-on-RAID, switching filesystems etc.) in a consistent fashion.
But for the cloud case, it makes total sense to generate a cloud disk image that has a LUKS layout and then on firstboot start a cryptsetup-reencrypt operation (which could bind to the machine local tpm2, or more complex things).
Baremetal anaconda style installs can just set up the desired partitioning directly (as can also be done in a basic setup via `bootc install`).
> We can boot the system fully in RAM with a karg that I don't remember. Maybe the path forward is to move Ignition to using this mode for the first boot when doing complex filesystem re-provisioning instead of having all the logic in the initramfs. This would let us keep a "fast-path" for non-complex-storage Ignition configs while doing more complex storage setups not in the initramfs.
Er...if we're running the root from RAM how would in-place OS updates and in general persistent data work?
Stated simply: if we ship a first-class flow for the combination of three cases:
Then Ignition partitioning isn't necessary. The users who want to boot as quickly as possible in the cloud can choose 1; those who are OK with an extra reboot for longer-lived instances and don't want to maintain disk images can choose 2.
To be clear though, in a way...because Ignition is such a central API to what FCOS is today, the above is more "container/bootc based Fedora", not FCOS. But then it also does make sense to think about crossover/intersection between the two, such as generating a FCOS-derived container and making a disk image from it, but still using Ignition at runtime for some configuration, etc.
Anaconda nowadays supports installing bootc-compatible container images with the `ostreecontainer` keyword. In the future, it may directly use bootc instead to carry out the installation (see https://github.com/rhinstaller/anaconda/discussions/5197).
Currently, there is work underway to have osbuild-based tooling to generate Anaconda ISOs with the bootable container embedded which will carry out the installation: https://github.com/osbuild/bootc-image-builder/pull/58.
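For readers who have not seen the `ostreecontainer` keyword, a minimal sketch of a kickstart that uses it might be rendered like this. The partitioning lines follow the commonly documented minimal example and are an assumption, not a recommendation; the helper `render_kickstart` and the image reference are hypothetical.

```python
def render_kickstart(image_ref: str) -> str:
    """Render a minimal kickstart that installs a bootable container image.

    Assumption: only the `--url` option of `ostreecontainer` is used; the
    storage commands are a generic single-disk layout.
    """
    lines = [
        "text",
        "network --bootproto=dhcp --device=link --activate",
        "clearpart --all --initlabel --disklabel=gpt",
        "reqpart --add-boot",
        "part / --grow --fstype xfs",
        # The OS content comes from the container image, not from RPM packages.
        f"ostreecontainer --url {image_ref}",
        "reboot",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical image reference, for illustration only.
    print(render_kickstart("quay.io/example/my-os:latest"), end="")
```

Note that in this flow the storage layout is still expressed in kickstart syntax, which is the main contrast with the disk image-based flow described next.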
On the CoreOS side, we notably decided very early on to provide a disk image-based flow in the bare metal case so that it closely resembles the cloud case.
With other bootable container variants likely eventually supporting an Anaconda flow, we should consider whether this is something we also want to support in CoreOS.