coreos / fedora-coreos-tracker

Issue tracker for Fedora CoreOS
https://fedoraproject.org/coreos/

Complete composefs integration in Fedora CoreOS #1718

Open travier opened 2 months ago

travier commented 2 months ago

Describe the enhancement

We should complete the integration of composefs in Fedora CoreOS. Composefs brings better security and potentially better performance (no need to create a deployment anymore).

Support for composefs already partially landed in ostree and coreos-assembler. Now we need to figure out the missing pieces, the integration and testing.

System details

All

Additional information

See:

jlebon commented 2 months ago

Also see https://github.com/coreos/fedora-coreos-config/pull/2856. I think essentially we need to pick up that PR and push it through to get it running in rawhide at least. But note https://github.com/coreos/fedora-coreos-config/pull/2856#issuecomment-2010283749:

A thing that this is known to break is the "chattr -i" hack for new toplevel dirs (xref https://github.com/coreos/rpm-ostree/issues/337 )

What I don't know offhand is how problematic that will be in reality. My instinct says we can do this, but with a sufficiently loud announcement.

But note in the context of bootable containers, you can create whatever top-level directory you'd like at derivation time. The tricky bit is if you need to create dynamically named top-level directories. In that case... those use cases might need a tmpfs overlay on top (e.g. the rootfs.transient work, though ideally you could create the overlay, add your directories and then change the overlay back to read-only, but that'd need to be done from the initramfs I think).

travier commented 2 months ago

Some edited notes from the meeting:

Composefs is a combination of filesystems (overlayfs, erofs) that creates an ostree-like system without some of the downsides of ostree.

It also provides runtime integrity checks where ostree "only" provides offline integrity checks.

Combined with podman / container storage support, you can even gain disk space and memory de-duplication if your containers use the same binaries as your system. If we share the ostree/composefs repo with the podman system container store then we can de-dup the files with the same hash.
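To make the de-dup point concrete, here is a minimal sketch of a content-addressed store shared by a host tree and a container tree. Everything in it is hypothetical (the `ObjectStore` class, the paths, the file contents), and it uses a plain SHA-256 of the contents as a stand-in for whatever digest the real ostree/composefs store uses.

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store: one object per distinct file content."""

    def __init__(self):
        self.objects = {}  # digest -> content

    def add(self, content: bytes) -> str:
        # Identical content always maps to the same digest, so it is stored once.
        digest = hashlib.sha256(content).hexdigest()
        self.objects.setdefault(digest, content)
        return digest


def import_tree(store: ObjectStore, tree: dict) -> dict:
    """Store every file of a tree and return a path -> digest mapping."""
    return {path: store.add(data) for path, data in tree.items()}


store = ObjectStore()

# Hypothetical host OS tree and container image sharing two binaries.
host = import_tree(store, {
    "/usr/bin/bash": b"bash-5.2",
    "/usr/lib64/libc.so.6": b"glibc-2.39",
})
container = import_tree(store, {
    "/usr/bin/bash": b"bash-5.2",
    "/usr/lib64/libc.so.6": b"glibc-2.39",
    "/app/server": b"app-1.0",
})

total_files = len(host) + len(container)
stored_objects = len(store.objects)
print(f"{total_files} files referenced, {stored_objects} objects stored")
```

The shared files are stored only once on disk. The memory saving mentioned above would come from both trees referencing the same backing file, which this toy model does not try to show.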

It also has potential (small) performance improvements for deployments/updates, as you no longer need to create an entire filesystem tree for /usr for each version: those are fully stored as EROFS images. Ostree creates a hardlink farm for each deployment, but you cannot hardlink directories and symlinks, so you need to create the entire tree. With composefs, the filesystem hierarchy (mostly paths and metadata) is stored in the EROFS filesystem, which is a read-only FS stored in a file. The file contents remain in the ostree repo / composefs store. Composefs uses overlayfs to reference the files stored in the ostree repo, via the filesystem stored as EROFS.
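A rough way to picture the difference, as a toy model only: the sketch below counts what a hardlink-farm checkout has to create per deployment, and builds a single path-to-metadata mapping as a stand-in for the EROFS image. The `Entry` type, the tree, and the digests are all made up for illustration; nothing here talks to ostree, EROFS or overlayfs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    path: str
    kind: str          # "dir", "symlink" or "file"
    digest: str = ""   # content digest, only meaningful for regular files


# Hypothetical, tiny /usr tree.
TREE = [
    Entry("/usr", "dir"),
    Entry("/usr/bin", "dir"),
    Entry("/usr/bin/bash", "file", "aa11"),
    Entry("/usr/bin/sh", "symlink"),
    Entry("/usr/lib64", "dir"),
    Entry("/usr/lib64/libc.so.6", "file", "bb22"),
]


def hardlink_checkout_ops(tree):
    """Count what a hardlink farm must create for ONE deployment.

    Regular files can be hardlinked to the object store, but every
    directory and symlink has to be created again for each deployment.
    """
    ops = {"mkdir": 0, "symlink": 0, "link": 0}
    for entry in tree:
        if entry.kind == "dir":
            ops["mkdir"] += 1
        elif entry.kind == "symlink":
            ops["symlink"] += 1
        else:
            ops["link"] += 1
    return ops


def metadata_image(tree):
    """Stand-in for the image: one blob holding paths and metadata.

    Regular files only carry a digest pointing back into the object store.
    """
    return {entry.path: (entry.kind, entry.digest) for entry in tree}


ops = hardlink_checkout_ops(TREE)
image = metadata_image(TREE)
print(f"hardlink farm: {sum(ops.values())} filesystem operations {ops}")
print(f"metadata image: 1 file describing {len(image)} paths")
```

With a real /usr the per-deployment count is in the tens of thousands of entries, while the image stays a single file whose regular-file entries resolve to objects in the store.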

In the first phase AIUI, ostree will still create deployments anyway, but we would dynamically create the composefs image at boot.

jbtrystram commented 1 month ago

Writing up some investigation I did today around signing the ostree commit with a key that we embed into the initramfs to leverage composefs validation.

We need to sign the ostree commit with an Ed25519 key, likely during the Build OSTree stage of the pipeline. At the moment I don't think we do that, looking at coreos-assembler/src/cmd-build. Then we can embed the pubkey in the initramfs and wire things up.

I initially thought that would interfere with the robosignatory step but after writing this up I don't think so. Still worth asking :)

https://ostreedev.github.io/ostree/composefs/#signatures

travier commented 1 month ago

https://github.com/ostreedev/ostree/discussions/3256

jbtrystram commented 1 month ago

Initial experiment: https://github.com/coreos/coreos-assembler/pull/3813

travier commented 4 weeks ago

Draft change in https://fedoraproject.org/wiki/Changes/ComposefsAtomicCoreOSIoT

jlebon commented 6 days ago

We discussed this in today's community meeting. One thing that came up was about how to roll this out to existing nodes.

Currently, as it's enabled via a /usr drop-in, it would affect existing nodes and new installs simultaneously. If those could be decoupled, then we could e.g. let it bake even in stable for a month or two on new installs before enabling it for upgrading nodes.

Regardless of the approach, once this hits next, this should be part of our communications to raise awareness and invite testing.