Random thought: two composefs output formats

allisonkarlitskaya commented 1 day ago

@cgwalters mentioned something in a meeting today that I wasn't properly thinking about before.

Long story short: by doing the SELinux label rewriting that we need when installing an unsealed/modified container image, we're effectively already creating a non-normative form of the container image that is used only for booting the system, and not from inside containers running on the system.

If we're doing that anyway, then to this non-normative boot-only composefs we could make another tweak: remove /boot from the image before we create it, replacing it with an empty directory. That works around our UKI hash recursion issue without getting into weird layer hiding tricks.
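
To illustrate why stripping /boot sidesteps the recursion, here is a minimal sketch. It is not composefs-rs code: the dict-based image model, `strip_boot`, `tree_digest` and `image_with_uki` are all hypothetical, and plain SHA-256 stands in for the real fs-verity digest of a composefs image. It only shows that once /boot is replaced by an empty directory, the digest no longer depends on the UKI that has to embed it.

```python
import hashlib

# Hypothetical in-memory image model: a dict maps names to either bytes
# (file contents) or another dict (a subdirectory).


def strip_boot(root):
    """Return a shallow copy of the tree with /boot replaced by an empty directory."""
    stripped = dict(root)
    stripped["boot"] = {}
    return stripped


def tree_digest(node):
    """SHA-256 over a sorted walk of the tree (a stand-in for an fs-verity digest)."""
    h = hashlib.sha256()
    if isinstance(node, bytes):
        h.update(b"f" + node)
    else:
        h.update(b"d")
        for name in sorted(node):
            h.update(name.encode() + b"\0" + tree_digest(node[name]))
    return h.digest()


def image_with_uki(uki):
    """Build a toy image whose only varying part is the UKI under /boot."""
    return {
        "boot": {"EFI": {"Linux": {"uki.efi": uki}}},
        "usr": {"bin": {"true": b"\x7fELF..."}},
    }


# The UKI needs to embed the digest of the root filesystem. With /boot
# stripped, changing the UKI no longer changes the digest it has to embed.
before = image_with_uki(b"placeholder")
boot_digest = tree_digest(strip_boot(before)).hex()
after = image_with_uki(b"uki embedding " + boot_digest.encode())

print(tree_digest(before) != tree_digest(after))            # True: full-tree digest moves
print(tree_digest(strip_boot(after)).hex() == boot_digest)  # True: stripped digest is stable
```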

Of course, if we're talking about sealed images, then we have probably already precomputed our SELinux labels, written them into the tar stream, and then written the "one true" composefs fsverity digest into a label on the image. In that case our hands are substantially more tied. But maybe this idea of splitting out /boot makes sense anyway. We already need to handle /boot resources specially on install, and in the UKI case we have a UEFI signature on the kernel binary (which, in turn, is also signed as part of the overall container image) which becomes the real trust chain for the booted system anyway (i.e. we don't check the OCI label on boot). So maybe having a composefs with /boot stripped from it still makes sense even for fully-sealed images.

allisonkarlitskaya commented 1 day ago

The obvious advantage of this: we could stop thinking about all of these artifact/hidden-composefs-meta-layer hacks. The boot resources would just be in /boot in BLS type 1 or type 2.
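
For reference, a BLS Type 1 entry is a small key/value text file under `loader/entries/` on the boot partition, while a Type 2 entry is a single UKI file dropped into `EFI/Linux/`. A rough sketch of rendering a Type 1 entry follows; the helper `bls_type1_entry` and every path and value passed to it are made up for illustration, and only the key names (`title`, `version`, `linux`, `initrd`, `options`) come from the Boot Loader Specification.

```python
def bls_type1_entry(title, version, linux, initrd, options):
    """Render a Boot Loader Specification Type 1 entry as text."""
    lines = [
        f"title {title}",
        f"version {version}",
        f"linux {linux}",
        f"initrd {initrd}",
        f"options {options}",
    ]
    return "\n".join(lines) + "\n"


# All values below are placeholders, not what any real installer writes.
entry = bls_type1_entry(
    title="Example OS",
    version="6.9.0-example",
    linux="/example/6.9.0-example/vmlinuz",
    initrd="/example/6.9.0-example/initramfs.img",
    options="rw quiet",
)
print(entry, end="")
```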

cgwalters commented 1 day ago

In current bootc standards we don't have anything in /boot. It's the job of whatever deploys the image to copy any necessary data into /boot on the target (a mix of bootc (really libostree) for the kernel, and bootupd for other stuff), and I think this is right. This stuff may be on separate partitions, and most importantly, if we want to have more than one bootable container installed (and we clearly do), then what's in /boot in the container can't be a source of truth. systemd-boot, for example, already does this right: the binaries live in /usr, and bootctl install copies them, upgrades them, etc.
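
A rough sketch of that deploy-time copy, to make the "more than one bootable container" point concrete. This is not how bootc, libostree or bootupd are implemented: `install_kernel`, `make_image` and the per-deployment directory layout are hypothetical, and the `usr/lib/modules/<kver>/vmlinuz` location is assumed here as the place the image ships its kernel.

```python
import shutil
import tempfile
from pathlib import Path


def install_kernel(image_root: Path, boot: Path, deployment_id: str) -> Path:
    """Copy the image's kernel from /usr into a per-deployment directory under boot."""
    candidates = sorted(image_root.glob("usr/lib/modules/*/vmlinuz"))
    if len(candidates) != 1:
        raise RuntimeError(f"expected exactly one kernel, found {len(candidates)}")
    dest_dir = boot / deployment_id
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / "vmlinuz"
    shutil.copy2(candidates[0], dest)
    return dest


def make_image(root: Path, kver: str, payload: bytes) -> Path:
    """Create a toy image tree that ships a fake kernel under /usr."""
    kdir = root / "usr/lib/modules" / kver
    kdir.mkdir(parents=True)
    (kdir / "vmlinuz").write_bytes(payload)
    return root


workdir = Path(tempfile.mkdtemp())
boot = workdir / "boot"  # stands in for the real boot partition
image_a = make_image(workdir / "image-a", "6.8.0", b"kernel A")
image_b = make_image(workdir / "image-b", "6.9.0", b"kernel B")

# Two deployments coexist because each gets its own directory on the target.
kernel_a = install_kernel(image_a, boot, "deployment-a")
kernel_b = install_kernel(image_b, boot, "deployment-b")
print(sorted(p.name for p in boot.iterdir()))  # ['deployment-a', 'deployment-b']
```

Neither image carries a /boot of its own here; the target's boot directory is populated entirely by the deploy step, which is why it can hold resources for both deployments at once.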

cgwalters commented 1 day ago

> If we're doing that anyway, then to this non-normative boot-only composefs we could make another tweak: remove /boot from the image before we create it, replacing it with an empty directory. That works around our UKI hash recursion issue without getting into weird layer hiding tricks.

I am not quite following; we need to "physically" ship the kernel in the container image so that it's downloaded to the client system, right?

Isn't this issue just overall a duplicate of https://github.com/containers/composefs-rs/issues/21 ?

allisonkarlitskaya commented 1 day ago

This issue is definitely highly related to #21 but sort of a different direction. I mostly filed it because I wanted to write it down as I was flying out the door so I didn't forget :)

Random thought: two composefs output formats #35