stefwalter opened this issue 7 months ago
The command is documented here: https://containers.github.io/bootc/bootc-install.html#executing-bootc-install
Nevertheless, if this is an entry point to bootable containers, it seems sufficiently complex to be a hurdle for adoption.
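For reference, the documented invocation is roughly of the following shape (paraphrased from the linked docs; the image name and target disk here are placeholders):

```
podman run --rm --privileged --pid=host \
  -v /var/lib/containers:/var/lib/containers \
  --security-opt label=type:unconfined_t \
  quay.io/exampleos/someos:latest \
  bootc install to-disk /dev/vdb
```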
My hope was that most people would start from automation to do this, including spinning up the instance and injecting this data - that's what https://github.com/vrothberg/bootc-playground/tree/main/alongside does.
Nevertheless, we absolutely could support `bootc install --from=quay.io/exampleos/someos:latest` or so, which would do the pull and handle all the podman invocations for you.
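A sketch of what such a wrapper might do under the hood (hypothetical: the `--from` flag does not exist, and the exact podman options are an assumption based on the documented install flow):

```
# Hypothetical expansion of `bootc install --from=...`:
IMAGE=quay.io/exampleos/someos:latest
podman pull "$IMAGE"
podman run --rm --privileged --pid=host \
  -v /var/lib/containers:/var/lib/containers \
  --security-opt label=type:unconfined_t \
  "$IMAGE" bootc install to-disk /dev/vdb  # target disk is a placeholder
```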
I was thinking about this more, and a way we could streamline this might just be having our container re-spawn itself with more mounts instead.
Basically I bet we can boil this down to:

```
podman run --rm --privileged -v /run:/run quay.io/exampleos/image install to-existing-root <extra args>
```

That would kill off the need for `--pid=host --security-opt label=type:unconfined_t -v /var/lib/containers:/var/lib/containers` etc., which would probably be a notable enough win to do.
EDIT: Yep, after a quick verification this causes the `sleep` command to run in the host context:

```
podman run --privileged -v /run:/run --rm -ti quay.io/centos-bootc/centos-bootc:stream9 systemd-run -PG sleep 1h
```
Just...a tricky thing here is that in order to have Ctrl-C work we'll want to bind the spawned systemd unit (and/or its spawned container) to the lifetime of the invoking container.
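One possible shape for that binding (an untested sketch; the scope name is a placeholder for whatever unit podman creates for the invoking container):

```
# Tie the transient unit's lifetime to the invoking container's scope,
# so stopping or killing the container also stops the spawned unit.
systemd-run --collect \
  --property=BindsTo=libpod-$CONTAINER_ID.scope \
  --property=After=libpod-$CONTAINER_ID.scope \
  bootc install to-existing-root
```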
(That said, of course none of the install process is transactional, so Ctrl-C generally leaves things broken in the to-existing-root path.)
.oO(except, we could probably rework things such that we only blow away /boot at the end and not the start...)
I'm curious to understand the technical reasons for needing to run bootc from inside the container/podman environment to begin with. My (admittedly limited) understanding is that when running `install to-filesystem` with an empty target (as opposed to the scenario where it's taking over an existing system), bootc sets up the filesystem with the content from the source container, takes care of SELinux labelling, and sets up the bootloader.
If this specific scenario could be untangled from podman, it would simplify things a lot for how we do disk image building in bootc-image-builder. It would also make things simpler for integrating BIB functionality into the Image Builder service as well.
In the bigger picture, one goal I have here with bootc is to keep the installation logic centralized as much as possible - to make the container image the "source of truth", the "center of gravity". If I want to change how the bootloader gets installed, I should be able to change the container image, not an external tool. I'd like the container to be an active participant in its installation, not an inert blob (as we were treating ostree before, and rpm...well, is a mix). This also helps ensure that "day 1" and "day 2" are the same.
A notable pivot point here is e.g. "what version of mkfs.xfs is used"? I'd like that to default to coming from the container image. There are of course use cases for very small systems which may not want to ship those, and those systems can be installed not from the image itself - we clearly need to handle that.
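As an illustration of that default (a sketch, not how bootc currently wires this up): the filesystem tooling can run out of the target image itself rather than from the host:

```
# mkfs.xfs here comes from the image being installed, not the host.
# Image name and partition label are placeholders.
podman run --rm --privileged -v /dev:/dev \
  quay.io/exampleos/someos:latest \
  mkfs.xfs /dev/disk/by-partlabel/root
```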
I think it's a question of defaults/emphasis.
Running inside podman today of course means that "day 1" and "day 2" are different...but still, there's a whole lot of code that gets shared.
> If this specific scenario could be untangled from podman, it would simplify things a lot for how we do disk image building in bootc-image-builder. It would also make things simpler for integrating BIB functionality into the Image Builder service as well.
It already is, that's what `--source-imageref` is doing, right? But still though, what's the problem with running under podman (or an equivalent container runtime set up in the same way)? It'd be nice for us to not have to debug arbitrarily different container environment setups...(for example, that different environment was the cause of https://github.com/containers/bootc/pull/790#pullrequestreview-2311808299)
> In the bigger picture, one goal I have here with bootc is to keep the installation logic centralized as much as possible - to make the container image the "source of truth", the "center of gravity". If I want to change how the bootloader gets installed, I should be able to change the container image, not an external tool. I'd like the container to be an active participant in its installation, not an inert blob (as we were treating ostree before, and rpm...well, is a mix). This also helps ensure that "day 1" and "day 2" are the same.
That's perfectly understandable and I wouldn't want that principle to be violated or bent. I suppose the biggest issue here would be version and feature drift between the `bootc` binary on the host and the one used to build and included in the container. But that could be solved with feature-compatibility checks and API versions. Perhaps it's not worth the effort though.
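A toy sketch of what such a check could look like (hypothetical; bootc does not do this today, and the comparison here is deliberately naive):

```
# Compare the host bootc with the one shipped in the target image.
HOST_VER=$(bootc --version)
IMG_VER=$(podman run --rm quay.io/exampleos/someos:latest bootc --version)
if [ "$HOST_VER" != "$IMG_VER" ]; then
  echo "warning: bootc version drift: host=$HOST_VER image=$IMG_VER" >&2
fi
```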
> A notable pivot point here is e.g. "what version of mkfs.xfs is used"? I'd like that to default to coming from the container image. There are of course use cases for very small systems which may not want to ship those, and those systems can be installed not from the image itself - we clearly need to handle that.
This is indeed a big issue, one I ran into myself when setting up a filesystem on a Fedora host and installing a CentOS Stream 9 container to it. An ext4 filesystem created on Fedora won't boot on a C9S kernel unless the `orphan_file` feature is disabled. But with `install to-filesystem`, bootc clearly supports the use case of setting up a filesystem with tooling that might be different from the ones inside the container, since the filesystem needs to exist ahead of install time.
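For reference, the workaround for that specific mismatch is to create the filesystem without the feature (assuming the e2fsprogs feature name `orphan_file`; the partition label is a placeholder):

```
# Disable the orphan_file feature so older kernels (e.g. C9S) can mount it.
mkfs.ext4 -O ^orphan_file /dev/disk/by-partlabel/root
```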
> I think it's a question of defaults/emphasis.
>
> Running inside podman today of course means that "day 1" and "day 2" are different...but still, there's a whole lot of code that gets shared.
>> If this specific scenario could be untangled from podman, it would simplify things a lot for how we do disk image building in bootc-image-builder. It would also make things simpler for integrating BIB functionality into the Image Builder service as well.
> It already is, that's what `--source-imageref` is doing, right? But still though, what's the problem with running under podman (or an equivalent container runtime set up in the same way)? It'd be nice for us to not have to debug arbitrarily different container environment setups...(for example, that different environment was the cause of #790 (review))
I think it makes sense for a tool like bootc to expect certain things from its environment, but tying it to a specific container runtime is what I find strange. I understand that being able to make assumptions about the environment based on a known state (a known runtime) simplifies things greatly, but being able to set up an environment the same way without tying it to specific tooling means we can be more flexible with integrating things into our services.
Actually, while writing this (and reading back through PRs and issues), I think I might be working under outdated information (or purely incorrect assumptions).
I considered not posting this comment at all, or at least the last parts, since I'm almost certain now that I'm asking for something that already exists, but I'll keep it for future reference.
> But with `install to-filesystem`, bootc clearly supports the use case of setting up a filesystem with tooling that might be different from the ones inside the container, since the filesystem needs to exist ahead of install time.
Yes, but I'd also say that even if you use `install to-filesystem` I would still encourage using tooling shipped from the container image - which is what we're doing today with bib (but not with Anaconda, which I'd like to fix, though it'd have the cost of needing to spool the container image to RAM/swap first in the PXE/remote case).
> I think it makes sense for a tool like bootc to expect certain things from its environment, but tying it to a specific container runtime is what I find strange.
It's fair, and in the general case we're not strictly tied to podman per se (one can make alternative runtimes that use the same libraries, and I believe it'd work when executed from cri-o, although the use cases for that are probably obscure).
However...we are increasingly tied somewhat to podman when logically bound images are scoped in, and there's a strong need for us to converge and align things there (ref https://github.com/containers/bootc/issues/20 ), so while we could relatively easily decouple things a bit from the container environment executing the install, at a practical level there's still that tie.
But yes: I suspect it would not be really hard for us to support being installed from Docker, for example - we can already fetch images from the docker-daemon:// transport and that's mostly what's needed here. If someone showed up and wanted to make it happen, we could.
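For illustration, the transport syntax in question (image names are placeholders) - copying an image out of a running Docker daemon into containers-storage, where podman/bootc can see it:

```
# Pull an image from the local Docker daemon into containers-storage.
skopeo copy docker-daemon:exampleos/someos:latest \
  containers-storage:localhost/someos:latest
```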
> I considered not posting this comment at all, or at least the last parts, since I'm almost certain now that I'm asking for something that already exists, but I'll keep it for future reference.
Part of this does already exist in `--source-imageref` as mentioned above. But we don't have an equivalent of that for LBIs, and that is going to be an increasingly important case, so we'd need to somehow generalize it into a "redirect map" for how to find our source containers to install.
As someone who used the `install to-filesystem` workflow to get a system compatible with bootable containers, I used the following command (successfully):
This command is not discoverable. I would have expected to be able to use the `bootc` command directly, or to have a much simpler podman command.