containers / bootc

Boot and upgrade via container images
https://containers.github.io/bootc/
Apache License 2.0

install: Spike on working unprivileged #859

Open cgwalters opened 3 weeks ago

cgwalters commented 3 weeks ago

The fact that install to-filesystem|to-disk needs to run privileged has come up in a few contexts, most recently in https://github.com/osbuild/bootc-image-builder/issues/98#issuecomment-1989213198

The mkfs.ext4|xfs|etc tools support populating a new filesystem from a directory tree without privileges (e.g. mkfs.ext4 -d <root>). However, the annoying problem here is that handling things like uid/gid and SELinux labels unprivileged gets hard.
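As a rough illustration of the unprivileged path described above, here is a minimal Python sketch that builds an `mkfs.ext4 -d` invocation against a plain image file, so no loop device or mount is involved. The helper names `mkfs_ext4_command` and `make_image` and the `disk.img`/`rootfs` paths are hypothetical, and this is not how bootc itself does installs today.

```python
import subprocess
from pathlib import Path


def mkfs_ext4_command(image: Path, rootdir: Path) -> list[str]:
    """Build an mkfs.ext4 invocation that populates `image` from `rootdir`."""
    return [
        "mkfs.ext4",
        "-q",                # quiet output
        "-F",                # do not prompt because the target is a regular file
        "-d", str(rootdir),  # copy this directory tree into the new filesystem
        str(image),
    ]


def make_image(image: Path, rootdir: Path, size_bytes: int) -> None:
    """Create a sparse image file and format it as a regular user."""
    with open(image, "wb") as f:
        f.truncate(size_bytes)  # sparse file; blocks are allocated lazily
    subprocess.run(mkfs_ext4_command(image, rootdir), check=True)


if __name__ == "__main__":
    # Only print the command here; call make_image() once a rootfs tree exists.
    print(" ".join(mkfs_ext4_command(Path("disk.img"), Path("rootfs"))))
```

The files are copied with whatever ownership and xattrs the unprivileged user sees in the source tree, which is exactly where the uid/gid and SELinux labeling problem shows up.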

One hack I was thinking of here is...maybe we could experiment with something like using fuse to create a mocked-up root. IIRC OpenEmbedded has an LD_PRELOAD thing to intercept syscalls, which is pretty hacky but probably works.

What'd obviously be nicer is if these tools all took something like a composefs-style dumpfile as input. But I bet the fuse thing would work.
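To make the dumpfile idea a bit more concrete, here is a small Python sketch of a sidecar manifest that records the intended ownership and SELinux label for each path, independent of who owns the files on the build host. The `Entry` type, the line format, and the example labels are all hypothetical; this is a simplified illustration and not the actual composefs dumpfile format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    path: str           # path inside the target filesystem
    mode: int           # full st_mode, including the file type bits
    uid: int            # intended owner in the image, not the builder's uid
    gid: int
    selinux_label: str  # intended security.selinux value


def render_manifest(entries: list[Entry]) -> str:
    """Render one line per entry, sorted by path (hypothetical format)."""
    lines = []
    for e in sorted(entries, key=lambda e: e.path):
        lines.append(f"{e.path} {e.mode:o} {e.uid} {e.gid} {e.selinux_label}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_manifest([
        Entry("/etc/shadow", 0o100000, 0, 0, "system_u:object_r:shadow_t:s0"),
        Entry("/", 0o40755, 0, 0, "system_u:object_r:root_t:s0"),
    ]), end="")
```

A filesystem creation tool that accepted input like this could write root-owned, labeled inodes without the calling process ever needing the privileges to create them on the host.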

cgwalters commented 3 weeks ago

One thing this will also help is avoiding the need for the host kernel to support a specific filesystem type (e.g. rhel kernels don't include btrfs).

cgwalters commented 3 weeks ago

One thing that came up related to this in a side chat is that while tooling exists to do "simple partition" setup and basic filesystem population, more complex storage (such as LVM) isn't yet ready to be initialized in this way.

I think at a basic level if we show the value in this outside of LVM, that provides motivation to make it work there.

The second thing is: I believe that for truly complex storage it makes sense to split it up into two parts:

This is quite an important topic because whether partitioning/filesystem setup is defined in the OS or external to it has implications for e.g. factory reset.

I think for LVM for example, if the user wants something like / to be a 100G VG, and /var/lib/postgres to be a 1T VG, tools that are generating disk images should actually just set up the basic / VG, and synthesize systemd units that on firstboot initialize the /var/lib/postgres VG. (This whole topic of course snowballs fast into "declarative state" and not defining things via imperative mutation, such as Ansible playbooks which aim to do that for LVM, etc.)
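A sketch of what "synthesize systemd units" could look like from the image builder's side, in Python. Everything specific here is an assumption for illustration: the function name `firstboot_vg_unit`, the VG/LV names, the `/dev/vdb` device, the choice of XFS, and the use of `ConditionFirstBoot=yes` (whose semantics depend on how `/etc` and the machine ID are provisioned in the image).

```python
def firstboot_vg_unit(vg: str, lv: str, device: str, mount_unit: str) -> str:
    """Render a oneshot unit that creates a VG/LV before `mount_unit` starts."""
    # "%%" is a literal "%" in systemd unit files, so lvcreate sees 100%FREE.
    return f"""\
[Unit]
Description=Initialize volume group {vg} on first boot
# Avoid the default ordering after basic.target, which would cycle with Before=
DefaultDependencies=no
ConditionFirstBoot=yes
Before={mount_unit}

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/sbin/vgcreate {vg} {device}
ExecStart=/usr/sbin/lvcreate -l 100%%FREE -n {lv} {vg}
ExecStart=/usr/sbin/mkfs.xfs /dev/{vg}/{lv}

[Install]
RequiredBy={mount_unit}
"""


if __name__ == "__main__":
    print(firstboot_vg_unit("pgdata", "postgres", "/dev/vdb",
                            "var-lib-postgres.mount"), end="")
```

The disk image then only needs to contain the root VG plus this unit and a matching mount unit; the second VG is created on the target machine, which keeps that part of the storage definition inside the OS.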