coreos / fedora-coreos-tracker

Issue tracker for Fedora CoreOS
https://fedoraproject.org/coreos/

Create tooling to take a FCOS-derived container and generate bootable images from it #1151

Open cgwalters opened 2 years ago

cgwalters commented 2 years ago

This is touched on in https://github.com/coreos/enhancements/blob/main/os/coreos-layering.md#use-of-coreos-diskboot-images

Basically I think our default stance is that admins use our golden images + ignition to target their custom containers, but I think for a variety of reasons we will want to support generating disk images directly too.

I think this would be relatively maintainable for bare metal targets in particular (ISO, PXE etc.), and it's on bare metal targets where there's more of a desire for things like handling the first boot with an out of tree NIC driver, etc.

Now, obviously this is basically what coreos-assembler is today. However, a few key points:

This build tool could derive from the coreos-assembler container literally, or we could split out the bits we need as a shared library or so (could make sense as a Go library actually, even if we happen to ship shell scripts and python in it?). Starting out this way is going to be the most supportable thing. This tool should be runnable as a container image (that yes, accepts a container image as input) by default, but we could support other flows too.

That said, we should also do a spike on having e.g. RHEL Image Builder be able to output an ISO at least.

cgwalters commented 1 year ago

I briefly looked at trying to have a new tool that "restamps" an existing live ISO with new content by scraping out the files, but what makes this hard right now is that https://github.com/coreos/coreos-assembler/blob/main/src/cmd-buildextend-live has a whole lot of load-bearing magic around things like the kernel arguments embed area and architecture specifics. I am pretty sure it'd be possible to write a tool which basically just dropped in a new squashfs into an existing ISO and kept everything else about it. That'd allow overriding the rootfs...but would break when trying to override the kernel, which is what most use cases for this want anyways, so probably not worth trying.

Further, that code specifically also reads src/config/live which is content outside of the container image today. We'd have to switch to embedding that content inside the target image, similar to what we did with https://github.com/coreos/coreos-assembler/commit/816ebaed49d3e1efb2ba06e52969453ddad06d61

cgwalters commented 1 year ago

Been thinking about this more. How about this plan:

ybettan commented 1 year ago

@cgwalters, Regarding https://github.com/coreos/fedora-coreos-tracker/issues/1151#issuecomment-1351656773:

If we are regenerating the ISO using coreos-installer, why is that better than using coreos-assembler for it? I thought that the benefit of the installer is that it can manipulate the ISO without rebuilding it from source each time.

cgwalters commented 1 year ago

My latest work on this is in https://github.com/containers/bootc/pull/30

jmpolom commented 1 year ago

@cgwalters is the thought that having an install command in bootc sidesteps the entire intermediate issue of needing to generate bootable images to install from? Is there any chance this functionality may get added to coreos-installer? I would really like to be able to directly install a customized FCOS build contained in an OCI image.

cgwalters commented 1 year ago

The initial design of coreos-installer was to install raw disk images - with a few very small tweaks like injecting Ignition. Since then, its role has grown significantly. But adding container installation to it would IMO completely change the character of the project.

Further, one wants to use the same technology stack when fetching updates (aside from writing the partitions etc.). This is why I created the new bootc project, which is exactly this.

I would like to add bootc to FCOS, and am working in that direction.

jmpolom commented 1 year ago

@cgwalters that makes a lot of sense. Do you see bootc also sidestepping the need to add functionality to ignition/butane to configure an installed system to boot a containerized build? Such as what is hinted at in the coreos layering enhancements document.

I'm assuming that what is described at the end of that document is not currently available functionality. I did not see it mentioned in the latest butane documentation.

cgwalters commented 1 year ago

> Do you see bootc also sidestepping the need to add functionality to ignition/butane to configure an installed system to boot

Yes...and no. Yes, bootc allows directly installing customized container images without an intermediate disk image being generated. But, I think we should still have butane sugar for this to target rebasing an existing FCOS disk image.

cgwalters commented 1 year ago

OK copy-pasting a comment from rpm-ostree here:

We already ship what we need here. I just tried this out:

$ unshare -m   # Avoid leaking mount namespaces
$ coreos-installer install /dev/vdb
Downloading Fedora CoreOS stable x86_64 metal image (raw.xz) and signature
> Read disk 619.1 MiB/619.1 MiB (100%)   
gpg: Signature made Mon 06 Feb 2023 11:11:10 PM UTC
gpg:                using RSA key ACB5EE4E831C74BB7C168D27F55AD3FB5323552A
gpg: checking the trustdb
gpg: marginals needed: 3  completes needed: 1  trust model: pgp
gpg: depth: 0  valid:   4  signed:   0  trust: 0-, 0q, 0n, 0m, 0f, 4u
gpg: Good signature from "Fedora (37) <fedora-37-primary@fedoraproject.org>" [ultimate]

Note: detected other devices with a filesystem labeled `boot`:
  - /dev/vda2
The installed OS may not work correctly if there are multiple boot filesystems.
Before rebooting, investigate whether these filesystems are needed and consider
wiping them with `wipefs -a`.

Install complete.
$ mount /dev/vdb4 /mnt
$ mount /dev/vdb3 /mnt/boot
$ ostree container image deploy --sysroot=/mnt --stateroot fedora-coreos --imgref ostree-unverified-registry:quay.io/fedora/fedora-coreos:testing-devel
error: Performing deployment: Importing: Unencapsulating base: Layer sha256:d3ad2591239620a23a7bb2d00c33cfce93809cce54260d174bb35ddbdeaa253c: Importing objects: Importing object f9/76104c0916a201c2d6677500bbff574d90bc4d0740d906efc0cc4142d88ddf.file: Processing content object f976104c0916a201c2d6677500bbff574d90bc4d0740d906efc0cc4142d88ddf: Importing regfile: min-free-space-percent '3%' would be exceeded, at least 3.6 MB requested

The error here is very revealing - it's because we want to grow the rootfs on firstboot. Which goes right into the design rationale for bootc install making the filesystems directly.

Adding in a growpart /dev/vdb 4 and xfs_growfs /mnt made things work:

[root@fcloud-dev ~]# growpart /dev/vdb 4
CHANGED: partition=4 start=1050624 old: size=3809247 end=4859871 new: size=40892383 end=41943007
[root@fcloud-dev ~]# xfs_growfs /mnt
meta-data=/dev/vdb4              isize=512    agcount=4, agsize=119039 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=1 inobtcount=1
data     =                       bsize=4096   blocks=476155, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=16384, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 476155 to 5111547
[root@fcloud-dev ~]# ostree container image deploy --sysroot=/mnt --stateroot fedora-coreos --imgref ostree-unverified-registry:quay.io/fedora/fedora-coreos:testing-devel
The --rebuild-if-modules-changed option is deprecated. Use --refresh instead.
[root@fcloud-dev ~]#

(Then rebooting I'm in a testing-devel FCOS instance)
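The manual steps above could be strung together roughly as follows. This is only a sketch under the assumptions of that transcript: partition 3 is boot, partition 4 is the XFS root, and the disk uses `/dev/vdb`-style naming. The `run` and `install_and_deploy` helpers and the `DRY_RUN` switch are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Sketch only: strings together the manual steps from the transcript above.
set -eu

# Print the command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# install_and_deploy DISK IMGREF [SYSROOT]
# Assumes the layout seen in the transcript: partition 3 is boot and
# partition 4 is the XFS root. An NVMe disk would need a "p" before the
# partition number, which this sketch does not handle.
install_and_deploy() {
    disk="$1"
    imgref="$2"
    sysroot="${3:-/mnt}"

    run coreos-installer install "$disk"
    run mount "${disk}4" "$sysroot"
    run mount "${disk}3" "$sysroot/boot"
    # Grow the root partition and filesystem now; normally this is
    # deferred to first boot, which is why the deploy ran out of space.
    run growpart "$disk" 4
    run xfs_growfs "$sysroot"
    run ostree container image deploy --sysroot="$sysroot" \
        --stateroot fedora-coreos --imgref "$imgref"
}
```

For example, from inside `unshare -m`, `DRY_RUN=1 install_and_deploy /dev/vdb ostree-unverified-registry:quay.io/fedora/fedora-coreos:testing-devel` previews the commands, and dropping `DRY_RUN=1` runs them.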

So.......the question is, short term, do we try to also graft support for this into coreos-installer directly to improve the ergonomics?

For Fedora and derivatives, running this code pulls in rpm-ostree, which pulls in a lot of other things. We already ship all of that in many environments, of course, but this should probably be a "soft" dependency that simply errors out if the package isn't present. This raises the question of whether the official quay.io/coreos/coreos-installer container image gains this dep or not.

In practice I think I'd vote for a soft dep because I generally only care about making this work from inside an existing FCOS-derivative live environment, where we already have the requisite depchain.
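To illustrate what a "soft" dependency means here: check for the tool at run time and error out clearly instead of requiring it at package level. coreos-installer itself is written in Rust, so the shell below is only an illustration of the idea, and `require_tool` is a hypothetical helper name.

```shell
# Hypothetical helper: fail early with a clear message when an optional
# tool is missing, rather than depending on it at install time.
require_tool() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "error: '$1' not found; deploying a container image needs it" >&2
        return 1
    fi
}
```

A caller would run something like `require_tool rpm-ostree || exit 1` before attempting the container deployment step.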

The alternative here is to just document things. I'm fine with that too.

cgwalters commented 1 year ago

Also of note: this flow is only helpful when one can get a machine up far enough to run this code and write to disk. There are some cases where we can e.g. attach a stock ISO via IPMI or so, then do this installation to disk with the real NIC drivers already embedded etc. Or, plugging in a USB stick (say the FCOS-derivative Live ISO) but also with a data partition containing the target container image.

It'd also be possible, albeit hacky, to e.g. inject an Ignition config into our live ISO which downloads and loads the kmod dynamically. (Or I guess...assuming the live ISO kernel is the same as the container kernel, could actually extract it from the container)

Hmmm....I wonder though, maybe short term we could try to make some tool which actually does the "data partition with container" thing more streamlined. That would help with a few more cases perhaps?

But ultimately we do need some sort of ergonomic tool which takes a container image as input and spits out an ISO that boots using the modified kernel/OS and can install itself to disk using that target container image.

cgwalters commented 1 year ago

BTW though, there are still a lot of architectural problems with the "hackily update after coreos-installer" approach.

For example, because things like LUKS are handled via Ignition always today, it means that the files we write from the container image are unencrypted. For Fedora CoreOS derivatives, this doesn't matter because everything we ship is FOSS - there is no secret content.

I think that's not going to be true in the general case for custom derived images; I'm pretty sure some users will embed secret data into images (passwords or keys). We can't really say it's a bad idea to do that unless we offer them a significantly more ergonomic/compelling way to do it. (This does touch on bootc configmap/secret support some, particularly if we do something nice with secrets). I think Ignition has always had this problem, but Ignition configs are also applied on firstboot today, so the files Ignition writes land after LUKS has been enabled.

jmpolom commented 1 year ago

I would say as a downstream user/consumer of whatever functionality is developed, a way to directly install a container image to disk explicitly would be my preferred workflow. I have thought about doing the "post install rebase/deploy hack" but it really does seem like exactly that and I'm apt to avoid such hacks at any scale.

I think having a simple utility without extensive dependencies that could generate the raw bootable disk image (could this step require qemu virtualization? that would be a drag) that could be consumed by coreos-installer as-is would be preferable to me as a user/consumer. I understand not wanting to complicate what should be a simple tool (an installer of raw disk images) that has already evidently experienced scope creep. Expanding the capability and scope of the installer doesn't seem wise.

As for ignition, I think if there's a utility to create disk images and ISOs that can be used by the installer to be written to disk (or booted over the network?!) then there is likely little need for support in ignition. If ignition has to do the rebase or deploy that implies a plain FCOS image is installed, booted, ignition runs and then installs a customized container image. This seems terribly inefficient and convoluted.

Ultimately I think bootc is my true answer here so hopefully that project matures, stabilizes and sees good adoption.

cgwalters commented 1 year ago

Just to xref, there is a new "more incremental" potential path here https://issues.redhat.com/browse/OCPBU-66?focusedId=22251336&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-22251336 that's being discussed in the OCP context but would apply to FCOS too

cgwalters commented 1 year ago

We discovered an ostree bug here: https://github.com/ostreedev/ostree-rs-ext/pull/503

ori-amizur commented 1 year ago

@cgwalters I tried to implement what you advised in this comment, but the flow failed. Here is what I did. The image I used for ostree container image deploy was obtained with the following command:

oc adm release info --image-for=rhel-coreos quay.io/openshift-release-dev/ocp-release:4.13.3-x86_64

The ostree command was invoked in the following manner:

ostree container image deploy --authfile `pwd`/pull-secret --karg ignition.platform.id=metal --karg ignition.firstboot --sysroot=/mnt  --stateroot rhcos --imgref ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6973e9353f29e678cad79fe768c22cfd6697d8aa30d2aeaa78cceea989925ded

After reboot, it seems there was an attempt to apply ignition, but there were errors, and the console dropped to emergency shell. Here are some errors that appeared in the console:

[    2.501665] systemd[1]: systemd-fsck-root.service: Main process exited, code=killed, status=15/TERM
[  OK  ] Stopped File System Check …f-cd27-4a3f-adf6-c34ac536aeb5.
[    2.504123] systemd[1]: systemd-fsck-root.service: Failed with result 'signal'.
[    2.505161] systemd[1]: Stopped File System Check on /dev/disk/by-uuid/d4eede6f-cd27-4a3f-adf6-c34ac536aeb5.
[    2.510481] ignition-ostree-populate-var[902]: cp: cannot stat '/sysroot/etc/skel/.bash*': No such file or directory
[FAILED] Failed to start Populate OSTree /var.
See 'systemctl status ignition-ostree-populate-var.service' for details.
[  OK  ] Finished Ignition OSTree: …nerate Filesystem UUID (root).
         Starting Ignition OSTree: Grow Root Filesystem...
[FAILED] Failed to start Ignition OSTree: Grow Root Filesystem.
See 'systemctl status ignition-ostree-growfs.service' for details.
         Starting Ignition OSTree: Check Root Filesystem Size...
[FAILED] Failed to start Ignition O…e: Check Root Filesystem Size.
See 'systemctl status ignition-ostree-check-rootfs-size.service' for details.
         Starting Ignition (files)...
[FAILED] Failed to start Ignition (files).
See 'systemctl status ignition-files.service' for details.
Aug 02 12:34:55 ignition[1039]: Ignition failed: failed to create users/groups: failed to configure users: failed to create user "core": exit status 10: Cmd: "useradd" "--root" "/sysroot" "--create-home" "--password" "*" "--comment" "CoreOS Admin" "--groups" "adm,sudo,systemd-journal,wheel" "core" Stdout: "" Stderr: "useradd: cannot lock /etc/group; try again later.
"
Aug 02 12:34:55 systemd[1]: ignition-files.service: Main process exited, code=exited, status=1/FAILURE
Aug 02 12:34:55 systemd[1]: ignition-files.service: Failed with result 'exit-code'.
Aug 02 12:34:55 systemd[1]: Failed to start Ignition (files).
Aug 02 12:34:55 systemd[1]: ignition-files.service: Triggering OnFailure= dependencies.
Aug 02 12:34:55 systemd[1]: Starting Ignition OSTree: Check Root Filesystem Size...
Aug 02 12:34:55 systemd[1]: ignition-ostree-check-rootfs-size.service: Main process exited, code=exited, status=1/FAILURE
Aug 02 12:34:55 systemd[1]: ignition-ostree-check-rootfs-size.service: Failed with result 'exit-code'.
Aug 02 12:34:55 systemd[1]: Failed to start Ignition OSTree: Check Root Filesystem Size.
Aug 02 12:34:55 systemd[1]: Starting Ignition OSTree: Grow Root Filesystem...
Aug 02 12:34:55 systemd[1]: ignition-ostree-growfs.service: Main process exited, code=exited, status=1/FAILURE
Aug 02 12:34:55 systemd[1]: ignition-ostree-growfs.service: Failed with result 'exit-code'.
Aug 02 12:34:55 systemd[1]: Failed to start Ignition OSTree: Grow Root Filesystem.
Aug 02 12:34:55 systemd[1]: Starting Populate OSTree /var...
Aug 02 12:34:55 ignition-ostree-populate-var[902]: cp: cannot stat '/sysroot/etc/skel/.bash*': No such file or directory
Aug 02 12:34:55 systemd[1]: ignition-ostree-populate-var.service: Main process exited, code=exited, status=1/FAILURE
Aug 02 12:34:55 systemd[1]: ignition-ostree-populate-var.service: Failed with result 'exit-code'.
Aug 02 12:34:55 systemd[1]: Failed to start Populate OSTree /var.
Aug 02 12:34:55 systemd[1]: Starting File System Check on /dev/disk/by-uuid/d4eede6f-cd27-4a3f-adf6-c34ac536aeb5...
Aug 02 12:34:55 systemd-fsck[890]: /usr/sbin/fsck.xfs: XFS file system.
Aug 02 12:34:55 systemd[1]: systemd-fsck-root.service: Main process exited, code=killed, status=15/TERM
Aug 02 12:34:55 systemd[1]: systemd-fsck-root.service: Failed with result 'signal'.
Aug 02 12:34:55 systemd[1]: Stopped File System Check on /dev/disk/by-uuid/d4eede6f-cd27-4a3f-adf6-c34ac536aeb5.

What is the reason for the failure? How can it be fixed?

cgwalters commented 1 year ago

> Aug 02 12:34:55 ignition[1039]: Ignition failed: failed to create users/groups: failed to configure users: failed to create user "core": exit status 10: Cmd: "useradd" "--root" "/sysroot" "--create-home" "--password" "*" "--comment" "CoreOS Admin" "--groups" "adm,sudo,systemd-journal,wheel" "core" Stdout: "" Stderr: "useradd: cannot lock /etc/group; try again later. "

Hmm that looks like https://github.com/coreos/fedora-coreos-tracker/issues/1250

cgwalters commented 1 year ago

Use $ignition_firstboot instead of ignition.firstboot
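Applied to the invocation quoted earlier in the thread, that change might look like the sketch below. The flags are the ones from that invocation; the `deploy_rhcos` wrapper and its `DRY_RUN` switch are hypothetical and exist only for illustration. The single quotes are an assumption worth calling out: they keep the shell from expanding `$ignition_firstboot`, so the literal string is what gets recorded as a kernel argument.

```shell
# deploy_rhcos IMGREF AUTHFILE
# Hypothetical wrapper around the command from the thread. With DRY_RUN
# set to a non-empty value the command is echoed instead of executed.
deploy_rhcos() {
    ${DRY_RUN:+echo} ostree container image deploy \
        --authfile "$2" \
        --karg ignition.platform.id=metal \
        --karg '$ignition_firstboot' \
        --sysroot=/mnt --stateroot rhcos \
        --imgref "$1"
}
```

With double quotes or no quotes, the shell would substitute its own (most likely empty) `ignition_firstboot` variable before ostree ever saw the argument.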

cgwalters commented 1 year ago

For doing OS updates at coreos-installer time, the topic of kernel arguments came up (both Ignition and MachineConfig for OCP). Basically we need to extract the kernel arguments from the Ignition/MC and apply them at this time too.

ori-amizur commented 1 year ago

@cgwalters I implemented https://github.com/coreos/fedora-coreos-tracker/issues/1151#issuecomment-1430263412 and it works, except that when I moved to the 4.14 image I got an error from ostree:

error: Performing deployment: Importing: Layer sha256:dc1def201e9ad26de700cc96e92dad71ebfb917de1e8155f9affe2d0abd1fcd0: Importing object bd/862ae43b9c5f71e7513e91b71b419b2436a556faaa59204441094856dcc493.file: Processing content object bd862ae43b9c5f71e7513e91b71b419b2436a556faaa59204441094856dcc493: Setting xattrs: fsetxattr(security.selinux): Invalid argument

When I disabled selinux (setenforce 0), the ostree command was successful. Any idea how to overcome the problem?

cgwalters commented 1 year ago

> When I disabled selinux (setenforce 0), the ostree command was successful.

Yes...it's possible but tricky to handle this. We do it in bootc today, see https://github.com/containers/bootc/blob/0a657ea9609e33c09e8bf2c013699c379f0ccc43/lib/src/lsm.rs#L32

But setenforce 0 is indeed the most reliable way. It's how Anaconda has always worked.
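A minimal sketch of that workaround, assuming `getenforce` and `setenforce` are available and the caller is root. `with_selinux_permissive` is a hypothetical helper name, not something ostree or bootc provides.

```shell
# Run a command with SELinux temporarily permissive, then restore
# enforcing mode if that is what was in effect before.
with_selinux_permissive() {
    _prev="$(getenforce)"
    if [ "$_prev" = "Enforcing" ]; then
        setenforce 0
    fi
    _rc=0
    "$@" || _rc=$?
    if [ "$_prev" = "Enforcing" ]; then
        setenforce 1
    fi
    # Propagate the wrapped command's exit status.
    return "$_rc"
}
```

It would wrap the deploy step, e.g. `with_selinux_permissive ostree container image deploy ...` with the same arguments as before.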

Smithx10 commented 1 year ago

@cgwalters,

I am hoping to build a pxe image with zfs added. My intention is to run the os from ram only and never install it onto disk.

I think it would be helpful to have an example that uses https://github.com/coreos/layering-examples/tree/main/build-zfs-module to demonstrate creating a bootable image that has zfs installed.

Has this issue progressed far enough to do this now? If so, could you please show me how I would build one?

Thanks for all your effort solving this problem.