jlebon commented 2 years ago

In this ticket, let's come up with the various use cases that CoreOS layering enables for Fedora CoreOS. This will allow us to have more targeted discussions about each of them and evaluate (1) whether they're worth pursuing, (2) what the UX would look like, and (3) how it would be implemented.

Proposed documentation for use cases/explanation:

CoreOS Layering Use Cases

The technology we are referring to as "CoreOS Layering" allows users/projects to easily build derivative works on top of Fedora CoreOS. The build experience re-uses the container build workflow (i.e. Dockerfile or Containerfile), which is pervasive across the industry. The output of this build is a container image that can be hosted in a registry. Existing systems can be rebased to this container image and follow updates from there.

This is a significant change to the tooling around building new OSTree commits based on CoreOS. This is a net new set of features/tooling. It doesn't change any current tooling users may use and does not require users to make any changes to their current setup.

Here are some current use cases that are made possible or easier with CoreOS Layering:

As an End User

Currently Fedora CoreOS allows users to modify the system through Ignition, by writing to any read/write directories, or via RPM package layering on the client side. These approaches have served us well, but some use cases can be improved.

First, delivering configuration/files/software via Ignition works well, but can get heavy and also requires the user to re-provision if the configuration ever changes; Ignition is only supported for the first boot of a machine. Secondly, client side package layering can take a lot of time to run on first boot, makes upgrades less reliable, and isn't tightly controlled (software in package repositories changes often).

It may be more desirable to deliver all of the changes as a derivative layer built as a container image, and delivered on first boot. We think this will be attractive in the following uses:

Configuration
- CoreOS Layering allows for configuration changes to be delivered as an update
  - With Ignition you must re-deploy the machine or build a bespoke method for delivering configuration updates (e.g. some tool using SSH).

Detailed Scenario: A user does a bare metal install of 10 systems in a datacenter. The user later discovers they should have deployed the systems with a bonded networking setup. With CoreOS Layering this change can be made to the Dockerfile definition, rebuilt, and delivered as an update. Without CoreOS Layering the recommended way would be to re-install/re-provision the machines, which would represent a significant waste of time for this user.

Unpackaged Software
- CoreOS Layering allows software to be built and layered in one operation
  - Re-using container build technology we're able to do multi-stage builds
    - This allows us to detect and match target software versions at build time
- CoreOS Layering allows more invasive operations than client side package layering
  - Users are allowed to write files to directories that are read-only client side
    - e.g. can write binaries into /usr/bin/ vs. /usr/local/bin
  - coreos-layering allows the software to be built and layered in one operation (multi-stage build)

Detailed Scenario: One compelling example here is the case of third party kernel modules. A user can do a new multi-stage container build based on a recently delivered Fedora CoreOS base container. This multi-stage build can detect the delivered kernel in the base container, build the kernel module from source, and copy the results into the target container image. This committed container can now be pushed to a registry and clients can target that image for updates.
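The "detect the delivered kernel" step in the scenario above can be sketched in Python. This is only an illustration, assuming the kernel package name reported for the base image has the form `kernel-<version>-<release>.<arch>`. The function names and the sample version string are hypothetical and are not part of any CoreOS tooling.

```python
def kernel_release(kernel_nvra: str) -> str:
    """Return the kernel release for a package name such as
    'kernel-5.18.11-200.fc36.x86_64' (sample value, assumed format)."""
    prefix = "kernel-"
    if not kernel_nvra.startswith(prefix):
        raise ValueError(f"not a kernel package name: {kernel_nvra!r}")
    return kernel_nvra[len(prefix):]


def modules_dir(release: str) -> str:
    """Directory that holds the kernel modules for a given release."""
    return f"/usr/lib/modules/{release}"


def module_matches(base_kernel_nvra: str, module_built_for: str) -> bool:
    """A module is only usable if it was built for the base image's kernel."""
    return kernel_release(base_kernel_nvra) == module_built_for


base = "kernel-5.18.11-200.fc36.x86_64"  # hypothetical kernel in the base image
release = kernel_release(base)
print(modules_dir(release))
print(module_matches(base, release))
```

In a real multi-stage build the release string would come from the base container itself, and the build stage would compile against that exact release before the result is copied into the final image.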

Packaged Software
- CoreOS Layering allows the package layering to happen server side
  - With client side package layering
    - Every client has to do it separately
    - Each client now has to pull metadata from package repositories
      - This is heavyweight and happens at runtime
      - Changes in the package repo might cause upgrades to fail
  - With CoreOS Layering
    - No risk of package repo issues client side
    - Derivative commit can be tested before being delivered to clients

Detailed Scenario: To illustrate this use case further we can walk through a client side package layering scenario. Machine A and Machine B exist and are following the stable stream. Both have the NetworkManager-wifi package layered. A new stable update is released on Monday. Machine A updates on Monday, early in the rollout window; the client pulls in the new update and pulls NetworkManager-wifi from the package repo, making a new client side commit. On Tuesday Machine B attempts to update. At this point a few things could go wrong:

1. The package repo is unavailable. In this case the client side package layering operation will fail. Machine B stays on the old commit and keeps retrying.
2. The package layering is successful, but pulls in a different version of NetworkManager-wifi than Machine A.

In both of these cases Machine A and Machine B, which are expected to be running more or less the same software, have diverged. Further, in scenario 1, if the package repository never comes back (e.g. when using a repo from a third party), that machine is stuck forever.
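The divergence described above can be modelled with a small Python sketch. It is a toy model, not real rpm-ostree behavior: the stream names, package versions, and function names are all made up for illustration, and a missing day in the `REPO` dict stands in for an unreachable repository.

```python
# Hypothetical repo state: the NetworkManager-wifi version available per day.
REPO = {"Monday": "1.38.0", "Tuesday": "1.38.2"}


def client_side_update(current, new_base, day, repo):
    """Each client resolves the layered package itself, at update time."""
    version = repo.get(day)
    if version is None:
        return current  # layering fails; the machine stays on the old commit
    return (new_base, version)


def server_side_build(new_base, build_day, repo):
    """The package is resolved once, when the derived image is built."""
    version = repo.get(build_day)
    if version is None:
        raise RuntimeError("build failed; nothing is shipped to clients")
    return (new_base, version)


old = ("stable-1", "1.36.0")

# Client side layering: A updates on Monday, B on Tuesday.
machine_a = client_side_update(old, "stable-2", "Monday", REPO)
machine_b = client_side_update(old, "stable-2", "Tuesday", REPO)
print(machine_a == machine_b)  # False: the two machines have diverged

# Server side layering: one image is built on Monday, both machines pull it.
image = server_side_build("stable-2", "Monday", REPO)
print(image == image)  # True: every client gets the same content
```

The point of the model is that with server side layering any repository problem surfaces once, at build time, where it can be caught before clients see it.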

As a Layered Project

Fedora CoreOS provides a nice stable base for other projects to build on top of. However, not every decision Fedora CoreOS makes is right for layered projects. Currently a layered project needs to either encode every change into an Ignition configuration that runs on boot of every instance, or rebuild a brand new OSTree completely from scratch.

CoreOS Layering offers the opportunity for layered projects to easily make tweaks to Fedora CoreOS. Some examples of layered projects as of July 2022 that take different approaches:

Podman machine
- Uses Fedora CoreOS with a heavy Ignition config to customize instances on boot
OKD
- Rebuilds Fedora CoreOS from scratch with additions for OKD

With CoreOS Layering these projects can provide a more polished solution for end users:

Fewer changes get applied client side
- Decreased opportunity for provisioning issues
The commit that was built server side can get tested in CI
- Validation of the built (derived) commit can happen in CI

Potential Drawbacks of CoreOS Layering

The CoreOS Layering technology is still under active development. There are currently some workflows that haven't been fully fleshed out. Here is a summary:

Build tooling/infra for CoreOS Layering containers
- Any user creating derivative containers of Fedora CoreOS will need to continue to monitor and build new derivative containers when new Fedora CoreOS updates are released. These will need to be hosted in a registry that their machines can then pull container images from.
Updates via Zincati (update barriers; update graphs)
- Zincati/Cincinnati offer us a "safe" path to traverse when deploying updates to systems. When following a container image in a registry the user is following whatever is latest. Work still needs to be done to bring the added value of Zincati back into the CoreOS Layering workflow.

dustymabe commented 2 years ago

@jlebon and I got together to try to flesh out the CoreOS Layering use cases.

I have updated the description of this issue (2022-07-20) with some proposed use cases where the value of CoreOS layering is illustrated. I have also described current limitations of CoreOS Layering that we'll be working to address in the coming months. Please take a look and let us know how this could be improved and if any corrections need to be made.

cgwalters commented 2 years ago

Looked through this; seems sane. Thanks so much for writing this up!

The technology we are referring to as "CoreOS Layering"

That said, https://fedoraproject.org/wiki/Changes/OstreeNativeContainer currently proposes "ostree native container". About 90% of all the stuff written here applies outside of Fedora CoreOS too. Something to keep in mind.

cgwalters commented 2 years ago

Edit (jlebon): comment moved to https://github.com/coreos/fedora-coreos-tracker/issues/1263#issuecomment-1191702805

But regarding zincati specifically:

half-baked strawman: embed barriers in the container image

We encode "epochs"/barriers like this:

quay.io/coreos-assembler/fcos:stable-v0
quay.io/coreos-assembler/fcos:stable-v1

We embed metadata in the container images that says that quay.io/coreos-assembler/fcos:stable-v1 is the successor to quay.io/coreos-assembler/fcos:stable-v0.
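As a rough sketch of how a client could use such successor metadata, the Python below walks from the image a machine currently follows to the newest epoch. It assumes the metadata has already been read out of the images into a plain dict. How it would actually be embedded (a label, an annotation, or something else) is left open by the strawman, and the function name is hypothetical.

```python
# Hypothetical successor metadata, as if extracted from the images.
# None means "no successor yet", i.e. this is the newest epoch.
SUCCESSOR = {
    "quay.io/coreos-assembler/fcos:stable-v0": "quay.io/coreos-assembler/fcos:stable-v1",
    "quay.io/coreos-assembler/fcos:stable-v1": None,
}


def upgrade_path(current, successor_of):
    """Return the ordered list of images to move through, one barrier at a time."""
    path = []
    seen = {current}
    nxt = successor_of.get(current)
    while nxt is not None:
        if nxt in seen:
            raise ValueError(f"successor cycle detected at {nxt}")
        path.append(nxt)
        seen.add(nxt)
        nxt = successor_of.get(nxt)
    return path


print(upgrade_path("quay.io/coreos-assembler/fcos:stable-v0", SUCCESSOR))
```

A machine on the `stable-v0` tag would keep following that tag until it learns of the successor, then switch to `stable-v1`, so it cannot skip over an epoch.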

This is related to https://github.com/ostreedev/ostree/pull/874

cgwalters commented 1 year ago

I think chunks of this are now covered by the existence of https://github.com/coreos/layering-examples right?

If it's about trying to just explain the benefits and drawbacks, we could just move that into https://github.com/coreos/fedora-coreos-docs/pull/540 ?

cgwalters commented 1 year ago

I think what this issue is trying to get at really is the baseline question of when should users:

- Take a pre-built "golden image" and just configure it (workstation and FCOS) today
- Derive from and own OS updates (layering)

I think for 90% of this there's really nothing FCOS specific about this in the end (this is why the term "coreos layering" is misleading as a technology descriptor). IOW all of these tradeoffs are things that also apply to other rpm-ostree based systems. Particularly relevant to, but not limited to desktop ones.

So my instinct here is to:

- Take these concerns and just add them upstream to the rpm-ostree docs
- Link from https://github.com/coreos/fedora-coreos-docs/pull/540 to the rpm-ostree docs around this
- Close this issue

?

bgilbert commented 1 year ago

Fedora CoreOS docs describe lots of things that technically repeat documentation elsewhere. We should aim to be maximally helpful to new users, rather than asking them to assemble the opinions of various upstreams.

But also, FCOS is opinionated in various ways, and "when should you use Ignition vs. layering" seems like an important thing to have an opinion about. It affects not only the advice we give users, but our priorities for the functionality we build, and indeed how we think about the distro as a whole.

cgwalters commented 1 year ago

Yes, I have some! I added some bits from this in https://github.com/coreos/fedora-coreos-docs/pull/540#issuecomment-1534694065

But I'd hope you have specific opinions on this too that could be expressed here in the doc directly - would love to make this feel more collaborative.

cgwalters commented 1 year ago

I made this point in an OpenShift meeting, but I wanted to write it down here; going back one level, when I was saying this isn't FCOS specific in that it also applies to other rpm-ostree systems - in fact even if rpm-ostree (and coreos etc.) didn't exist, this problem also exists today in RHEL.

The day that RHEL introduced Image Builder, suddenly there were two ways to set up that PostgreSQL server in Azure: start from a stock cloud image (maybe using cloud-init/ansible/whatever and yum install postgresql), or build golden disk images with IB and do updates by instance teardown/spinup. These things have the same very fundamental tradeoffs around systems management that we're introducing here.

I think indeed, it is on us to provide guidance. But I don't think there's any real way to not support these two paths ("configure" vs "build/own").

Now, what I hope actually is this fundamental change in mindset and technology eventually leads us to a place where we have a more "seamless" spectrum into what is e.g. today Fedora Cloud and Fedora Server, instead of having harder barriers.

coreos / fedora-coreos-tracker

Develop Fedora CoreOS layering user stories #1219
