Equivalent to system containers from Fedora Atomic in Fedora CoreOS

strigazi commented 6 years ago

This issue aims to collect use cases for system containers.

One of the ways to customize the Fedora Atomic Host is to use system containers, which don't require a reboot and can run arbitrary applications as long as they are built into an OCI image, not only applications installed from rpms.

Here is my input, as a community member of the OpenStack/Magnum project and as an operator of the CERN cloud.

In the OpenStack/Magnum project (present in 10-15% of openstack deployments per user survey) we are using syscontainers to run:

- kubernetes, etcd, flannel (to become self-hosted, as almost everyone else does)
- docker-ce
- an openstack-specific daemon (os-collect-config) which is actively maintained in rdo (centos), not in fedora
- a cern-specific volume plugin

Apart from the kubelet, docker-ce and the openstack daemon which is used for passing configuration and runs first, the other components can be containerized in other ways, pods, docker or podman containers.

As a user I would like to see a similar solution in Fedora CoreOS that gives freedom to run newer (or from master) releases of kubernetes/cri-o/docker/containerd/kata-containers using a stable minimal base OS.

If you are another user with similar needs, please provide your input here!

strigazi commented 6 years ago

I don't know if I have rights to add the meeting label.

miabbott commented 6 years ago

In the 2018-08-29 community meeting, we discussed this issue a bit and came away with some additional questions and comments. Ultimately, we don't have a decision made about this yet, but want to continue to gather feedback and discuss possibilities.

- What technology do we want to use for extending the host for things we support (i.e. different versions of docker or kubernetes)? Examples include: system containers, systemd portable services, package layering, etc.
- Can we give users a valid path to use system containers even if we don't ship the atomic CLI on FCOS? Are users comfortable installing/managing system containers without a well known entrypoint?
- Do we know if system containers are part of the Red Hat plan going forward?

@strigazi We had a specific question about how you are currently using system containers (which ties into the first point above).

How does the OpenStack/Magnum project install/manage the system containers that are used?

glevand commented 6 years ago

Just to mention it, CoreOS Container Linux runs things like etcd using rkt's fly stage1, which essentially runs a container image as a chroot. CL provides a default systemd service that manages this. See the coreos-overlay etcd-wrapper for details.

strigazi commented 6 years ago

@miabbott We are using the atomic cli. The benefits that we see are that:

- an app can live in its own repo with the config.json, dockerfile or buildah script, tmpfs files
- the atomic cli knows what to do based on what it finds in /exports/
- containers are nicely packaged in their image and they are delivered using the registry v2 api

The last one is a big plus: users in isolated environments can still use magnum with minor changes (just point to another registry and use the same images) instead of setting up a server for serving rpm-ostree repos. For example, at CERN we use the gitlab registry, while another magnum user, Catalyst Cloud, a public cloud provider in NZ, uses a completely different product (I don't remember which one). The only thing they need to do is mirror the same containers in the registry that they already use for other products/teams.

My comments on the first two items: I would favor a solution similar to the atomic cli, although the atomic cli is written in python and python might not end up in FCOS. Perhaps podman containers wrapped in a systemd unit, like {etcd,kubelet}-wrapper? I would like to see what a systemd portable service would look like. Package layering sounds nice but it is not as intuitive as syscontainers.

syscontainers and systemd portable services like *-wrapper give quite some freedom and the workflow to manage, develop and build is the same (or close to) any other container. Package layering is not as user friendly.
I'm not sure about the option without a well known entrypoint; I need more data :)

glevand commented 6 years ago

After looking into systemd portable services, it seems to be what we want. I think the main concerns with it stem from the fact that it is a fairly new and untried feature. A plus for users is that it would/could be the same method and tools used on other distros.

We need to think about which may be better: investing effort to create an FCOS-specific container runtime, or investing effort to stabilize and mature systemd portable services.

redbaron commented 6 years ago

If you manage to integrate systemd portable services + Ignition + ostree, that would be a killer feature. Ignition authors would reference an ostree repo + commit id, and Ignition would do all the magic to fetch it and prepare the portable service image (that is, if native ostree support wasn't added to systemd by that time).

ajeddeloh commented 6 years ago

> If you manage to integrate systemd portable services + Ignition + ostree, that would be a killer feature.

I agree, that'd be pretty slick.

There's a whole other discussion to be had about whether we want to keep the Ignition/ct split (let's not discuss that here), but if we did, that'd be something that would fit nicely with ct. Perhaps portable services themselves would work well as a first class Ignition feature (since they only depend on systemd), but I'm more hesitant to also start requiring ostree. That could be something where the fcos ct handles writing out a unit to set up the ostree and writes the Ignition section for the systemd portable service.

I do wonder about how you would update the tree being used. I think it'd be even slicker to be able to track a branch in some manner so you can get updates. But since Ignition only runs at first boot it can't update existing systems. Maybe a unit that runs on boot and updates the tree? Maybe a daemon or systemd timer that checks more frequently? This is a whole 'nother can of worms though.
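To make the timer idea a bit more concrete, here is a rough sketch in Python that renders a oneshot service plus a matching timer. It is only an illustration: the function name, the unit names and the update command are hypothetical, and nothing like this exists in Ignition or ct today. The only things taken from systemd itself are the `Type=oneshot`, `OnBootSec=` and `OnUnitActiveSec=` directives.

```python
def render_update_units(name: str, update_cmd: str,
                        on_boot: str = "5min", interval: str = "1h") -> dict:
    """Return {filename: unit text} for a oneshot updater and its timer.

    `update_cmd` is whatever command refreshes the tree; it is a
    placeholder here, not an existing tool.
    """
    service = "\n".join([
        "[Unit]",
        f"Description=Update the tree used by {name}",
        "Wants=network-online.target",
        "After=network-online.target",
        "",
        "[Service]",
        # oneshot: run the command once and exit, no long-lived daemon.
        "Type=oneshot",
        f"ExecStart={update_cmd}",
        "",
    ])
    timer = "\n".join([
        "[Unit]",
        f"Description=Periodically update the tree used by {name}",
        "",
        "[Timer]",
        # First run shortly after boot, then repeat at a fixed interval.
        f"OnBootSec={on_boot}",
        f"OnUnitActiveSec={interval}",
        "",
        "[Install]",
        "WantedBy=timers.target",
        "",
    ])
    # A timer named X.timer activates X.service by default.
    return {f"{name}-update.service": service, f"{name}-update.timer": timer}


if __name__ == "__main__":
    for filename, text in render_update_units(
            "etcd", "/usr/local/bin/update-tree etcd").items():
        print(f"# {filename}\n{text}")
```

Setting `on_boot` only, with no repeat interval, would cover the "unit that runs on boot" variant; the open question of what the update command actually does is untouched by this sketch.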

> (that is if native ostree support wasn't added to systemd by that time)

Do you know if the systemd folks have plans for that? It'd be pretty cool.

dustymabe commented 6 years ago

Discussed in the meeting today. Here are some questions we had during that discussion:

podman

- could we use podman
    - luca:  It's mostly a matter of patterns in writing systemd service units running podman
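As an illustration of what such a pattern could look like, here is a minimal Python sketch that renders a systemd unit wrapping `podman run`. The function name, the image reference and the extra arguments are made up for the example and are not an agreed FCOS convention; the podman subcommands used (`run --rm --name`, `rm -f`, `stop`) are the standard ones.

```python
def render_podman_unit(name: str, image: str, extra_args=()) -> str:
    """Return the text of a systemd unit that runs `image` under podman.

    All parameters are illustrative; nothing here is an existing
    FCOS or Magnum interface.
    """
    run_cmd = " ".join(
        ["/usr/bin/podman", "run", "--rm", "--name", name, *extra_args, image]
    )
    return "\n".join([
        "[Unit]",
        f"Description={name} (podman-wrapped service)",
        "Wants=network-online.target",
        "After=network-online.target",
        "",
        "[Service]",
        # Remove a stale container left by a previous run; the leading
        # "-" tells systemd to ignore a failure of this command.
        f"ExecStartPre=-/usr/bin/podman rm -f {name}",
        f"ExecStart={run_cmd}",
        f"ExecStop=/usr/bin/podman stop {name}",
        "Restart=always",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
        "",
    ])


if __name__ == "__main__":
    # Hypothetical image reference, for demonstration only.
    print(render_podman_unit("etcd", "registry.example.com/etcd:latest",
                             ["--net=host"]))
```

The rendered text could be written out as a regular unit file at provisioning time, which keeps the delivery path (an OCI registry) the same as for system containers.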

systemd portable services

- are portable services stable enough for us to start using them (will they be by F30?)
- should we investigate running docker, kubelet, etc from a PS?
- can we deliver PS via an OCI image registry
- is there a story for "updating" a PS
- are portable services good enough (luca points out they won't solve all of our problems)

dustymabe commented 6 years ago

@strigazi have you had a chance to look at portable services since last week? Should we untag from meeting for this week?

dustymabe commented 6 years ago

removing the meeting label for now

strigazi commented 6 years ago

I thought I replied, sorry. I made some progress but I won't be able to join today. Let's do it next week.

Thanks Dusty

dustymabe commented 6 years ago

FYI video from talk about portable services from All Systems Go conference last week:

https://media.ccc.de/v/ASG2018-200-portable_services_are_ready_to_use Spotlight on portable services scope and implementation details

lucab commented 6 years ago

As a followup to the ASG talk above, we did a brief out-of-band brainstorming in order to check that we can have a proper user flow for first-boot provisioning of portable services via Ignition. The summary is that Ignition may just need to fetch images and then use portablectl in offline mode to explode the artifacts, so that everything is in place by the time we pivot. There are a few features currently missing in portablectl for that, recorded in the GH tracker. The systemd folks agreed to push those forward (but no ETA as of now).

jasonbrooks commented 5 years ago

I spent some time experimenting w/ systemd portable services last week. I think they're not ready for prime time quite yet. One big item is selinux support -- I could only use them w/ selinux in permissive.

I tested with the kubelet, and wasn't able to get it working -- I got it to run, but it didn't seem to be communicating with docker on my test system. Just as with the system containers for kubernetes, there are quite a few bind mounts to figure out to get things working, and the configuration options for those, at this point, are fewer than we had for system containers.

One really nice thing about portable services is that they come w/ systemd for free, so we don't need a separate python based client, and many different hosts will support them "out of the box."

Right now, the best replacement for system containers, IMO, is rpm-ostree package layering.

smekkley commented 5 years ago

@jasonbrooks Doesn't it fit into one of the anti goals specified in PRD.txt?

'''Host for running non-containerized applications'''

I wonder if rpm-ostree can provide the flexibility that rkt provides with systemd unit, such as running preferred version of etcd, kubelet, bootkube and whatever services the users need for their system that's not provided by the packages.

keithy commented 5 years ago

Use case for a system 'super-privileged' container: System Health Check
First Cut: https://github.com/keithy/goss-ps/tree/fedora31
download https://github.com/keithy/portable_goss/archive/goss_v0.2-fedora31-x86_64.tar.gz
known issues
- goss knows nothing about rpm-ostree packages
- setenforce Permissive #312
- bash access to the 'container aka portable service' could be ssh (more work needed)
- Since an os update will require a reboot, an on-boot service to download the latest versions, and re-attach newer ones ought to be trivial to write.

(tested on Fedora31... watch this space)

I am really liking these portable-services things, and feel that they deserve a lot more attention. They are so much simpler than the whole container rigmarole. In this example the isolation layer is about as thinned down as I have been able to muster; I felt I had it cracked when I managed to 'exec' into my "container" using bash via nc. (no /proc tricks needed)

Killer feature... these are ultra lightweight and don't need binaries to be statically linked (although in this case goss is statically linked). The example goss-walk (Lennart Poettering's walkthroughd) is included for testing purposes.

Although it prefers to be 'attached' with a 'trusted' profile, a surprising amount of things work with the 'default' profile supported by some read-only bind mounts. I see no harm in deploying as 'trusted', particularly since the core functionality of goss is read-only.

Suggestions and feedback would be appreciated. p.s. next target for this process will be cockpit.

smekkley commented 5 years ago

I'm curious how systemd-portable-services will replace hyperkube, the etcd containers and the components strigazi has mentioned, and whether it's possible to reuse the docker images in the official repos. If a docker image can't be directly reused, FCOS needs to set up a repo and actively update images. I'm sure we can make hyperkube and etcd work in systemd-nspawn, but the images would have to be created from scratch. Maybe I'm missing the point of portable services completely.

keithy commented 5 years ago

At the moment I'm just experimenting to see what is possible.

Portable services do need some additional supporting infrastructure, and some install/uninstall and start/stop conventions, similar to the way that 'atomic' added its install scripts into the docker metadata.

As a starting point, the health-check seemed like a good idea as it directly complements ignition. [so much so that in these enlightened CI/CD times it doesn't really deserve a container, it ought to be built-in (and thanks to ignition it almost can be), however 'goss' itself ought to be better maintained if it is to be a candidate for inclusion in a mainstream OS]

keithy commented 5 years ago

Some terminology clarity could be useful. I don't think that "Portable Services" is the best name but I guess we are stuck with it. This is what I have learned and is offered by way of discussion.

Definitions: A "Portable Service" consists of three concepts:

Firstly, executables run in a chrooted environment setUp/tornDown by systemd - that's the "Service" part.
- Any service (including non-"portable" ones) can request any root file-system, any level of security or any of many available isolation tricks for its activities.
- A "service" can be Type=oneshot, so this is not limited to long-running services but can apply to any executable.
Secondly, an enforcement paradigm applies to Services that are not natively installed - "portable" merely indicates non-native, or "other".
- being "other" they are run in a mandatory chroot environment
- they are not expected to be malicious, but potentially trustable.
- they define their own "wants" (i.e. a Temp directory)
- being "other", the OS dictates how their "needs" will be met (at installation time)
  - the installer can choose the level of imposition from one of the four profiles provided with "portablectl": "default", "trusted", "strict", "nonetwork".
  - A fifth profile "none" works, but if a tree falls in a chrooted environment with no way out does it make a sound? You can't do anything, except perhaps fill up your own allotted disk space.
Thirdly, an installation process connects, or to use the terminology, "attach"es, a separate filesystem to the host OS.
- The filesystem may be a disk image or a directory tree.
- Any services defined in that "container" are merged into the host OS, such that the host OS provides the "init" system, and process management from the outside for the processes invoked within the "container".
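To illustrate the attach step described above, here is a small Python sketch that builds (but does not run) a `portablectl attach` command line for a chosen profile. The helper name and the image path are hypothetical; the `--profile=` and `--runtime` options are the ones documented for portablectl, and only the four profiles named above are accepted here.

```python
# The four profiles named above; anything else is rejected by this sketch.
PROFILES = ("default", "trusted", "strict", "nonetwork")


def attach_command(image: str, profile: str = "default",
                   runtime: bool = False) -> list:
    """Build the argv for attaching a portable service image.

    `image` may be a disk image or a directory tree. With `runtime`
    set, the attachment is only valid until the next reboot.
    """
    if profile not in PROFILES:
        raise ValueError(f"unknown profile {profile!r}, expected one of {PROFILES}")
    argv = ["portablectl", "attach", f"--profile={profile}"]
    if runtime:
        argv.append("--runtime")
    argv.append(image)
    return argv


if __name__ == "__main__":
    # Hypothetical image path, for demonstration only.
    print(" ".join(attach_command("/var/lib/portables/goss.raw", "trusted")))
```

Returning the argv as a list keeps the decision of how to execute it (subprocess, a provisioning tool, a dry run) with the caller.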

This appears to be a very elegant solution, because in theory the "container" (small-c) can be anything from a complete bootable OS, i.e. a full on "Container" (big-C), that has its own init system (but doesn't use it while it is being run as a Portable Container) all the way down to a single executable, that depends on the shared libraries of the host OS. I consider this a somewhat "porous container".

So right now I am actually wondering what is the point of docker/podman/runc etc, did Lennart really ace this thing? So as you can see I started from the bottom up, with some code running in a chroot environment with a "none" profile, and found that "yes I am really boxed in", and at the other end of the scale for my health-check I ask for everything that I could possibly "want", and I am given my "needs" according to how the OS is directed to by its owner. The "default" profile works pretty well as it stands, but promises to keep me out of trouble. For me just the way user IDs are managed is pretty compelling!

So portable services are not strictly "services", and they are not necessarily "portable" in the ubiquitous docker sense, but they do seem to offer a lot of the advantages that we have come to expect from containers that help in the purpose of assembling a composable system, and this does appeal to me. (I like composable systems http://github.com/keithy/groan )

As a technical note, the chroot enforcement is applied by the command "portablectl" at installation time, there is nothing stopping you from taking something that is packaged up and arrived on to the system as a portable service, and manually installing it into systemd so as to bypass the enforced chroot. You/we/someone could write an "illDoItMyWayCtl attach", so on that basis, once we have filled in the missing pieces, metadata, package management, repository and transport... the installation method is still under your control.

I am thinking that this could evolve in the long run from 'rpm' to a 'cpm' a "Composable Package Manager", and there are bound to be interesting solutions for the missing pieces already out there. Nix/cargo/darch.

Another piece of the jigsaw could be a basic script/binary that can: a) lazily download a portable service / portable container b) and execute any binary which resides inside a chrooted environment,
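A very rough Python sketch of that idea follows, under loud assumptions: the function names are invented, the download step is left as a caller-supplied callable because no transport or repository format has been agreed on, and the chroot step needs root privileges and simply uses the plain `os.chroot` and `os.execv` calls rather than any of systemd's isolation.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, Sequence


def ensure_tree(root: Path, fetch: Callable[[Path], None]) -> bool:
    """Fetch the tree into `root` only if it is missing or empty.

    Returns True if `fetch` was called, False if the tree was already there.
    """
    if root.is_dir() and any(root.iterdir()):
        return False
    root.mkdir(parents=True, exist_ok=True)
    fetch(root)  # hypothetical downloader supplied by the caller
    return True


def exec_in_tree(root: Path, argv: Sequence[str]) -> None:
    """Replace this process with `argv`, run chrooted into `root`.

    Requires root privileges; `argv[0]` is a path inside the tree.
    """
    os.chroot(root)
    os.chdir("/")
    os.execv(argv[0], list(argv))


if __name__ == "__main__":
    # Demonstrate only the lazy part, with a fake downloader.
    with tempfile.TemporaryDirectory() as tmp:
        tree = Path(tmp) / "example"
        fake_fetch = lambda dest: (dest / "bin").mkdir()
        print(ensure_tree(tree, fake_fetch))
        print(ensure_tree(tree, fake_fetch))
```

A real version would more likely hand the tree to portablectl than call chroot directly, so that the chosen profile is still enforced.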

(a doodle)
P ackage
O ther
R untimes
T o
A ssemble
B y
L azy
E xecution

keithy commented 5 years ago

Made some progress #311

coreos / fedora-coreos-tracker

Equivalent to system containers from Fedora Atomic in Fedora CoreOS #37
