coreos / fedora-coreos-tracker

Issue tracker for Fedora CoreOS
https://fedoraproject.org/coreos/

Kubernetes v1.24+ container runtime on Fedora CoreOS #767

Open dghubble opened 3 years ago

dghubble commented 3 years ago

Kubernetes intends to drop support for docker-shim as a container runtime in v1.22. Currently, Fedora CoreOS 33.20210217.3.0 ships docker 19.03.13. docker-shim remains the most stable, tested, available out-of-the-box runtime, but this will end soon. I'd like some kind of clarity on Fedora CoreOS's intentions, such as shipping a compatible containerd or cri-o. With Kubernetes cutting v1.22 alpha releases (likely pretty soon) in the time frame of Fedora 34, as Kubernetes distros we'll want to start evaluating and conformance testing the selection.

Overall, I'd like to know there is some plan. Ideally a documented one. Flatcar Linux has published their intentions to ship containerd in time and already has a mechanism to test it (docs).

cri-o

It sounds like cri-o can only be installed by downloading an RPM (from where?) directly and rpm-ostree installing it (unverified). dnf and yumdownloader used by an openshift script aren't present. https://github.com/coreos/fedora-coreos-tracker/issues/292#issuecomment-796998069. I have some maturity concerns about that. Is this really the recommended path? For a container optimized distro to not have a better path to getting a container runtime?

Releases are also pinned to Kubernetes versions and seem to lag by quite some time, so I have some velocity concerns (the runtime needs to be fairly stable, we roll forward when Kubernetes does, think hours).

containerd

I haven't seen Fedora CoreOS plans to make this the runtime of choice.

Or is the plan something else?

dghubble commented 3 years ago

:eyes: cri-o doesn't list Fedora CoreOS as a supported OS and the only mention of how to hack it in is a year ago from @dustymabe (thx). And then cri-o creates a new need for conntrack on-host and forces you down the path of installing other RPMs which will need strict versioning.

https://github.com/cri-o/cri-o/blob/master/install.md https://discussion.fedoraproject.org/t/installing-using-cri-o-on-fedora-coreos/15961/5

lucab commented 3 years ago

[cri-o] Releases are also pinned to Kubernetes versions and seem to lag by quite some time

This is the main reason why cri-o is not shipped as part of the base FCOS. The daemon isn't a generic container runtime that can be freely upgraded by the OS, but it is instead interlocked with the higher level application/service (k8s control plane).

This strict interlock is an explicit design, highlighted at https://github.com/cri-o/cri-o. To that extent, for k8s distributions the whole "kubelet plus cri-o" is effectively a single "node runtime" component.

rhatdan commented 3 years ago

@mrunalp @haircommander PTAL

rhatdan commented 3 years ago

@mrunalp @haircommander Why isn't Fedora a supported platform for CRI-O? Is this just an oversight?

haircommander commented 3 years ago

fedora is listed https://github.com/cri-o/cri-o/blob/master/install.md#fedora-31-or-later ~it's just not in the table (which is both out of date and unneeded)~ wait a minute, it is also in the table. Where are you getting indication cri-o is not supported on fedora @dghubble? Are you looking for Fedora CoreOS specific installation instructions?

haircommander commented 3 years ago

[cri-o] Releases are also pinned to Kubernetes versions and seem to lag by quite some time

This is the main reason why cri-o is not shipped as part of the base FCOS. The daemon isn't a generic container runtime that can be freely upgraded by the OS, but it is instead interlocked with the higher level application/service (k8s control plane).

This strict interlock is an explicit design, highlighted at https://github.com/cri-o/cri-o. To that extent, for k8s distributions the whole "kubelet plus cri-o" is effectively a single "node runtime" component.

I agree with @lucab here, kubelet and cri-o should be installed together with matching versions. If there's a way we (the cri-o team) can minimize the friction with this in FCOS, I'd love to hear about it.

LorbusChris commented 3 years ago

As I see it, the issue here is more that it's a bit difficult to install the cri-o RPM because it's distributed as a module and rpm-ostree can't handle it. At least for OKD, making the version of cri-o that is used in the current OKD version available in a standard, non-modular yum repo would help, since we could then include it easily (either in the base compose, or as an extension in a local yum repo). Longer term, rpm-ostree should probably be taught to work with modular repos.

haircommander commented 3 years ago

how is the kubelet packaged for OKD now? Could the appropriate cri-o version be tossed in that package?

vrutkovs commented 3 years ago

In OKD we grab kubelet RPM from origin artifacts, copy it to /tmp/rpms, extract to /tmp/working and create a new rpm-ostree commit with this dir overlayed.

Similar steps to fetch CRI-O RPM

dghubble commented 3 years ago

For comparison, docker+shim (and soon containerd on Flatcar Linux) provide a suitable container runtime that meets Kubernetes CRI minimum needs, out of the box. Kubelet is bundled as a container image. Runs. Conformant. We roll forward within days of Kubernetes releases and do not need to wait on changes from the base OS or any package ecosystem.

If we move toward cri-o, in addition to the friction of adopting cri-o (it feels like we'd be the first user outside okd), there is this velocity concern. I appreciate you folks have explicit RPM packages for components. In practice, we'd have to release new Kubernetes with Flatcar Linux only, and somehow have Fedora CoreOS come weeks later or something. We'd be waiting on this package ecosystem. How strict is the cri-o to k8s versioning really? Is it just about validation?

haircommander commented 3 years ago

How strict is the cri-o to k8s versioning really? Is it just about validation?

in practice, not terribly strict. It's the safest bet, though. the cri, while generally stable, does change. We don't backport new cri changes to older versions of cri-o. We also don't attempt to test any kubelet/cri-o skew. Basically: we make no support claims for anything other than matching versions.

Generally, folks don't have much trouble with having mismatched versions (I've never heard anyone complain about it). But it's theoretically possible

Kubelet is bundled as a container image.

out of curiosity, do you run the kubelet inside a container, or do you use the image to package it easily?

LorbusChris commented 3 years ago

@vrutkovs @haircommander it looks to me as though we are running the kubelet through the hyperkube binary (see https://github.com/openshift/installer/blob/master/data/data/bootstrap/systemd/units/kubelet.service.template#L15-L16), which is extracted from the payload and written to disk during install (lives in https://github.com/openshift/kubernetes/blob/master/openshift-hack/images/hyperkube/Dockerfile.rhel). I suppose crio could be distributed the same way, that is not as an RPM, but as a container.

vrutkovs commented 3 years ago

that is not as an RPM, but as a container.

Yes, and we had this option in 3.x days. AFAIK some distros (Rancher) do that too - but node SIG doesn't officially support running kubelet (and container engine) in container, as it complicates volumes.

Perhaps we should be extracting binaries to /usr/local/bin?

dghubble commented 3 years ago

If we're just sideloading cri-o on our own with a "works on OKD with our RPMs" promise, as helpful as that is, it reduces the value prop for FCOS a bit. Could we at least have some "official" flow that guarantees cri-o can be installed successfully on FCOS? Ideally in FCOS that'd be an Ignition mechanism, but Dusty's script still seems to be the best approach that installs it currently. And some commitment to cri-o being a supported case on FCOS, beyond a question in the forums?

@haircommander Kubelet in a container with podman, like other on-host services (e.g. etcd).

@LorbusChris hyperkube was deprecated upstream back in k8s v1.18, so I'm guessing that's something custom openshift is doing.

With v1.22, container runtime experience might become a differentiator for users picking from base OSes. I want both to be strong choices, and I don't weigh in on my users' choice.

LorbusChris commented 3 years ago

@dghubble https://github.com/coreos/rpm-ostree/issues/1435 would probably be the cleanest solution to this then

haircommander commented 3 years ago

I agree @LorbusChris , I think that's the best not-hacky way to guarantee cri-o can be installed (and kubelet can be installed with a corresponding version). we could even couple a kubernetes module with kubelet, cri-o and crictl

dghubble commented 3 years ago

This is the main reason why cri-o is not shipped as part of the base FCOS. The daemon isn't a generic container runtime that can be freely upgraded by the OS, but it is instead interlocked with the higher level application/service (k8s control plane).

How about Fedora CoreOS shipping containerd then, as the general container runtime? Which would give a window of compatibility @lucab

haircommander commented 3 years ago

This is the main reason why cri-o is not shipped as part of the base FCOS. The daemon isn't a generic container runtime that can be freely upgraded by the OS, but it is instead interlocked with the higher level application/service (k8s control plane).

How about Fedora CoreOS shipping containerd then, as the general container runtime? Which would give a window of compatibility @lucab

What reasons are there for shipping a general container runtime? If the modules thing is figured out, and cri-o can be installed with kubernetes seamlessly, I don't see a need for containerd

dghubble commented 3 years ago

Some of the original value behind Container Linux being packageless was shipping a minimal OS suitable for cluster use cases (i.e. container runtime is "new enough"). Adding RPMs is more a step toward traditional approaches and slower cadence. What happens when Kubernetes has a release, but cri-o doesn't have an RPM yet if they're in lock-step? For example, how would we test Kubernetes v1.21-beta.1 right now? We'd need to release without Fedora CoreOS, and I can see users gravitating toward Flatcar/containerd if this just isn't even a factor there.

To be clear, I don't care which container runtime is chosen. I have no horse in this race (thank you CRI). Just that it work well in these cases going forward.

haircommander commented 3 years ago

Some of the original value behind Container Linux being packageless was shipping a minimal OS suitable for cluster use cases (i.e. container runtime is "new enough"). Adding RPMs is more a step toward traditional approaches and slower cadence. What happens when Kubernetes has a release, but cri-o doesn't have an RPM yet if they're in lock-step? For example, how would we test Kubernetes v1.21-beta.1 right now? We'd need to release without Fedora CoreOS, and I can see users gravitating toward Flatcar/containerd if this just isn't even a factor there.

I see this as mostly a syncing of requirements. Up until now, we haven't had a request to test a yet-to-be-released cri-o in FCOS. I see no reason we couldn't ship the module early, assuming we can set expectations about stability before the .0 release. I'm happy to work together to set up cri-o packaging better on FCOS.

vrutkovs commented 3 years ago

CRI-O is also being built upstream in non-module RPMs, rpm-ostree can install those - see crio 1.20

jlebon commented 3 years ago

If we're just sideloading cri-o on our own with a "works on OKD with our RPMs" promise, as helpful as that is, it reduces the value prop for FCOS a bit. Could we at least have some "official" flow that guarantees cri-o can be installed successfully on FCOS? Ideally in FCOS that'd be an Ignition mechanism, but Dusty's script still seems to be the best approach that installs it currently.

Yes, this is #681. We definitely need to improve the UX on package layering. (Modularity is something that this sugar will need to consider as well.)

And some commitment to cri-o being a supported case on FCOS, beyond a question in the forums?

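Agreed we need to discuss this (IMO, yes this should be supported). Essentially, I think we need to:

* decide on a version (or range of versions) of k8s we want to support
* work with the cri-o team to have the matching RPMs for those versions available (either as proper modules in Fedora, or e.g. using [the new OS extensions work](https://github.com/coreos/rpm-ostree/pull/2439) to ship them in a side yum repo)
* add CI tests that sanity-check the supported cri-o versions (at least with Prow, we can do full e2e testing for the version currently targeted by OKD)
* make sure that the install layer (OKD/Typhoon) knows how to drive rpm-ostree to install the right cri-o version for the target k8s version being installed (e.g. enable a yum repo, then call rpm-ostree install cri-o-1.20)

I don't think rpm-ostree natively supporting modules is a blocker for all this, but it definitely would make it easier.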

haircommander commented 3 years ago

If we're just sideloading cri-o on our own with a "works on OKD with our RPMs" promise, as helpful as that is, it reduces the value prop for FCOS a bit. Could we at least have some "official" flow that guarantees cri-o can be installed successfully on FCOS? Ideally in FCOS that'd be an Ignition mechanism, but Dusty's script still seems to be the best approach that installs it currently.

Yes, this is #681. We definitely need to improve the UX on package layering. (Modularity is something that this sugar will need to consider as well.)

And some commitment to cri-o being a supported case on FCOS, beyond a question in the forums?

Agreed we need to discuss this (IMO, yes this should be supported). Essentially, I think we need to:

* decide on a version (or range of versions) of k8s we want to support

I think supporting the three releases k8s upstream supports should work

* work with the cri-o team to have the matching RPMs for those versions available (either as proper modules in Fedora, or e.g. using [the new OS extensions work](https://github.com/coreos/rpm-ostree/pull/2439) to ship them in a side yum repo)

I am happy to go forward with either, though my preference is proper module support (it feels more idiomatic for fedora)

* add CI tests that sanity-check the supported cri-o versions (at least with Prow, we can do full e2e testing for the version currently targeted by OKD)

big +1

* make sure that the install layer (OKD/Typhoon) knows how to drive rpm-ostree to install the right cri-o version for the target k8s version being installed (e.g. enable a yum repo, then call rpm-ostree install cri-o-1.20)

I suppose if we go the extensions route, we could use the existing OBS infrastructure for this (enable the OBS repo instead of some special cri-o one)
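The last item in the list above (the install layer choosing the right cri-o for the target k8s version) mostly comes down to deriving a minor stream from a version string. A minimal sketch, assuming cri-o and kubelet are matched on major.minor as discussed in this thread; `crio_stream_for` is a hypothetical helper name, not something OKD, Typhoon, or rpm-ostree provides:

```shell
# Hypothetical helper: map a Kubernetes version to the matching cri-o
# minor stream, assuming both projects share major.minor numbering.
crio_stream_for() {
  version="${1#v}"        # drop an optional leading "v"
  major="${version%%.*}"  # text before the first dot
  rest="${version#*.}"    # text after the first dot
  minor="${rest%%.*}"     # text up to the next dot, if any
  printf '%s.%s\n' "$major" "$minor"
}

crio_stream_for v1.20.4         # prints 1.20
crio_stream_for 1.21.0-beta.1   # prints 1.21
```

An installer could feed the result into whichever install mechanism ends up being supported, for example a versioned package name or a module stream.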

jlebon commented 3 years ago

Related: https://github.com/openshift/os/issues/498

bgilbert commented 3 years ago

What reasons are there for shipping a general container runtime? If the modules thing is figured out, and cri-o can be installed with kubernetes seamlessly, I don't see a need for containerd

Fedora CoreOS is meant to be a general platform for running containers. If there's a container runtime that people want to use, and there are no technical blockers to shipping it, we should probably ship it. We already ship Moby and podman, and would already be shipping cri-o if not for the version-skew issue.

Is there some reason we shouldn't ship containerd?

lucab commented 3 years ago

By the way there is already a containerd package shipped in FCOS, which is getting in as a dependency. We don't cover that explicitly in our CI, so I don't know whether it actually works.

rhatdan commented 3 years ago

We need to look to shrink the size of FCOS not increase its size.

bgilbert commented 3 years ago

@rhatdan Sure; see also #186. In general, we try to be careful about adding packages to the distro. Historically, though, when a package is important to the core functionality of FCOS (supporting hardware and running containers) and can't be run in a container itself, we've tended to add it to the distro. I suspect the biggest reductions in distro size would come from trimming dependencies from packages in our core set.

@dghubble From your perspective, what is FCOS missing in order to support containerd as a viable option? Does the package mentioned in https://github.com/coreos/fedora-coreos-tracker/issues/767#issuecomment-802674793 meet your needs?

rhatdan commented 3 years ago

I want to replace all moby-engine/containerd with podman-docker, and then treat moby-engine/containerd the same way we treat kubelet/cri-o, as layers on top of FCOS.

rhatdan commented 3 years ago

$ du -sm /usr/bin/docker /usr/bin/dockerd /usr/bin/runc /usr/bin/containerd
69      /usr/bin/docker
112     /usr/bin/dockerd
19      /usr/bin/runc
55      /usr/bin/containerd

rhatdan commented 3 years ago

$ du -s /usr/bin/docker /usr/bin/dockerd /usr/bin/runc /usr/bin/containerd
69920   /usr/bin/docker
113696  /usr/bin/dockerd
19276   /usr/bin/runc
55772   /usr/bin/containerd

versus

$ du -s /usr/bin/podman /usr/bin/crun /usr/bin/conmon
44580   /usr/bin/podman
356     /usr/bin/crun
132     /usr/bin/conmon

dghubble commented 3 years ago

I appreciate the efforts to make the container tools you work on smaller. Though the size of the binary has not been a decider for me in choosing a container runtime.

Providing container runtime(s) is a core part of choosing Fedora CoreOS and we need ready options to replace dockershim. If either containerd or cri-o or both were shipped that'd be nice. Both would keep the projects honest (why not compete head to head) and give options that are definitely in FCOS's wheelhouse. I could see maintenance pushback being legitimate though.

@bgilbert I noticed containerd shipped with docker in December and tried it then. There were some difficulties getting the right /etc/containerd/config.toml (the default is junk values) for CNI to be happy and I stopped looking at it since it didn't seem officially supported anyway. I could revisit. https://github.com/poseidon/typhoon/pull/959

EDIT: It kinda works minimally. But it needs additional CNI plugins it didn't before (firewall, tuning, etc). Sideloading those by hand (which could later be done by the flannel-cni daemonset) and pods can start at least. It would be helpful to have crictl.

bgilbert commented 3 years ago

@rhatdan The need to layer in cri-o and kubelet is something we've accepted out of necessity, but I don't think we should embrace that pattern when it's not necessary. Ideally FCOS would fully support all container workflows out of the box, without requiring the user to assemble any additional parts. And we do have users who want to run Docker.

@dghubble That'd be great. Any guidance you can provide would be helpful.

This topic is on the agenda for possible discussion at tomorrow's community video meeting with FCOS and Podman developers. For details, see https://github.com/coreos/fedora-coreos-tracker/issues/768.

rhatdan commented 3 years ago

The problem with supporting all of the container workflows is size. Eliminating moby-engine and friends eliminates almost 20% of the size of FCOS. Podman can provide everything you need to do for Docker.

If you want to run Kubernetes on FCOS you are going to need more than just containerd. You will need crictl, kubelet, plus a few other tools; these are not shipped by default in FCOS, and the kubelet tends to be tied to specific versions of the container engine.

dghubble commented 3 years ago

@rhatdan In this issue, I'm interested in what Kubernetes compatible container runtimes will be supported on FCOS. I'm already using and happy with podman as the container runner on-host for systemd units, etc. I don't think we have to rehash that topic (which I view as separate) or make it about podman vs docker. I'm already sold on podman.

You can run conformant Kubernetes on FCOS today, without layering in any RPM packages. Kubelet was covered here. It's an OpenShift-specific design choice to use RPMs (which is fine, I don't want to make this about how y'all build your product). Just to emphasize that this pattern is a choice, not a necessity. Today the base OS provides us the container runtime for Kubernetes, currently docker(shim), and in future some suitable replacement.

cgwalters commented 3 years ago

Via injecting an Ignition config today one can do:

$ sed -i 's,enabled=0,enabled=1,' /etc/yum.repos.d/fedora-modular.repo
$ rpm-ostree install crio

That will pull the latest crio and install it live; indeed rpm-ostree is unaware of modules though and will just pick the latest.

Soon you'll be able to add --apply-live there and avoid the reboot (if applicable).

jlebon commented 3 years ago

We could ask rpm-ostree to pull a specific version but that gets messed up by release field (e.g. rpm-ostree install cri-o-1.19 won't work). One pretty easy hack around (and I swear I'm not trying to get out of rpm-ostree supporting modularity properly, but it could be a viable short-term solution) is to have the cri-o modules have a symbolic Provides that's just Provides: cri-o-%{version} = %{release} and then we can just do rpm-ostree install cri-o-1.19 and make sure the host stays on the 1.19 stream.

wernerb commented 3 years ago

I appreciate the efforts to make the container tools you work on smaller. Though the size of the binary has not been a decider for me in choosing a container runtime.

Providing container runtime(s) is a core part of choosing Fedora CoreOS and we need ready options to replace dockershim. If either containerd or cri-o or both were shipped that'd be nice. Both would keep the projects honest (why not compete head to head) and give options that are definitely in FCOS's wheelhouse. I could see maintenance pushback being legitimate though.

@bgilbert I noticed containerd shipped with docker in December and tried it then. There were some difficulties getting the right /etc/containerd/config.toml (the default is junk values) for CNI to be happy and I stopped looking at it since it didn't seem officially supported anyway. I could revisit. poseidon/typhoon#959

EDIT: It kinda works minimally. But it needs additional CNI plugins it didn't before (firewall, tuning, etc). Sideloading those by hand (which could later be done by the flannel-cni daemonset) and pods can start at least.

The CNI plugins are shipped in FCOS, but most CNI implementations need to write new binaries to, and read plugins from, /opt/cni/bin. The shipped plugins live in /usr/libexec/cni, which is read-only. I solved this by symlinking each CNI plugin binary with:

ExecStartPre=-/bin/sh -c "for f in /usr/libexec/cni/*; do ln -s \"$f\" /opt/cni/bin/$(basename $f); done"
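A slightly more defensive variant of that loop, written as a standalone function so it can be tried against scratch directories first. This is only a sketch: `link_cni_plugins` is a hypothetical name, the default paths are the two mentioned above, and `ln -sf` is swapped in so that repeated runs do not fail on links that already exist:

```shell
# Hypothetical helper: symlink every plugin from a read-only source
# directory into a writable plugin directory.
link_cni_plugins() {
  src_dir="${1:-/usr/libexec/cni}"   # where the OS ships the plugins
  dest_dir="${2:-/opt/cni/bin}"      # where CNI implementations look
  mkdir -p "$dest_dir"
  for plugin in "$src_dir"/*; do
    [ -e "$plugin" ] || continue     # skip the literal glob if src is empty
    # -f replaces an existing link, so the function is safe to re-run
    ln -sf "$plugin" "$dest_dir/$(basename "$plugin")"
  done
}
```

Calling it with no arguments uses the default paths; passing two temporary directories exercises it without touching the host.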

It would be helpful to have crictl.

crictl is crucial for debugging workloads previously able to use docker.

dghubble commented 3 years ago

@wernerb I don't see CNI plugins as being on Fedora CoreOS's plate. CNI plugins are often placed on hosts by DaemonSets. Not a worry.

@bgilbert I'll post back with more details, probably not during a work week.

@cgwalters Thanks, will see if we can get this into a systemd unit and try it out.

Conan-Kudo commented 3 years ago

We could ask rpm-ostree to pull a specific version but that gets messed up by release field (e.g. rpm-ostree install cri-o-1.19 won't work)

To note, DNF does support dnf install 'foo = version' as a way to request package installation. Does rpm-ostree not have that capability?

jlebon commented 3 years ago

We could ask rpm-ostree to pull a specific version but that gets messed up by release field (e.g. rpm-ostree install cri-o-1.19 won't work)

To note, DNF does support dnf install 'foo = version' as a way to request package installation. Does rpm-ostree not have that capability?

Sorry, I mixed up terminology in that comment. The issue isn't the release field, it's the patch component of the version string. E.g. you can't ask for cri-o-1.20. You have to pin to a specific version but here we'd want to pin at the minor level.

Anyway, I've been working on teaching modules to rpm-ostree so hopefully soon we can do this properly without any hacks. (Edit: see https://github.com/coreos/rpm-ostree/pull/2760#issuecomment-825855951.)
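To make "pin at the minor level" concrete: given the versions a repo offers, the host should follow the newest patch release within one minor stream and ignore everything else. A rough sketch of that selection, independent of rpm-ostree itself; `latest_in_stream` is a hypothetical helper, the version list is made up for illustration, and `sort -V` assumes GNU coreutils:

```shell
# Hypothetical helper: read one version per line on stdin and print the
# newest one that belongs to the given minor stream (e.g. 1.20).
latest_in_stream() {
  stream="$1"
  while IFS= read -r version; do
    case "$version" in
      "$stream".*) printf '%s\n' "$version" ;;  # keep 1.20.x, drop the rest
    esac
  done | sort -V | tail -n 1                     # version-aware sort, take last
}

# Illustrative input only; these are not real package listings.
printf '1.19.0\n1.20.2\n1.20.10\n1.21.0\n' | latest_in_stream 1.20   # prints 1.20.10
```

Note that a plain lexical sort would rank 1.20.2 above 1.20.10, which is why the sketch relies on a version-aware sort.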

buckaroogeek commented 3 years ago

In my home lab I use fedora 33 to host a small kubernetes cluster with cri-o. I use dnf modularity for cri-o version management and the dnf versionlock plugin to provide version management of packages from the upstream kubernetes repository. dnf versionlock add --raw kube???-1.20.? will allow dnf to install the current version of kubeadm, kubectl, and kubelet. It also will allow updates to any patch version of the 1.20 release. Would it be feasible to add a similar capability to rpm-ostree? (update - added the --raw tag which is essential for the versionlock behavior needed)
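One detail worth keeping in mind with a pattern like that: in shell-style globbing, `?` matches exactly one character, so `1.20.?` would cover single-digit patch releases only. The sketch below uses the shell's own `case` matching on plain name-version strings to illustrate this. It is not how dnf evaluates versionlock entries internally, and `matches_lock` is a hypothetical helper:

```shell
# Hypothetical helper: succeed when a name-version string matches the
# glob from the comment above, using plain shell pattern matching.
matches_lock() {
  case "$1" in
    kube???-1.20.?) return 0 ;;
    *) return 1 ;;
  esac
}

for candidate in kubeadm-1.20.4 kubectl-1.20.9 kubelet-1.20.10 kubelet-1.21.0; do
  if matches_lock "$candidate"; then
    echo "$candidate: matches"
  else
    echo "$candidate: no match"
  fi
done
```

Under these glob rules the first two candidates match and the last two do not, so a two-digit patch release would need a wider pattern such as `1.20.*`.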

dghubble commented 3 years ago

I've been running clusters using containerd (available by default) to replace the docker-shim lately. With the broad use, forward compatibility, and already being installed, this is the likely direction for the underlying container runtime in Typhoon.

OS-IMAGE                          KERNEL-VERSION             CONTAINER-RUNTIME
Fedora CoreOS 34.20210529.2.0     5.12.7-300.fc34.x86_64     containerd://1.5.0
Fedora CoreOS 34.20210529.2.0     5.12.7-300.fc34.x86_64     containerd://1.5.0
Fedora CoreOS 33.20210413.dev.0   5.10.19-200.fc33.aarch64   containerd://1.4.4

cc @bgilbert

anthr76 commented 3 years ago

For users wishing to use cri-o this leaves us in a sad place :(

Basically one of the most appealing options is Kubic, though using it in an automated fashion with tools like matchbox and Typhoon is incredibly hard with AutoYaST. It's a great opportunity for FCOS to pick up cri-o and have it as a default, though I understand the technical constraints.

jlebon commented 3 years ago

Update on this: proper support for modularity has now merged in rpm-ostree (https://github.com/coreos/rpm-ostree/pull/2760). So in the next release, one should be able to do e.g.

$ rpm-ostree ex module install cri-o:1.20

For now, it'd work to do this in a systemd unit and reboot like in https://docs.fedoraproject.org/en-US/fedora-coreos/os-extensions/. But eventually, we still want to polish the UX for extensions as in https://github.com/coreos/fedora-coreos-tracker/issues/681.
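As a sketch of what such a unit could run, the function below layers a cri-o module stream once and uses a stamp file to skip the work on later boots. The `rpm-ostree ex module install` invocation is the one quoted above; the function name and the stamp path are assumptions made for illustration, and since `ex` marks an experimental command its interface may change:

```shell
# Hypothetical helper for a oneshot unit: layer a cri-o module stream
# once, then reboot into the new deployment.
layer_crio_once() {
  stream="$1"                                # e.g. 1.20
  stamp="${2:-/var/lib/layer-crio.stamp}"    # assumed stamp location
  [ -e "$stamp" ] && return 0                # already done on an earlier boot
  rpm-ostree ex module install "cri-o:${stream}" || return 1
  touch "$stamp"                             # record success before rebooting
  systemctl reboot                           # layered packages apply after reboot
}

# Example call from a unit's ExecStart script (not executed here):
#   layer_crio_once 1.20
```

Writing the stamp only after a successful install means a failed attempt is retried on the next boot instead of being silently skipped.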

Related patch to stop disabling modular repos in FCOS at: https://github.com/coreos/fedora-coreos-config/pull/1149.

jlebon commented 3 years ago

So circling back to earlier discussions here: leveraging modularity, I think at this point we should be able to form a stance, in collaboration with the containers team, on which cri-o runtime versions are supported.

@haircommander, you mentioned in https://github.com/coreos/fedora-coreos-tracker/issues/767#issuecomment-799700202 we should just support all the versions supported upstream, which makes sense to me. Should we also have a stream for the next development version? (E.g. right now cri-o:1.22.)

Then the next step would be adding CI for testing the supported cri-o versions.

anthr76 commented 3 years ago

Would it make sense for Fedora to adopt a cri-o:stable module for a somewhat "rolling" cadence of cri-o? It would follow the latest cri-o release alongside Kubernetes. This would of course create some drift between the CRI and the user updating their kubelet. But it leaves only the Kubernetes components for the user to update, rather than the CRI as well, which can arrive much earlier than the Kubernetes components and may not be easy to change.

haircommander commented 3 years ago

In my ideal universe, we would package all the k8s binaries, cri-o and crictl as part of a kubernetes package, all of which would be modular per-release

(another step we need is reintroducing cri-o package to fedora--there was a mistake and it was orphaned: https://bugzilla.redhat.com/show_bug.cgi?id=1970050)

jlebon commented 3 years ago

@haircommander, I noticed that f35 only has 1.19, while f34 has 1.20. And f36 only has 1.21. Is this intended or does the module just need some loving?

Conan-Kudo commented 3 years ago

Some effort needs to be put in to bring everything back in sync.