Open dghubble opened 3 years ago
:eyes: cri-o doesn't list Fedora CoreOS as a supported OS and the only mention of how to hack it in is a year ago from @dustymabe (thx). And then cri-o creates a new need for conntrack on-host and forces you down the path of installing other RPMs which will need strict versioning.
https://github.com/cri-o/cri-o/blob/master/install.md https://discussion.fedoraproject.org/t/installing-using-cri-o-on-fedora-coreos/15961/5
[cri-o] Releases are also pinned to Kubernetes versions and seem to lag by quite some time
This is the main reason why cri-o is not shipped as part of the base FCOS. The daemon isn't a generic container runtime that can be freely upgraded by the OS, but it is instead interlocked with the higher level application/service (k8s control plane).
This strict interlock is an explicit design, highlighted at https://github.com/cri-o/cri-o. To that extent, for k8s distributions the whole "kubelet plus cri-o" is effectively a single "node runtime" component.
@mrunalp @haircommander PTAL
@mrunalp @haircommander Why isn't Fedora a supported platform for CRI-O? Is this just an oversight?
fedora is listed https://github.com/cri-o/cri-o/blob/master/install.md#fedora-31-or-later ~it's just not in the table (which is both out of date and unneeded)~ wait a minute, it is also in the table. Where are you getting indication cri-o is not supported on fedora @dghubble? Are you looking for Fedora CoreOS specific installation instructions?
I agree with @lucab here, kubelet and cri-o should be installed together with matching versions. If there's a way we (the cri-o team) can minimize the friction with this in FCOS, I'd love to hear about it.
As I see it, the issue here is more that it's a bit difficult to install the cri-o RPM because it's distributed as a module and rpm-ostree can't handle it. At least for OKD, making the version of cri-o that is used in the current OKD version available in a standard, non-modular yum repo would help, since we could then include it easily (either in the base compose, or as an extension in a local yum repo). Longer term, rpm-ostree should probably be taught to work with modular repos.
how is the kubelet packaged for OKD now? Could the appropriate cri-o version be tossed in that package?
In OKD we grab the kubelet RPM from origin artifacts, copy it to /tmp/rpms, extract it to /tmp/working and create a new rpm-ostree commit with this dir overlaid.
Similar steps are used to fetch the CRI-O RPM.
For comparison, docker+shim (and soon containerd on Flatcar Linux) provide a suitable container runtime that meets Kubernetes CRI minimum needs, out of the box. Kubelet is bundled as a container image. Runs. Conformant. We roll forward within days of Kubernetes releases and do not need to wait on changes from the base OS or any package ecosystem.
If we move toward cri-o, in addition to the friction of adopting cri-o (it feels like we'd be the first user outside okd), there is this velocity concern. I appreciate you folks have explicit RPM packages for components. In practice, we'd have to release new Kubernetes with Flatcar Linux only, and somehow have Fedora CoreOS come weeks later or something. We'd be waiting on this package ecosystem. How strict is the cri-o to k8s versioning really? Is it just about validation?
How strict is the cri-o to k8s versioning really? Is it just about validation?
in practice, not terribly strict. It's the safest bet, though. the cri, while generally stable, does change. We don't backport new cri changes to older versions of cri-o. We also don't attempt to test any kubelet/cri-o skew. Basically: we make no support claims for anything other than matching versions.
Generally, folks don't have much trouble with having mismatched versions (I've never heard anyone complain about it). But it's theoretically possible
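The stance described above (only matching versions are supported, skew is untested) could be turned into a small preflight check. This is a minimal sketch, not something cri-o or the kubelet ships: the function names are hypothetical, and comparing only the major.minor part is an assumption based on the comment above.

```python
import re
from typing import Tuple


def minor_stream(version: str) -> Tuple[int, int]:
    """Return (major, minor) from strings such as 'v1.20.4' or '1.20.0-rc.1'."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def versions_match(kubelet_version: str, crio_version: str) -> bool:
    """True when kubelet and cri-o sit on the same major.minor stream."""
    return minor_stream(kubelet_version) == minor_stream(crio_version)
```

For example, `versions_match("v1.20.4", "1.20.1")` is true, while a 1.21 kubelet paired with a 1.20 cri-o would be flagged as untested skew rather than as a known failure.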
Kubelet is bundled as a container image.
out of curiosity, do you run the kubelet inside a container, or do you use the image to package it easily?
@vrutkovs @haircommander it looks to me as though we are running the kubelet through the hyperkube binary (see https://github.com/openshift/installer/blob/master/data/data/bootstrap/systemd/units/kubelet.service.template#L15-L16), which is extracted from the payload and written to disk during install (lives in https://github.com/openshift/kubernetes/blob/master/openshift-hack/images/hyperkube/Dockerfile.rhel). I suppose crio could be distributed the same way, that is not as an RPM, but as a container.
that is not as an RPM, but as a container.
Yes, and we had this option in 3.x days. AFAIK some distros (Rancher) do that too - but node SIG doesn't officially support running kubelet (and container engine) in container, as it complicates volumes.
Perhaps we should be extracting binaries to /usr/local/bin?
If we're just sideloading cri-o on our own with a "works on OKD with our RPMs" promise, as helpful as that is, it reduces the value prop for FCOS a bit. Could we at least have some "official" flow that guarantees cri-o can be installed successfully on FCOS? Ideally in FCOS that'd be an Ignition mechanism, but Dusty's script still seems to be the best approach that installs it currently. And some commitment to cri-o being a supported case on FCOS, beyond a question in the forums?
@haircommander Kubelet in a container with podman, like other on-host services (e.g. etcd). @LorbusChris hyperkube was deprecated upstream back in k8s v1.18, so I'm guessing that's something custom openshift is doing
With v1.22, container runtime experience might become a differentiator for users picking from base OSes. I want both to be strong choices and don't weigh in on my users' choice.
@dghubble https://github.com/coreos/rpm-ostree/issues/1435 would probably be the cleanest solution to this then
I agree @LorbusChris , I think that's the best not-hacky way to guarantee cri-o can be installed (and kubelet can be installed with a corresponding version). we could even couple a kubernetes module with kubelet, cri-o and crictl
This is the main reason why cri-o is not shipped as part of the base FCOS. The daemon isn't a generic container runtime that can be freely upgraded by the OS, but it is instead interlocked with the higher level application/service (k8s control plane).
How about Fedora CoreOS shipping containerd then, as the general container runtime? Which would give a window of compatibility @lucab
What reasons are there for shipping a general container runtime? If the modules thing is figured out, and cri-o can be installed with kubernetes seamlessly, I don't see a need for containerd
Some of the original value behind Container Linux being packageless was shipping a minimal OS suitable for cluster use cases (i.e. container runtime is "new enough"). Adding RPMs is more a step toward traditional approaches and slower cadence. What happens when Kubernetes has a release, but cri-o doesn't have an RPM yet if they're in lock-step? For example, how would we test Kubernetes v1.21-beta.1 right now? We'd need to release without Fedora CoreOS, and I can see users gravitating toward Flatcar/containerd if this just isn't even a factor there.
To be clear, I don't care which container runtime is chosen. I have no horse in this race (thank you CRI). Just that it work well in these cases going forward.
Some of the original value behind Container Linux being packageless was shipping a minimal OS suitable for cluster use cases (i.e. container runtime is "new enough"). Adding RPMs is more a step toward traditional approaches and slower cadence. What happens when Kubernetes has a release, but cri-o doesn't have an RPM yet if they're in lock-step? For example, how would we test Kubernetes v1.21-beta.1 right now? We'd need to release without Fedora CoreOS, and I can see users gravitating toward Flatcar/containerd if this just isn't even a factor there.
I see this as mostly a syncing of requirements. Up until now, we haven't had a request to test a yet-to-be-released cri-o in FCOS. I see no reason we couldn't ship the module early, assuming we can set expectations about stability before the .0 release. I'm happy to work together to set up cri-o packaging better on FCOS.
CRI-O is also being built upstream in non-module RPMs, rpm-ostree can install those - see crio 1.20
If we're just sideloading cri-o on our own with a "works on OKD with our RPMs" promise, as helpful as that is, it reduces the value prop for FCOS a bit. Could we at least have some "official" flow that guarantees cri-o can be installed successfully on FCOS? Ideally in FCOS that'd be an Ignition mechanism, but Dusty's script still seems to be the best approach that installs it currently.
Yes, this is #681. We definitely need to improve the UX on package layering. (Modularity is something that this sugar will need to consider as well.)
And some commitment to cri-o being a supported case on FCOS, beyond a question in the forums?
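Agreed we need to discuss this (IMO, yes this should be supported). Essentially, I think we need to:
* decide on a version (or range of versions) of k8s we want to support
* work with the cri-o team to have the matching RPMs for those versions available (either as proper modules in Fedora, or e.g. using [the new OS extensions work](https://github.com/coreos/rpm-ostree/pull/2439) to ship them in a side yum repo)
* add CI tests that sanity-check the supported cri-o versions (at least with Prow, we can do full e2e testing for the version currently targeted by OKD)
* make sure that the install layer (OKD/Typhoon) knows how to drive rpm-ostree to install the right cri-o version for the target k8s version being installed (e.g. enable a yum repo, then call rpm-ostree install cri-o-1.20) until we do #681
I don't think rpm-ostree natively supporting modules is a blocker for all this, but it definitely would make it easier.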
* decide on a version (or range of versions) of k8s we want to support
I think supporting the three releases k8s upstream supports should work
* work with the cri-o team to have the matching RPMs for those versions available (either as proper modules in Fedora, or e.g. using [the new OS extensions work](https://github.com/coreos/rpm-ostree/pull/2439) to ship them in a side yum repo)
I am happy to go forward with either, though my preference is proper module support (it feels more idiomatic for fedora)
* add CI tests that sanity-check the supported cri-o versions (at least with Prow, we can do full e2e testing for the version currently targeted by OKD)
big +1
* make sure that the install layer (OKD/Typhoon) knows how to drive rpm-ostree to install the right cri-o version for the target k8s version being installed (e.g. enable a yum repo, then call rpm-ostree install cri-o-1.20)
I suppose if we go the extensions route, we could use the existing OBS infrastructure for this (enable the OBS repo instead of some special cri-o one)
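The last item in the list (the install layer picking the cri-o version for the target k8s version) might look roughly like the sketch below. It only builds the command line and does not run anything. The `cri-o-<major>.<minor>` spec is taken from the example in the list and depends on the packaging actually offering such a name; the helper names and the three-release window are assumptions drawn from the replies above.

```python
import re
from typing import List


def minor_stream(version: str) -> str:
    """Reduce 'v1.20.4' or '1.20.4' to its 'major.minor' stream, e.g. '1.20'."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return f"{match.group(1)}.{match.group(2)}"


def supported_streams(latest_k8s: str, window: int = 3) -> List[str]:
    """The `window` newest minor streams, newest first (assumed support window)."""
    major, minor = minor_stream(latest_k8s).split(".")
    newest = int(minor)
    return [f"{major}.{newest - i}" for i in range(window) if newest - i >= 0]


def crio_install_command(target_k8s: str, latest_k8s: str) -> List[str]:
    """Build (but do not run) the rpm-ostree call for the matching cri-o stream."""
    stream = minor_stream(target_k8s)
    if stream not in supported_streams(latest_k8s):
        raise ValueError(f"k8s {stream} is outside the supported window")
    return ["rpm-ostree", "install", f"cri-o-{stream}"]
```

With a hypothetical latest release of v1.20.4, a target of v1.19.7 maps to `rpm-ostree install cri-o-1.19`, and a v1.17 target is rejected instead of silently installing a mismatched runtime.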
What reasons are there for shipping a general container runtime? If the modules thing is figured out, and cri-o can be installed with kubernetes seamlessly, I don't see a need for containerd
Fedora CoreOS is meant to be a general platform for running containers. If there's a container runtime that people want to use, and there are no technical blockers to shipping it, we should probably ship it. We already ship Moby and podman, and would already be shipping cri-o if not for the version-skew issue.
Is there some reason we shouldn't ship containerd?
By the way there is already a containerd package shipped in FCOS, which is getting in as a dependency. We don't cover that explicitly in our CI, so I don't know whether it actually works.
We need to look to shrink the size of FCOS not increase its size.
@rhatdan Sure; see also #186. In general, we try to be careful about adding packages to the distro. Historically, though, when a package is important to the core functionality of FCOS (supporting hardware and running containers) and can't be run in a container itself, we've tended to add it to the distro. I suspect the biggest reductions in distro size would come from trimming dependencies from packages in our core set.
@dghubble From your perspective, what is FCOS missing in order to support containerd as a viable option? Does the package mentioned in https://github.com/coreos/fedora-coreos-tracker/issues/767#issuecomment-802674793 meet your needs?
I want to replace all moby-engine/containerd with podman-docker, and then treat moby-engine/containerd the same way we treat kubelet/cri-o as layers on top of FCOS.
$ du -sm /usr/bin/docker /usr/bin/dockerd /usr/bin/runc /usr/bin/containerd
69 /usr/bin/docker
112 /usr/bin/dockerd
19 /usr/bin/runc
55 /usr/bin/containerd
$ du -s /usr/bin/docker /usr/bin/dockerd /usr/bin/runc /usr/bin/containerd
69920 /usr/bin/docker
113696 /usr/bin/dockerd
19276 /usr/bin/runc
55772 /usr/bin/containerd
versus
$ du -s /usr/bin/podman /usr/bin/crun /usr/bin/conmon
44580 /usr/bin/podman
356 /usr/bin/crun
132 /usr/bin/conmon
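To make the comparison explicit, the figures above can simply be summed. This sketch assumes `du -s` reported 1 KiB blocks (its usual default) and only covers the binaries listed, not whole packages or their dependencies.

```python
# Sizes in KiB, copied from the `du -s` output above.
DOCKER_STACK = {
    "/usr/bin/docker": 69920,
    "/usr/bin/dockerd": 113696,
    "/usr/bin/runc": 19276,
    "/usr/bin/containerd": 55772,
}
PODMAN_STACK = {
    "/usr/bin/podman": 44580,
    "/usr/bin/crun": 356,
    "/usr/bin/conmon": 132,
}


def total_kib(sizes: dict) -> int:
    """Sum the per-binary sizes of one stack."""
    return sum(sizes.values())


def kib_to_mib(kib: int) -> float:
    """Convert KiB to MiB."""
    return kib / 1024


docker_total = total_kib(DOCKER_STACK)
podman_total = total_kib(PODMAN_STACK)
print(f"docker stack: {docker_total} KiB ({kib_to_mib(docker_total):.1f} MiB)")
print(f"podman stack: {podman_total} KiB ({kib_to_mib(podman_total):.1f} MiB)")
print(f"difference:   {docker_total - podman_total} KiB")
```

That comes to 258664 KiB for the docker-side binaries against 45068 KiB for the podman-side ones, a difference of 213596 KiB.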
I appreciate the efforts to make the container tools you work on smaller. Though the size of the binary has not been a decider for me in choosing a container runtime.
Providing container runtime(s) is a core part of choosing Fedora CoreOS and we need ready options to replace dockershim. If either containerd or cri-o or both were shipped that'd be nice. Both would keep the projects honest (why not compete head to head) and give options that are definitely in FCOS's wheelhouse. I could see maintenance pushback being legitimate though.
@bgilbert I noticed containerd shipped with docker in December and tried it then. There were some difficulties getting the right /etc/containerd/config.toml (the default is junk values) for CNI to be happy and I stopped looking at it since it didn't seem officially supported anyway. I could revisit. https://github.com/poseidon/typhoon/pull/959
EDIT: It kinda works minimally. But it needs additional CNI plugins it didn't before (firewall, tuning, etc). Sideloading those by hand (which could later be done by the flannel-cni daemonset) and pods can start at least. It would be helpful to have crictl.
@rhatdan The need to layer in cri-o and kubelet is something we've accepted out of necessity, but I don't think we should embrace that pattern when it's not necessary. Ideally FCOS would fully support all container workflows out of the box, without requiring the user to assemble any additional parts. And we do have users who want to run Docker.
@dghubble That'd be great. Any guidance you can provide would be helpful.
This topic is on the agenda for possible discussion at tomorrow's community video meeting with FCOS and Podman developers. For details, see https://github.com/coreos/fedora-coreos-tracker/issues/768.
The problem with supporting all of the container workflows is size. Eliminating moby-engine and friends eliminates almost 20% of the size of FCOS. Podman can provide everything you need to do for Docker.
If you want to run Kubernetes on FCOS you are going to need more than just containerd. You will need crictl, kubelet, plus a few other tools; these are not shipped by default in FCOS, and the kubelet tends to be tied to specific versions of the container engine.
@rhatdan In this issue, I'm interested in what Kubernetes compatible container runtimes will be supported on FCOS. I'm already using and happy with podman as the container runner on-host for systemd units, etc. I don't think we have to rehash that topic (which I view as separate) or make it about podman vs docker. I'm already sold on podman.
You can run conformant Kubernetes on FCOS today, without layering in any RPM packages. Kubelet was covered here. It's an OpenShift-specific design choice to use RPMs (which is fine, I don't want to make this about how y'all build your product). Just to emphasize that this pattern is a choice, not a necessity. Today the base OS provides us the container runtime for Kubernetes, currently docker(shim), and in future some suitable replacement.
Via injecting an Ignition config today one can do:
$ sed -i s,enabled=,enabled=1, /etc/yum.repos.d/fedora-modular.repo
$ rpm-ostree install crio
That will pull the latest crio and install it live; indeed rpm-ostree is unaware of modules though and will just pick the latest.
Soon you'll be able to add --apply-live there and avoid the reboot (if applicable).
We could ask rpm-ostree to pull a specific version but that gets messed up by the release field (e.g. rpm-ostree install cri-o-1.19 won't work). One pretty easy hack around (and I swear I'm not trying to get out of rpm-ostree supporting modularity properly, but it could be a viable short-term solution) is to have the cri-o modules have a symbolic Provides that's just Provides: cri-o-%{version} = %{release} and then we can just do rpm-ostree install cri-o-1.19 and make sure the host stays on the 1.19 stream.
I appreciate the efforts to make the container tools you work on smaller. Though the size of the binary has not been a decider for me in choosing a container runtime.
Providing container runtime(s) is a core part of choosing Fedora CoreOS and we need ready options to replace dockershim. If either containerd or cri-o or both were shipped that'd be nice. Both would keep the projects honest (why not compete head to head) and give options that are definitely in FCOS's wheelhouse. I could see maintenance pushback being legitimate though.
@bgilbert I noticed containerd shipped with docker in December and tried it then. There were some difficulties getting the right /etc/containerd/config.toml (the default is junk values) for CNI to be happy and I stopped looking at it since it didn't seem officially supported anyway. I could revisit. poseidon/typhoon#959
EDIT: It kinda works minimally. But it needs additional CNI plugins it didn't before (firewall, tuning, etc). Sideloading those by hand (which could later be done by the flannel-cni daemonset) and pods can start at least.
The CNI plugins are shipped in FCOS, but most CNI implementations require writing new binaries to, and reading plugins from, /opt/cni/bin. The shipped plugins are located in /usr/libexec/cni, which is read-only.
I solved this by symlinking each cni plugin binary with:
ExecStartPre=-/bin/sh -c "for f in /usr/libexec/cni/*; do ln -s \"$f\" /opt/cni/bin/$(basename $f); done"
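The same symlinking idea can be written so that it is safe to run repeatedly. This is a sketch of an alternative, not what the unit above does: `link_cni_plugins` is a hypothetical helper, and skipping names that already exist in the destination (so binaries written there by a CNI DaemonSet are left alone) is an assumption about the desired behavior.

```python
from pathlib import Path
from typing import List


def link_cni_plugins(src_dir: str, dst_dir: str) -> List[str]:
    """Symlink each entry of src_dir into dst_dir; return the names newly linked."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    linked = []
    for plugin in sorted(src.iterdir()):
        target = dst / plugin.name
        # Skip anything already present, including dangling symlinks.
        if target.exists() or target.is_symlink():
            continue
        target.symlink_to(plugin)
        linked.append(plugin.name)
    return linked
```

On a host this would be called as `link_cni_plugins("/usr/libexec/cni", "/opt/cni/bin")`; a second run links nothing and raises no errors, whereas a plain `ln -s` fails on existing names (which the `-` prefix in the unit above then ignores).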
It would be helpful to have crictl.
crictl is crucial for debugging workloads previously able to use docker.
@wernerb I don't see CNI plugins as being on Fedora CoreOS's plate. CNI plugins are often placed on hosts by DaemonSets. Not a worry. @bgilbert I'll post back with more details, probably not during a work week @cgwalters thanks, will see if we can get this into a systemd unit and try it out
We could ask rpm-ostree to pull a specific version but that gets messed up by release field (e.g. rpm-ostree install cri-o-1.19 won't work)
To note, DNF does support dnf install 'foo = version' as a way to request package installation. Does rpm-ostree not have that capability?
Sorry, I mixed up terminology in that comment. The issue isn't the release field, it's the patch component of the version string. E.g. you can't ask for cri-o-1.20. You have to pin to a specific version but here we'd want to pin at the minor level.
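Pinning at the minor level amounts to "newest patch release within a given X.Y stream". A minimal sketch of that selection, assuming plain dotted numeric versions; the version list in the usage note is made up for illustration, and this is not how rpm-ostree resolves packages.

```python
from typing import Iterable, Optional, Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Turn '1.20.10' into (1, 20, 10) so comparisons are numeric, not textual."""
    return tuple(int(part) for part in version.split("."))


def latest_in_stream(available: Iterable[str], stream: str) -> Optional[str]:
    """Newest version whose leading components equal `stream`, or None if absent."""
    prefix = parse(stream)
    candidates = [v for v in available if parse(v)[: len(prefix)] == prefix]
    return max(candidates, key=parse, default=None)
```

Given hypothetical builds `1.19.0`, `1.19.1`, `1.20.0`, `1.20.2` and `1.20.10`, asking for stream `1.20` selects `1.20.10`. Comparing parsed tuples matters here, because a plain string comparison would rank `1.20.2` above `1.20.10`.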
Anyway, I've been working on teaching modules to rpm-ostree so hopefully soon we can do this properly without any hacks. (Edit: see https://github.com/coreos/rpm-ostree/pull/2760#issuecomment-825855951.)
In my home lab I use Fedora 33 to host a small Kubernetes cluster with cri-o. I use dnf modularity for cri-o version management and the dnf versionlock plugin to provide version management of packages from the upstream Kubernetes repository. dnf versionlock add --raw kube???-1.20.? will allow dnf to install the current version of kubeadm, kubectl, and kubelet. It will also allow updates to any patch version of the 1.20 release. Would it be feasible to add a similar capability to rpm-ostree? (update - added the --raw tag which is essential for the versionlock behavior needed)
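To see which names a pattern like this covers, it can be tried against a few name-version strings using shell-style glob rules. This is only an illustration: it assumes simplified `name-version` strings and Python's `fnmatch` semantics, and it does not model how dnf itself evaluates versionlock entries.

```python
from fnmatch import fnmatchcase

# Pattern from the comment above: "kube" + any three characters + "-1.20." + one character.
PATTERN = "kube???-1.20.?"


def matches_lock(name_version: str, pattern: str = PATTERN) -> bool:
    """True when the simplified name-version string fits the glob pattern."""
    return fnmatchcase(name_version, pattern)


for candidate in (
    "kubeadm-1.20.4",
    "kubectl-1.20.0",
    "kubelet-1.20.7",
    "kubelet-1.21.0",
    "kubelet-1.20.10",
):
    print(f"{candidate}: {matches_lock(candidate)}")
```

Under these assumptions the three kube tools on 1.20.x match and 1.21.0 does not. A two-digit patch level such as 1.20.10 would not match either, since each `?` stands for exactly one character.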
I've been running clusters using containerd (available by default) to replace the docker-shim lately. With the broad use, forward compatibility, and already being installed, this is the likely direction for the underlying container runtime in Typhoon.
OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
Fedora CoreOS 34.20210529.2.0 5.12.7-300.fc34.x86_64 containerd://1.5.0
Fedora CoreOS 34.20210529.2.0 5.12.7-300.fc34.x86_64 containerd://1.5.0
Fedora CoreOS 33.20210413.dev.0 5.10.19-200.fc33.aarch64 containerd://1.4.4
cc @bgilbert
For users wishing to use cri-o this leaves us in a sad place :(
Basically one of the most appealing options is Kubic, though using it in an automated fashion with tools like matchbox and Typhoon is incredibly hard with AutoYaST. It's a great opportunity for FCOS to pick up cri-o and have it as a default, though I understand the technical constraints.
Update on this: proper support for modularity has now merged in rpm-ostree (https://github.com/coreos/rpm-ostree/pull/2760). So in the next release, one should be able to do e.g.
$ rpm-ostree ex module install cri-o:1.20
For now, it'd work to do this in a systemd unit and reboot like in https://docs.fedoraproject.org/en-US/fedora-coreos/os-extensions/. But eventually, we still want to polish the UX for extensions as in https://github.com/coreos/fedora-coreos-tracker/issues/681.
Related patch to stop disabling modular repos in FCOS at: https://github.com/coreos/fedora-coreos-config/pull/1149.
So circling back to earlier discussions here, leveraging modularity I think at this point we should be able to form a stance on which cri-o versions are supported, in collaboration with the containers team.
@haircommander, you mentioned in https://github.com/coreos/fedora-coreos-tracker/issues/767#issuecomment-799700202 we should just support all the versions supported upstream, which makes sense to me. Should we also have a stream for the next development version? (E.g. right now cri-o:1.22.)
Then the next step would be adding CI for testing the supported cri-o versions.
Would it make sense for Fedora to adopt a cri-o:stable module for a somewhat "rolling" cadence of cri-o? It would follow the latest CRI-O release paired with Kubernetes. This would of course create some drift between the CRI runtime and the kubelet the user is updating. But it leaves only the Kubernetes components for the user to update, rather than the CRI runtime as well, which can land much earlier than the Kubernetes components and may not be easy to change.
In my ideal universe, we would package all the k8s binaries, cri-o and crictl as part of a kubernetes package, all of which would be modular per-release
(another step we need is reintroducing cri-o package to fedora--there was a mistake and it was orphaned: https://bugzilla.redhat.com/show_bug.cgi?id=1970050)
@haircommander, I noticed that f35 only has 1.19, while f34 has 1.20. And f36 only has 1.21. Is this intended or does the module just need some loving?
Some effort needs to be put in to bring everything back in sync.
Kubernetes intends to drop support for docker-shim as a container runtime in v1.22. Currently, Fedora CoreOS 33.20210217.3.0 ships docker 19.03.13. docker-shim remains the most stable, tested, available out-of-the-box runtime, but this will end soon. I'd like some kind of clarity on Fedora CoreOS's intentions, such as shipping a compatible containerd or cri-o. With Kubernetes cutting v1.22 alpha releases (likely pretty soon) in the time frame of Fedora 34, as Kubernetes distros we'll want to start evaluating and conformance testing the selection.
Overall, I'd like to know there is some plan. Ideally a documented one. Flatcar Linux has published their intentions to ship containerd in time and already has a mechanism to test it (docs).
cri-o
It sounds like cri-o can only be installed by downloading an RPM (from where?) directly and rpm-ostree installing it (unverified). dnf and yumdownloader used by an openshift script aren't present. https://github.com/coreos/fedora-coreos-tracker/issues/292#issuecomment-796998069. I have some maturity concerns about that. Is this really the recommended path? For a container optimized distro to not have a better path to getting a container runtime?
Releases are also pinned to Kubernetes versions and seem to lag by quite some time, so I have some velocity concerns (the runtime needs to be fairly stable, we roll forward when Kubernetes does, think hours).
containerd
I haven't seen Fedora CoreOS plans to make this the runtime of choice.
Or is the plan something else?