The c3d/podman repository is a fork of containers/podman focusing on bringing back compatibility with shimv2 runtimes such as Kata Containers.
Background
The shimv2 interface works very differently from the usual OCI command-line interface used by runc or crun.
The CLI approach creates a new process for each container operation. For example, creating a container leads to a runc create command. This relies on some underlying mechanism (e.g. a containerd daemon) to manage the persistent state associated with the container. In particular, for each running container, a conmon process is launched that will keep track of the I/O and exit code of the underlying container.
The shimv2 approach creates a single shim that can potentially manage multiple containers, and communicates with the runtime over gRPC / ttrpc. In that case, creating a container is achieved by sending a CreateContainer request over these channels. When the shim is invoked, it is given the address of the socket used for these protocols.
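The contrast between the two models can be sketched in a few lines of Python. This is only an illustration: the runc command lines are the usual create / start / delete invocations, but ToyShim, its handle method, the dictionary-based request format and the StartContainer / DeleteContainer request names are hypothetical stand-ins, not the real shimv2 protocol.

```python
def cli_invocations(container_id, bundle):
    """CLI model: every container operation is a separate runtime process.

    Returns the argv of each process that would be spawned; nothing is run.
    """
    return [
        ["runc", "create", "--bundle", bundle, container_id],
        ["runc", "start", container_id],
        ["runc", "delete", container_id],
    ]


class ToyShim:
    """Shim model: one long-lived process serving requests for many containers.

    Hypothetical stand-in: the real shim speaks gRPC / ttrpc over a socket,
    while this class only receives plain dictionaries.
    """

    def __init__(self):
        self.containers = {}  # container id -> state, kept inside the shim

    def handle(self, request):
        kind, cid = request["type"], request["id"]
        if kind == "CreateContainer":
            self.containers[cid] = "created"
        elif kind == "StartContainer":    # assumed request name
            self.containers[cid] = "running"
        elif kind == "DeleteContainer":   # assumed request name
            self.containers.pop(cid, None)
        else:
            raise ValueError(f"unknown request type: {kind}")
        return {"id": cid, "state": self.containers.get(cid, "deleted")}
```

In the first model the state has to live outside the runtime processes, since each of them exits after one operation. In the second, the shim itself holds the state for all of its containers.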
An attempt was made to isolate podman from the underlying differences by inserting an intermediary tool, called ociplex, that would perform the translation. This proved difficult, in no small part because what podman calls an "OCI runtime" internally hard-codes a conmon interface. This interface is quite specific with respect to what it expects from the underlying container, and podman is also specific with respect to what it expects from conmon.
These hard-coded behaviors make it hard in practice for ociplex to masquerade both as conmon as seen by podman and as containerd as seen by the shimv2 runtime. Some changes on the podman side appeared indispensable, e.g. so that podman could call an alternate conmon binary (ociplex in our case).
Since changing podman appears inevitable, it might be better to do it the right way. However, "the right way" remains complicated. Since this might end up being a large undertaking, this repository will contain a separate set of issues so as not to pollute the main podman repository. This will allow me to document the changes I'm making publicly instead of in private notes.
Overall sequence of events
For podman to properly communicate with a shimv2 runtime, it needs to behave in a way that is closer to what happens under Kubernetes through containerd or CRI-O.
The sequence of events in that case is the following:
The user types something like podman run -it fedora bash
podman checks for the presence of a podman server instance, i.e. a podman process that can listen to shimv2 requests.
If no such instance exists, podman creates it by running podman server, which will listen on a socket and effectively replace conmon for all shimv2 processes.
Once the podman server process exists, podman can launch the shimv2 runtime, passing it the server's address, e.g. containerd-shim-kata-v2 -namespace <ns> -address <addr> -publish-binary <path/to/podman>. That shimv2 runtime will then connect to the podman server to establish the RPC channel.
The rest of the process then goes over RPC instead of the command line. This will be documented in other issues.
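The steps above can be sketched as follows. This is a rough illustration under stated assumptions, not the actual implementation: it assumes the podman server listens on a Unix domain socket, and the helper names (server_is_listening, shim_command, plan_run) are made up for this example. The shim command line is the one quoted above, and nothing is actually launched: the sketch only computes which commands would be run.

```python
import socket


def server_is_listening(sock_path):
    """Return True if something accepts connections on the Unix socket.

    Assumption: the podman server is reachable through a Unix domain socket.
    """
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.connect(sock_path)
        return True
    except OSError:
        # Missing socket file, or nothing listening behind it.
        return False
    finally:
        s.close()


def shim_command(namespace, address, podman_path,
                 shim="containerd-shim-kata-v2"):
    """Build the shimv2 command line, using the flags quoted in the text."""
    return [shim, "-namespace", namespace, "-address", address,
            "-publish-binary", podman_path]


def plan_run(sock_path, namespace, podman_path):
    """List the commands podman would launch, in order, without running them."""
    steps = []
    if not server_is_listening(sock_path):
        # No server instance yet: it has to be started first.
        steps.append(["podman", "server"])
    steps.append(shim_command(namespace, sock_path, podman_path))
    return steps
```

When a server is already listening, plan_run returns only the shim command. Otherwise the podman server invocation comes first.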
By default, the podman server will use the following addresses: