c3d / ociplex

An OCI runtime multiplexer

Should `conmon` be invoked? #13

Open · c3d opened this issue 1 year ago

c3d commented 1 year ago

Running `podman --log-level=debug --runtime $PWD/run-kata run -it fedora bash`, I see the following in the logs:

DEBU[0001] running conmon: /usr/bin/conmon               args="[--api-version 1 -c 893548c0769c0ac17edffe7d591763099193d7f28b59eebb15f10e16d52038dd -u 893548c0769c0ac17edffe7d591763099193d7f28b59eebb15f10e16d52038dd -r /home/ddd/Work/ociplex/ociplex/run-kata -b /var/lib/containers/storage/overlay-containers/893548c0769c0ac17edffe7d591763099193d7f28b59eebb15f10e16d52038dd/userdata -p /var/run/containers/storage/overlay-containers/893548c0769c0ac17edffe7d591763099193d7f28b59eebb15f10e16d52038dd/userdata/pidfile -n inspiring_hertz --exit-dir /var/run/libpod/exits --full-attach -s -l journald --log-level debug --syslog -t --conmon-pidfile /var/run/containers/storage/overlay-containers/893548c0769c0ac17edffe7d591763099193d7f28b59eebb15f10e16d52038dd/userdata/conmon.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /var/lib/containers/storage --exit-command-arg --runroot --exit-command-arg /var/run/containers/storage --exit-command-arg --log-level --exit-command-arg debug --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /var/run/libpod --exit-command-arg --network-config-dir --exit-command-arg  --exit-command-arg --network-backend --exit-command-arg cni --exit-command-arg --volumepath --exit-command-arg /var/lib/containers/storage/volumes --exit-command-arg --db-backend --exit-command-arg boltdb --exit-command-arg --transient-store=false --exit-command-arg --runtime --exit-command-arg /home/ddd/Work/ociplex/ociplex/run-kata --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --storage-opt --exit-command-arg overlay.mountopt=nodev,metacopy=on --exit-command-arg --events-backend --exit-command-arg journald --exit-command-arg --syslog --exit-command-arg container --exit-command-arg cleanup --exit-command-arg 893548c0769c0ac17edffe7d591763099193d7f28b59eebb15f10e16d52038dd]"
INFO[0001] Running conmon under slice machine.slice and unitName libpod-conmon-893548c0769c0ac17edffe7d591763099193d7f28b59eebb15f10e16d52038dd.scope
DEBU[0001] Received: -1

It's unclear to me whether conmon should be running at all in a shim-v2 scenario. Also, what is `--api-version 1`? What API is this referring to?
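
For what it's worth, the `Received: -1` line appears to be the value podman reads back from conmon over a sync pipe, with a negative value meaning the runtime failed to create the container. A purely hypothetical sketch of that kind of handshake follows; the syncInfo struct and its field names are my own, not conmon's actual wire format.

package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// syncInfo is a made-up stand-in for whatever the monitor sends back on
// its sync pipe: either a container PID or an error message.
type syncInfo struct {
	Pid     int    `json:"pid"`
	Message string `json:"message,omitempty"`
}

// readPid reads one message from the pipe; a negative PID is treated as
// "the runtime failed to create the container".
func readPid(r *os.File) (int, error) {
	var si syncInfo
	if err := json.NewDecoder(r).Decode(&si); err != nil {
		return -1, err
	}
	if si.Pid < 0 {
		return si.Pid, fmt.Errorf("runtime error: %s", si.Message)
	}
	return si.Pid, nil
}

func main() {
	r, w, _ := os.Pipe()
	// Simulate the monitor reporting a failure, as in the log above.
	go func() {
		json.NewEncoder(w).Encode(syncInfo{Pid: -1, Message: "create failed"})
		w.Close()
	}()
	pid, err := readPid(r)
	fmt.Printf("Received: %d (%v)\n", pid, err)
}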

c3d commented 1 year ago

Exploring a little more, it does not look like podman has any provision for not invoking conmon at all.

In cri-o/internal/oci/oci.go, we have code that selects a different implementation path when the runtime type is "vm":

func (r *Runtime) newRuntimeImpl(c *Container) (RuntimeImpl, error) {
    rh, err := r.getRuntimeHandler(c.runtimeHandler)
    if err != nil {
        return nil, err
    }

    if rh.RuntimeType == config.RuntimeTypeVM {
        return newRuntimeVM(rh.RuntimePath, rh.RuntimeRoot, rh.RuntimeConfigPath, r.config.RuntimeConfig.ContainerExitsDir), nil
    }

    if rh.RuntimeType == config.RuntimeTypePod {
        return newRuntimePod(r, rh, c)
    }

    // If the runtime type is different from "vm", then let's fallback
    // onto the OCI implementation by default.
    return newRuntimeOCI(r, rh), nil
}

In containerd/pkg/cri/server/helpers_linux.go, the approach is different, since it does not rely on the "vm" string in the runtime type but on the io.containerd.kata name:

var vmbasedRuntimes = []string{
    "io.containerd.kata",
}

func isVMBasedRuntime(runtimeType string) bool {
    for _, rt := range vmbasedRuntimes {
        if strings.Contains(runtimeType, rt) {
            return true
        }
    }
    return false
}
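
For illustration, here is a quick check of how that substring match behaves. The runtime type strings in main are examples I picked for the demonstration, not necessarily what a given containerd installation is configured with.

package main

import (
	"fmt"
	"strings"
)

// Same matching logic as containerd's helper quoted above: any runtime
// type containing "io.containerd.kata" is treated as VM-based.
var vmbasedRuntimes = []string{"io.containerd.kata"}

func isVMBasedRuntime(runtimeType string) bool {
	for _, rt := range vmbasedRuntimes {
		if strings.Contains(runtimeType, rt) {
			return true
		}
	}
	return false
}

func main() {
	for _, rt := range []string{
		"io.containerd.runc.v2", // ordinary runc shim: not VM-based
		"io.containerd.kata.v2", // kata shim: matched by substring
	} {
		fmt.Println(rt, "->", isVMBasedRuntime(rt))
	}
}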

Could not find any equivalent code in podman.

c3d commented 1 year ago

The message for podman is coming from podman/libpod/oci_conmon_common.go.

func (r *ConmonOCIRuntime) createOCIContainer(ctr *Container, restoreOptions *ContainerCheckpointOptions) (int64, error) {

...
    logrus.WithFields(logrus.Fields{
        "args": args,
    }).Debugf("DDD1: running conmon: %s", r.conmonPath)

    cmd := exec.Command(r.conmonPath, args...)
    cmd.SysProcAttr = &syscall.SysProcAttr{
        Setpgid: true,
    }
...

}

There does not seem to be any way to avoid running conmon without modifying podman. So there are a few options:

  1. Modify podman to avoid running conmon and triggering the error above.
  2. Not modify podman, but find a way to inject an alternate conmon that does not error out when called (see the sketch after this list).
  3. Find a way to ignore the conmon error.
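
For option 2, a very rough sketch of what a stand-in conmon could look like. It only handles the handful of flags visible in the debug log above (-c, -r, -b, -p), ignores everything conmon actually does (sync pipe, attach sockets, exit command), and assumes the wrapped runtime accepts a runc-style create --bundle --pid-file invocation:

package main

import (
	"log"
	"os"
	"os/exec"
)

func main() {
	var containerID, runtimePath, bundle, pidFile string

	// Pick out the few conmon-style flags we care about; all others
	// (e.g. --exit-command, logging options) are silently ignored.
	args := os.Args[1:]
	for i := 0; i+1 < len(args); i++ {
		switch args[i] {
		case "-c":
			containerID = args[i+1]
		case "-r":
			runtimePath = args[i+1]
		case "-b":
			bundle = args[i+1]
		case "-p":
			pidFile = args[i+1]
		}
	}

	if containerID == "" || runtimePath == "" {
		log.Fatal("missing -c (container ID) or -r (runtime path)")
	}

	// Delegate directly to the wrapped runtime; a real conmon would
	// stay around to monitor the container and run the exit command.
	cmd := exec.Command(runtimePath, "create",
		"--bundle", bundle, "--pid-file", pidFile, containerID)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		log.Fatalf("runtime create failed: %v", err)
	}
}

Whether podman would then be happy is another matter: it still expects conmon's pidfile and sync-pipe handshake, so this only shows where the injection point could be.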

c3d commented 1 year ago

Likely call stack in podman:

    // podman/libpod/oci_conmon_common.go:202
    return r.createOCIContainer(ctr, restoreOptions)

This is called from line 183:

// CreateContainer creates a container.
func (r *ConmonOCIRuntime) CreateContainer(ctr *Container, restoreOptions *ContainerCheckpointOptions) (int64, error) {
    // always make the run dir accessible to the current user so that the PID files can be read without
    // being in the rootless user namespace.
    if err := makeAccessible(ctr.state.RunDir, 0, 0); err != nil {
        return 0, err
    }
    if !hasCurrentUserMapped(ctr) {
        for _, i := range []string{ctr.state.RunDir, ctr.runtime.config.Engine.TmpDir, ctr.config.StaticDir, ctr.state.Mountpoint, ctr.runtime.config.Engine.VolumePath} {
            if err := makeAccessible(i, ctr.RootUID(), ctr.RootGID()); err != nil {
                return 0, err
            }
        }

I see nothing in that path that would allow skipping conmon at all.

c3d commented 1 year ago

The implementation seems to come from ConmonOCIRuntime. So do we need to build a non-conmon OCI runtime? There is an OCIRuntime interface (podman/libpod/oci.go).

There seems to be a notion of a "non-legacy OCI runtime" (podman/libpod/boltdb_state_internal.go):

        // Handle legacy containers which might use a literal path for
        // their OCI runtime name.
        runtimeName := ctr.config.OCIRuntime
        ociRuntime, ok := s.runtime.ociRuntimes[runtimeName]
        if !ok {
            runtimeSet := false
...

c3d commented 1 year ago

Curious why there are both a sqlite_state_internal.go and a boltdb_state_internal.go. They seem very similar, and contain logic that does not seem related to databases at all. Notably, the creation of the runtime objects (and logic such as path lookup) is in these files; see getContainerStateDB or finalizeCtrSqlite calling newConmonOCIRuntime (the other place being libpod/runtime.go, which seems a bit more logical).

A bit weird.

c3d commented 1 year ago

Apparently, no OCI runtime is created other than through newConmonOCIRuntime, and the conmon path is always passed as an argument to that function, from runtime.conmonPath.
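
If going the option-1 route, the natural place would be a constructor-level switch similar to CRI-O's newRuntimeImpl above. Nothing like this exists in podman today; the sketch below is standalone, and all the names in it (shimV2Runtime, newOCIRuntime, the "vm" runtime type) are placeholders, not podman API.

package main

import "fmt"

// OCIRuntime stands in for podman's OCIRuntime interface (libpod/oci.go);
// only one method is shown to keep the sketch short.
type OCIRuntime interface {
	Name() string
}

type conmonRuntime struct{ path string }

func (r conmonRuntime) Name() string { return "conmon-backed: " + r.path }

type shimV2Runtime struct{ path string }

func (r shimV2Runtime) Name() string { return "shim-v2-backed: " + r.path }

// newOCIRuntime mirrors CRI-O's dispatch: a runtime handler flagged as
// "vm" gets an implementation that never spawns conmon.
func newOCIRuntime(runtimePath, runtimeType string) OCIRuntime {
	if runtimeType == "vm" {
		return shimV2Runtime{path: runtimePath}
	}
	return conmonRuntime{path: runtimePath}
}

func main() {
	fmt.Println(newOCIRuntime("/usr/bin/runc", "oci").Name())
	fmt.Println(newOCIRuntime("/usr/local/bin/containerd-shim-kata-v2", "vm").Name())
}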

c3d commented 1 year ago

Exit command arguments, passed with --exit-command and --exit-command-arg:

 /usr/local/bin/podman --root /var/lib/containers/storage --runroot /var/run/containers/storage --log-level debug --cgroup-manager systemd --tmpdir /var/run/libpod --network-config-dir  --network-backend cni --volumepath /var/lib/containers/storage/volumes --db-backend boltdb --transient-store=false --runtime /home/ddd/Work/ociplex/ociplex/run-kata --storage-driver overlay --storage-opt overlay.mountopt=nodev,metacopy=on --events-backend journald --syslog container cleanup bfdf589efbf1ecd78142e44f7add27de199567b5d50a310b94535ae7ec23ffe8

So when the container dies, conmon runs `podman container cleanup`. This also gives interesting insight into the database being used. But it looks like all of this should be passed directly to run-kata, not go through podman.
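
As far as I can tell, the way that command is rebuilt is mechanical: --exit-command names the binary and each --exit-command-arg appends one argument, in order. A small sketch of that reassembly, using an abridged version of the argument list above (the container ID is deliberately truncated here):

package main

import (
	"fmt"
	"strings"
)

// exitCommand collects the --exit-command binary and every
// --exit-command-arg value, preserving their order.
func exitCommand(conmonArgs []string) []string {
	var cmd []string
	for i := 0; i+1 < len(conmonArgs); i++ {
		switch conmonArgs[i] {
		case "--exit-command":
			cmd = append(cmd, conmonArgs[i+1])
		case "--exit-command-arg":
			cmd = append(cmd, conmonArgs[i+1])
		}
	}
	return cmd
}

func main() {
	// Abridged from the debug log earlier in this issue.
	args := []string{
		"--exit-command", "/usr/bin/podman",
		"--exit-command-arg", "--root",
		"--exit-command-arg", "/var/lib/containers/storage",
		"--exit-command-arg", "--syslog",
		"--exit-command-arg", "container",
		"--exit-command-arg", "cleanup",
		"--exit-command-arg", "bfdf589e...", // truncated container ID
	}
	fmt.Println(strings.Join(exitCommand(args), " "))
	// Prints: /usr/bin/podman --root /var/lib/containers/storage --syslog container cleanup bfdf589e...
}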