containerd / nri

Node Resource Interface
Apache License 2.0

Building the Container Device Interface with NRI #2

Closed RenaudWasTaken closed 1 year ago

RenaudWasTaken commented 4 years ago

Hello there!

For some context, this issue is part of the effort to implement Container Device Interface (CDI) using NRI :) !

For context on CDI: if a vendor wants to add support for its device type, it creates a file (e.g. /etc/cdi/vendor.json) that specifies the device type, the devices available on the machine, and the actions a vendor must perform. Quick example:

$ cat > /etc/cdi/vendor.json <<EOF
{
  "cdiVersion": "0.2.0",
  "kind": "vendor.com/device",
  "cdiDevices": [{
      "name": "myDevice",
      "containerSpec": {
        "devices": [
          {"hostPath": "/dev/card1", "containerPath": "/dev/card1"},
          {"hostPath": "/dev/card-render1", "containerPath": "/dev/card-render1"}
        ]
      }
    }]
}
EOF
#CLI example:
$ myRuntime run --device vendor.com/device=myDevice --device vendor.com/device=myDevice2 myContainer
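To make the lookup concrete, here is a minimal Python sketch of how a `--device vendor.com/device=myDevice` argument could be resolved against the file above. The helper names (`parse_qualified_name`, `resolve`) are hypothetical and not part of CDI or NRI; the JSON is the example above with the trailing comma removed so that it parses.

```python
import json

# Same content as the /etc/cdi/vendor.json example, as valid JSON.
SPEC_JSON = """
{
  "cdiVersion": "0.2.0",
  "kind": "vendor.com/device",
  "cdiDevices": [{
    "name": "myDevice",
    "containerSpec": {
      "devices": [
        {"hostPath": "/dev/card1", "containerPath": "/dev/card1"},
        {"hostPath": "/dev/card-render1", "containerPath": "/dev/card-render1"}
      ]
    }
  }]
}
"""


def parse_qualified_name(arg):
    """Split 'vendor.com/device=myDevice' into ('vendor.com/device', 'myDevice')."""
    kind, sep, name = arg.partition("=")
    if not sep or not kind or not name:
        raise ValueError(f"not a CDI device name: {arg!r}")
    return kind, name


def resolve(spec, arg):
    """Return the containerSpec for a requested device, or raise KeyError."""
    kind, name = parse_qualified_name(arg)
    if kind != spec["kind"]:
        raise KeyError(f"spec does not describe kind {kind!r}")
    for device in spec["cdiDevices"]:
        if device["name"] == name:
            return device["containerSpec"]
    raise KeyError(f"unknown device {name!r} for kind {kind!r}")


if __name__ == "__main__":
    spec = json.loads(SPEC_JSON)
    edits = resolve(spec, "vendor.com/device=myDevice")
    for node in edits["devices"]:
        print(node["hostPath"], "->", node["containerPath"])
```

Note that `myDevice2` from the CLI example is not defined in the example file, so this sketch would raise `KeyError` for it.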

Below are a few approaches for how I envision CDI working with NRI. Let me know if there are different approaches or if my understanding is incorrect.

CDI on top of NRI today.

TLDR: When CRI-containerd calls StartContainer, the CDI plugin gets invoked, reads the devices requested by the container (in the spec.resources field) and applies the operations it reads from /etc/cdi/vendor.json (e.g. mount, mknod, cgroups, ...).

There are a few challenges with this approach:

  1. The biggest one is that the container runtime is not aware of the changes the CDI plugin made. This is hugely problematic in the context of a call to Update: a container that is using a device will lose access to it (at least until the CDI plugin fixes it).
  2. It's not clear how a runtime should handle a user passing a device that only makes sense in the context of CDI. In other words, above I assumed that CDI would be invoked with the spec.resources.Linux.Devices field, which would contain an entry whose path would be vendor.com/device=myDevice. What would happen today is that containerd would blow up either before or after the NRI call because that device is invalid.
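The first challenge can be illustrated with a toy model. This is only a sketch: `ToyRuntime` and `plugin_after_start` are hypothetical names, and a Python set stands in for the device cgroup rules. It assumes that an update rewrites the enforced rules from the runtime's own view of the container.

```python
class ToyRuntime:
    """Hypothetical stand-in for a runtime that owns the device cgroup rules."""

    def __init__(self, allowed_devices):
        # What the runtime believes the container may access.
        self.spec_devices = list(allowed_devices)
        # What is actually enforced (stands in for the device cgroup).
        self.cgroup_rules = set(allowed_devices)

    def update(self, extra_devices=()):
        # An update rewrites the rules from the runtime's own view only.
        self.spec_devices.extend(extra_devices)
        self.cgroup_rules = set(self.spec_devices)


def plugin_after_start(runtime, device):
    # The plugin edits the live rules without telling the runtime.
    runtime.cgroup_rules.add(device)


if __name__ == "__main__":
    rt = ToyRuntime(["/dev/null"])
    plugin_after_start(rt, "/dev/card1")
    print("before update:", sorted(rt.cgroup_rules))
    rt.update()
    print("after update: ", sorted(rt.cgroup_rules))
```

In this model the plugin's rule for `/dev/card1` is gone after `update()`, because the runtime never learned about it.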

OCI specification is passed up and down NRI plugins.

TLDR: When CRI-containerd calls CreateContainer, the CDI plugin gets invoked, reads the devices requested by the container, reads the operations from /etc/cdi/vendor.json and applies changes to the OCI spec (including removing the CDI devices from the OCI spec).

There are a few challenges with this approach:

  1. It's not really clear to me what the new NRI client interface would look like. Should it pass an OCI specification?
  2. It's not really clear to me how NRI should be invoked in CreateContainer (it seems like this would be the place we would need to change: https://github.com/containerd/containerd/blob/master/pkg/cri/server/container_create.go#L239-L247).
  3. There might be concerns around passing a "smallish" JSON file up and down, but that's probably secondary.
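A rough sketch of what such a plugin could do to the spec, in Python for brevity. It assumes the CDI requests arrive as `linux.devices` entries whose `path` is the qualified name (as described for the first approach), and it takes a caller-supplied `stat_device` function instead of inspecting real device nodes. `apply_cdi` and `stat_device` are hypothetical names, not an existing NRI or CDI API.

```python
import copy


def apply_cdi(oci_spec, cdi_spec, stat_device):
    """Return a copy of oci_spec with CDI requests replaced by real device nodes.

    stat_device(host_path) -> (type, major, minor) is supplied by the caller;
    a real plugin would obtain these values from the host device node.
    """
    spec = copy.deepcopy(oci_spec)
    linux = spec.setdefault("linux", {})
    resolved = []
    for entry in linux.get("devices", []):
        kind, sep, name = entry["path"].partition("=")
        if not sep or kind != cdi_spec["kind"]:
            resolved.append(entry)  # an ordinary device, keep it as-is
            continue
        matches = [d for d in cdi_spec["cdiDevices"] if d["name"] == name]
        if not matches:
            raise KeyError(f"unknown CDI device {entry['path']!r}")
        # Replace the CDI entry with one OCI device entry per device node.
        for node in matches[0]["containerSpec"]["devices"]:
            dev_type, major, minor = stat_device(node["hostPath"])
            resolved.append({
                "path": node["containerPath"],
                "type": dev_type,
                "major": major,
                "minor": minor,
            })
    linux["devices"] = resolved
    return spec


if __name__ == "__main__":
    cdi = {
        "kind": "vendor.com/device",
        "cdiDevices": [{
            "name": "myDevice",
            "containerSpec": {"devices": [
                {"hostPath": "/dev/card1", "containerPath": "/dev/card1"},
            ]},
        }],
    }
    oci = {"linux": {"devices": [{"path": "vendor.com/device=myDevice"}]}}
    # Made-up device numbers, for illustration only.
    print(apply_cdi(oci, cdi, lambda path: ("c", 226, 1)))
```

The input spec is left untouched and the modified copy is returned, which matches the idea of handing the changed spec back to the runtime rather than mutating the container behind its back.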

Thanks for reading this far! Let me know what you think :) !

RenaudWasTaken commented 4 years ago

Cc @crosbymichael

crosbymichael commented 4 years ago

thanks, checking it out!

crosbymichael commented 4 years ago

This looks like you are understanding everything correctly. What is your preference in how you would like this to work?

I think both approaches are valid. For the first section, NRI is also invoked on all Updates, so that should resolve your issue about the upper layers not being aware of the new devices. For 2, are you saying that CDI users would add a mount to the spec with that path?

crosbymichael commented 4 years ago

Hey, good news. I hit the issue where if a container gets an update call after an nvidia plugin runs, it clears out the cgroup device rules. I'm going to see how we could solve this either with NRI or with other updates to the low-level components.

RenaudWasTaken commented 4 years ago

> This looks like you are understanding everything correctly. What is your preference in how you would like this to work?

In my mind, the better solution is to pass to containerd the OCI changes that are requested. Operations that change the container sandbox are typically operations I want the container runtime to be aware of, so that it doesn't override my change accidentally.

crosbymichael commented 4 years ago

Ya, so ideally, we need a "pre-create" type step in the lifecycle so that a set of NRI plugins have a chance to make modifications to the runtime-spec before a container is created. Is this correct?
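As a sketch of that idea (not an actual NRI interface), a pre-create step could run each plugin over the runtime-spec in turn and create the container from the final result. `create_container` and `add_card1` are hypothetical names, and the device numbers are made up.

```python
import copy


def create_container(spec, pre_create_plugins):
    """Run every plugin on the spec, then 'create' from the final result."""
    for plugin in pre_create_plugins:
        # Each plugin receives its own copy and returns the spec it wants.
        spec = plugin(copy.deepcopy(spec))
    # The runtime records the final spec, so later updates start from it.
    return {"spec": spec, "state": "created"}


def add_card1(spec):
    # Hypothetical plugin: inject one device node into the spec.
    devices = spec.setdefault("linux", {}).setdefault("devices", [])
    devices.append({"path": "/dev/card1", "type": "c", "major": 226, "minor": 1})
    return spec


if __name__ == "__main__":
    container = create_container({"linux": {"devices": []}}, [add_card1])
    print(container["spec"]["linux"]["devices"])
```

Because the runtime keeps the spec the plugins returned, a later update would start from a view that already includes the plugin's devices.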

RenaudWasTaken commented 3 years ago

Sorry for the lag, I missed your message. A pre-create step is where I suspect we should be going for this.

fuweid commented 3 years ago

/cc

klihub commented 1 year ago

containerd and CRI-O have both merged PRs for native/built-in CDI support which obsolete this issue.