clearcontainers / runtime

OCI (Open Containers Initiative) compatible runtime using Virtual Machines
Apache License 2.0
589 stars 70 forks source link

How to make dockershim support clear containers? #987

Open miaoyq opened 6 years ago

miaoyq commented 6 years ago

Recently I was trying to make dockershim support cc, and found that dockershim supporting cc need to meet the following conditions at least:

  1. Dockershim need provide container type(app or sandbox container) info in Annotation that clear containers need. dockershim have provided this info in Label` field of docker container struct.
  2. Docker need pass Label of docker container to Annotations of OCI runtime spec. I have sent a pr to moby project about this, see: moby/moby/pull/36181
  3. Annotations(Labels) of dockershim can be recognized by cc-runtime The annotations(labels) are defined as private fields in dockershim and can't be referenced by virtcontainer. So I only modify the codes locally for testing.
  4. Kubelet should be able to get the IP address after the sandbox started The IP address of the veth device that is create by cni plugin will be remove when cc-runtime create a tap device linked to the veth device. So we must create the network namespace and get the IP address of the veth before creating the sandbox container with the existing network namespace, just like this issue description in cri-containerd. However, docker does not provide an interface for using existing network namespace to create containers, we must use the namespace created by docker, so we can't get the IP via cni plugin after setting up the network since the IP of veth have been removed. This is the main obstruction point for me at present.

After modifying the code corresponding to the first three conditions, and compiling codes of each component and setting up the k8s cluster, I could create the pod successfully but the network of pod could not be set up correctly. I do not know whether it is related to conditions, expect to get help here. Thanks!

@sameo @plutoinmii

/cc @guangxuli

sameo commented 6 years ago

cc @egernst

@miaoyq I think we're looking at a couple of potential solutions here:

  1. Have virtcontainers somehow not cleaning the veth IP so that PodSandboxStatus() would just work.
  2. Have docker-shim cache the pod IP and use that cache instead of running unnecessary netns calls into the pod networking namespace. A pod IP is immutable, so there's no reasons for always checking about it.
  3. Use the Docker engine API (GET /containers/(id or name)/json) to ask for the pod container network settings from the Docker engine itself.

Solution #2 is obviously the most appealing to us, as #1 would be quite disruptive for CC in general. Solution #3 may work, but since I don't have a working docker-shim based CC deployment, I can't guarantee that. If it does work, it would be a quick and portable solution. #2 would still be more efficient...

egernst commented 6 years ago

Thanks for the heads up @sameo

Interesting, painful work @miaoyq! I don't have any other input aside from what sameo already pointed out. Are you looking to compare CC with dockershim v cri-containerd?

miaoyq commented 6 years ago

@sameo Thanks for providing the potential solutions

  1. Use the Docker engine API (GET /containers/(id or name)/json) to ask for the pod container network settings from the Docker engine itself.

The third solution doesn't work, since the network of sandbox is set up by cni plugin, so docker can not get the IP of the veth. I have verified this locally. :-)

The second solution may be the best one. I will also try to seek better ways to get the sandbox IP on dockershim side.

miaoyq commented 6 years ago

Are you looking to compare CC with dockershim v cri-containerd?

@egernst Yeah. k8s + cri-containerd(containerd) + cc can work normally. Currently, The main difference between dockershim and cri-containerd is the network creation of the sandbox.

sameo commented 6 years ago

@miaoyq There already is a per pod map in the dockerService to track if a pod's network is ready. It would seem logical to add an IP string there in order to track both the network readiness and its IP. It should not be a big effort neither, so probably worth it.

miaoyq commented 6 years ago

@sameo Yeah, it's a good idea. Actually, dockershim have provided the way to store the pod data. When run a sandbox, the checkpointHandler of dockerService will create a checkpoint for this pod, the checkpoint can be used to track the pod IP. :-)

miaoyq commented 6 years ago

@sameo Today, I ran a container via docker + cc, and get the container pid by docker inspect -f '{{.State.Pid}}' [container_id], I found the pid of container is not the pid of qemu process (in container network namespace) but the pid of cc-shim process (in host network namespace).

In dockershim, the pid of container is used to get the netns path of sandbox container, so dockershim can't get the right netns path currently.

Could we get the pid of qemu process instead of cc-shim process when cc-runtme create vm-based container?

sboeuf commented 6 years ago

@miaoyq we actually don't want to provide the pid of the qemu process since this is related to the pod and not to the container itself. What we could do to solve your issue is to make sure that we run the shim in the right network namespace. Look into virtcontainers shim implementation cc_shim.go pointing to some function in shim.go. This is where we should spawn the shim process inside the right namespace.

Please open an issue on virtcontainers repo for that, so that we can discuss further on this.

miaoyq commented 6 years ago

Thanks @sboeuf, I also think it's reasonable to run shim in container network namespace. :) I have filed an issue on virtcontainers repo. See: https://github.com/containers/virtcontainers/issues/615

miaoyq commented 6 years ago

@sameo @sboeuf BTW, Could we reset the network for a exist cc container whose network have't been set when created?

miaoyq commented 6 years ago

@sameo @sboeuf BTW, Could we reset the network for a exist cc container whose network have't been set when created?

I think I have got the answer in https://github.com/containers/virtcontainers/issues/241. Looking forward to this feature. :-)