antrea-io / antrea

Kubernetes networking based on Open vSwitch
https://antrea.io
Apache License 2.0

changing the CNI_PATH breaks antrea-agent #120

Closed varunmar closed 4 years ago

varunmar commented 4 years ago

Describe the bug: If kubelet is started with a cni-bin-dir other than /opt/cni/bin, the antrea-agent cannot find the host-local IPAM plugin. This is true even if the node itself has the plugin in its cni-bin-dir.

To Reproduce: On a running GKE cluster, install antrea using the https://raw.githubusercontent.com/vmware-tanzu/antrea/master/build/yamls/antrea.yml file, but replace the host-cni-bin volume's hostPath with /home/kubernetes/bin.

Then try to start any pod.

Expected: The pod starts.

Actual behavior: The pod remains in the ContainerCreating state forever, with the following in the logs:

Warning FailedCreatePodSandBox 82s kubelet, gke-antrea-default-pool-3945f630-16tl Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "cb48229d5ebe51e0d5bce60089ddd59958274cf59e7dcc9f867687e2275d31bd" network for pod "iperf-client": NetworkPlugin cni failed to set up pod "iperf-client_default" network: failed to find plugin "host-local" in path [/home/kubernetes/bin], failed to clean up sandbox container "cb48229d5ebe51e0d5bce60089ddd59958274cf59e7dcc9f867687e2275d31bd" network for pod "iperf-client": NetworkPlugin cni failed to teardown pod "iperf-client_default" network: failed to find plugin "host-local" in path [/home/kubernetes/bin]]


The default antrea CNI configuration looks like this:

    {
      "cniVersion": "0.3.0",
      "name": "antrea",
      "type": "antrea",
      "ipam": {
        "type": "host-local"
      }
    }

When antrea-cni delegates to the antrea-agent, the agent looks inside the daemonset pod's filesystem for the IPAM plugin. However, it does so using the CNI_PATH it gets from kubelet, which is set to /home/kubernetes/bin on all GKE nodes. The daemonset filesystem doesn't have that directory, so the path lookup fails and the pod never gets created.
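To make the failure mode concrete, here is a minimal Python sketch of a path-based plugin lookup. It is not Antrea's code: `find_plugin` and `demo` are hypothetical names, and a temporary directory stands in for the daemonset pod's filesystem. It only assumes what is described above, namely that the plugin binary ships under /opt/cni/bin inside the container while the search path handed over by kubelet is /home/kubernetes/bin.

```python
import json
import os
import tempfile

# The default antrea CNI configuration quoted in this issue.
CONF = '{"cniVersion": "0.3.0", "name": "antrea", "type": "antrea", "ipam": {"type": "host-local"}}'


def find_plugin(name, search_dirs):
    """Hypothetical helper: return the first executable `name` in search_dirs, else None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def demo():
    ipam_type = json.loads(CONF)["ipam"]["type"]  # the plugin the lookup must find
    with tempfile.TemporaryDirectory() as root:
        # Stand-in for the container filesystem: the plugin is bundled in <root>/opt/cni/bin.
        bundled_dir = os.path.join(root, "opt", "cni", "bin")
        os.makedirs(bundled_dir)
        plugin = os.path.join(bundled_dir, ipam_type)
        with open(plugin, "w") as f:
            f.write("#!/bin/sh\n")
        os.chmod(plugin, 0o755)

        # The directory kubelet reports exists on the host but not in the container.
        kubelet_dir = os.path.join(root, "home", "kubernetes", "bin")

        missing = find_plugin(ipam_type, [kubelet_dir])  # lookup as described in this issue
        found = find_plugin(ipam_type, [bundled_dir])    # lookup in the bundled location
        return missing, found


if __name__ == "__main__":
    print(demo())
```

The first lookup fails only because the search path names a directory that does not exist in the filesystem being searched. The plugin itself is present the whole time.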

As an aside, it appears the antrea-agent implements the entire CNI runtime - why was this done, instead of letting kubelet do it and just call the right plugins? It doesn't look like the host-local plugin inside the antrea-agent daemonset is any different from the normal ones in containernetworking/plugins.

Thanks!

tnqn commented 4 years ago

Thanks @varunmar for reporting it. Your understanding and analysis are correct. antrea-agent should either look for host-local in the hardcoded path, if the plugin is put in a hardcoded path, or there should be clear instructions for supporting a non-default cni-bin-dir. We should have a fix for it soon.
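As an illustration of the first option (searching a hardcoded path as well), here is a hedged Python sketch. It does not show the actual fix: `resolve_plugin`, `DEFAULT_CNI_BIN_DIR` and `demo` are hypothetical names chosen for this example. It assumes the search path is a list of directories joined by the OS path separator (":" on Linux).

```python
import os
import tempfile

# Assumption for illustration: the image bundles its plugins here.
DEFAULT_CNI_BIN_DIR = "/opt/cni/bin"


def resolve_plugin(name, cni_path, fallback_dirs=(DEFAULT_CNI_BIN_DIR,)):
    """Search the directories in cni_path first, then the hardcoded fallback directories."""
    search_dirs = [d for d in cni_path.split(os.pathsep) if d]
    search_dirs += [d for d in fallback_dirs if d not in search_dirs]
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    raise FileNotFoundError('failed to find plugin "%s" in path %s' % (name, search_dirs))


def demo():
    with tempfile.TemporaryDirectory() as root:
        # Stand-in for the container filesystem with a bundled host-local binary.
        bundled_dir = os.path.join(root, "opt", "cni", "bin")
        os.makedirs(bundled_dir)
        plugin = os.path.join(bundled_dir, "host-local")
        with open(plugin, "w") as f:
            f.write("#!/bin/sh\n")
        os.chmod(plugin, 0o755)

        # Directory reported by kubelet; it does not exist inside the container.
        kubelet_dir = os.path.join(root, "home", "kubernetes", "bin")

        resolved = resolve_plugin("host-local", kubelet_dir, fallback_dirs=[bundled_dir])
        try:
            resolve_plugin("host-local", kubelet_dir, fallback_dirs=[])
            failed_without_fallback = False
        except FileNotFoundError:
            failed_without_fallback = True
        return os.path.relpath(resolved, root), failed_without_fallback


if __name__ == "__main__":
    print(demo())
```

Searching the kubelet-provided directories first keeps any plugin the user installed on the host authoritative. The bundled copy is used only when nothing else is found.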

As for the question, it was done this way because the agent requires credentials to get node information (internalIP/externalIP/PodCIDR) from the K8s API server in order to do IPAM, routing, tunneling, etc., and it also needs to enforce NetworkPolicy for Pod interfaces. So it seems good to have a single place to do all these things and keep the antrea-cni binary simple (otherwise credentials would have to be placed on the host filesystem, and multiple components would program OVSDB and OpenFlow). The host-local plugin inside the image is indeed the same as the one in containernetworking/plugins. It is included just in case users don't install the CNI packages on their host filesystem. Does the explanation make sense to you?

varunmar commented 4 years ago

Makes sense, thanks for the explanation! We'll work around the path for now, and look forward to the fix.

The reason for the question is to try to understand how composing plugins (adding portmap, for example) works if there are two entire cni runtimes. But I think we can see what is happening, so we should be good for now. Thanks!