containernetworking / cni

Container Network Interface - networking for Linux containers
https://cni.dev
Apache License 2.0

flannel plugin not working with kubernetes #250

Closed cheld closed 8 years ago

cheld commented 8 years ago

Problem:

The Kubernetes service IPs (10.0.x.x) are not reachable from inside the container.

Steps to reproduce:

{
    "name": "mynet",
    "type": "flannel",
    "delegate": {
        "bridge": "mynet",
        "isDefaultGateway": true
    }
}
kubectl run test --image gcr.io/google_containers/hyperkube-amd64:67 -- sleep 10000
docker exec -it <id> /bin/bash
curl -k -u admin:admin https://10.0.0.1  # kubernetes master

curl: (7) Couldn't connect to server

Comment:

- The command curl -k -u admin:admin https://10.0.0.1 works on the host, so iptables seems to be set up correctly.

- Using the default bridge configuration from readme.md works fine.

CC @zreigz, @batikanu, @taimir

steveej commented 8 years ago

/cc @tomdee

tomdee commented 8 years ago

Hi @cheld, do you think this is a regression? Have you had it working previously? How are you configuring flannel, e.g. which backend are you using? Is this just an issue contacting service IPs? Can you contact pod IPs that are on different hosts?

zreigz commented 8 years ago

I have been testing it with CentOS 7. I have created a script to set up the entire environment. Prerequisites:

1. Create directories:

mkdir -p /opt/cni/bin
mkdir -p /etc/cni/net.d

2. Copy plugins to /opt/cni/bin:

curl -L https://github.com/containernetworking/cni/releases/download/v0.3.0/cni-v0.3.0.txz -O
xz -d < cni-v0.3.0.txz | tar xvf - -C /opt/cni/bin

3. Create the configuration file:

vi /etc/cni/net.d/10-containernet.conf

and enter:

{
    "name": "containernet",
    "type": "flannel",
    "subnetFile": "/var/run/flannel/subnet.env",
    "delegate": {
        "bridge": "mynet0",
        "mtu": 1450,
        "isDefaultGateway": true
    }
}

Then run this script with your MASTER_IP variable set:

#!/bin/bash

set -e

MASTER_IP=${MASTER_IP:-10.0.0.3}

docker run -d \
    --net=host \
    gcr.io/google_containers/etcd-amd64:2.2.1 \
    /usr/local/bin/etcd \
      --listen-client-urls=http://127.0.0.1:4001,http://${MASTER_IP}:4001 \
      --advertise-client-urls=http://${MASTER_IP}:4001 \
      --data-dir=/var/etcd/data

sleep 5

docker run \
        --net=host gcr.io/google_containers/etcd-amd64:2.2.1 \
        etcdctl \
        set /coreos.com/network/config \
            '{ "Network": "10.1.0.0/16", "Backend": {"Type": "vxlan"}}'

flannelCID=$(docker run \
        --restart=on-failure \
        -d \
        --net=host \
        --privileged \
        -v /dev/net:/dev/net \
        quay.io/coreos/flannel:0.5.5 \
        /opt/bin/flanneld \
            --ip-masq=true \
            --iface=eth0)

mkdir -p /var/run/flannel

sleep 5

docker cp ${flannelCID}:/run/flannel/subnet.env /var/run/flannel/subnet.env

docker run \
        --name=kubelet \
        --volume=/:/rootfs:ro \
        --volume=/sys:/sys:ro \
        --volume=/var/lib/docker/:/var/lib/docker:rw \
        --volume=/etc/cni/net.d:/etc/cni/net.d:rw \
        --volume=/opt/cni/bin:/opt/cni/bin:rw \
        --volume=/var/run:/var/run:rw \
        --volume=/var/lib/kubelet:/var/lib/kubelet:rw \
        --net=host \
        --pid=host \
        --privileged=true \
        -d \
        gcr.io/google_containers/hyperkube-amd64:v1.3.0-beta.1 \
        /hyperkube kubelet \
            --hostname-override=${MASTER_IP} \
            --address="0.0.0.0" \
            --api-servers=http://localhost:8080 \
            --config=/etc/kubernetes/manifests-multi \
            --cluster-dns=10.0.0.10 \
            --cluster-domain=cluster.local \
            --allow-privileged=true --v=2 \
            --pod-infra-container-image=gcr.io/google_containers/pause:2.0 \
            --network-plugin=cni --network-plugin-dir=/etc/cni/net.d
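The script above waits a fixed `sleep 5` before copying subnet.env out of the flannel container, which can fail on a slow host. As a possible alternative, here is a minimal sketch of a retry loop. The `retry` helper is hypothetical and not part of the original script; the attempt count of 30 is an arbitrary assumption.

```shell
# Hypothetical helper (not in the original script): run a command until it
# succeeds, waiting one second between attempts, and give up after the
# given number of attempts.
retry() {
    attempts=$1
    shift
    n=1
    until "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempts: $*" >&2
            return 1
        fi
        n=$((n + 1))
        sleep 1
    done
}

# Possible use in place of "sleep 5" followed by "docker cp"
# (flannelCID as set in the script above):
# retry 30 docker cp "${flannelCID}:/run/flannel/subnet.env" /var/run/flannel/subnet.env
```

Because the command runs as the condition of `until`, a failed attempt does not trip `set -e`; only the final `return 1` does.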

cheld commented 8 years ago

Hi @tomdee, thanks for input.

1) Regression: I don't think it is a regression. There have been a few issues related to the CNI plugin that have just been merged. See also comment https://github.com/kubernetes/kube-deploy/pull/69#issuecomment-226689759

2) Setup: similar to what zreigz described. The default flannel backend is used (udp). Only one node.

3) Connectivity: I just did one additional test:

Steps: Start hyperkube

Start container with cni-script and print routes

sudo CNI_PATH=$CNI_PATH ./docker-run.sh -it --rm gcr.io/google_containers/hyperkube-amd64:67 /bin/bash
 ip route
default via 10.1.57.1 dev eth0 
10.1.0.0/16 via 10.1.57.1 dev eth0 
10.1.57.0/24 dev eth0  proto kernel  scope link  src 10.1.57.3 

Print routes from a container started by kubelet

kubectl run test --image gcr.io/google_containers/hyperkube-amd64:67 -- sleep 1000000
docker exec -it <id> /bin/bash
ip route
10.1.0.0/16 via 10.1.57.1 dev eth0 
10.1.57.0/24 dev eth0  proto kernel  scope link  src 10.1.57.4

The default route is missing. hmmm...any quick idea?
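The difference between the two route tables can also be checked mechanically. Below is a minimal sketch that feeds the two `ip route` outputs from this comment into a small check; the helper names `has_default_route` and `check` are hypothetical and exist only for this illustration.

```shell
# Hypothetical helper: reads "ip route" output on stdin and succeeds only
# if a default route is present.
has_default_route() {
    grep -q '^default '
}

# Hypothetical helper: print a one-line verdict for a named route table.
check() {
    if printf '%s\n' "$2" | has_default_route; then
        echo "$1: default route present"
    else
        echo "$1: default route MISSING"
    fi
}

# Route tables copied from the two containers above.
cni_script_routes='default via 10.1.57.1 dev eth0
10.1.0.0/16 via 10.1.57.1 dev eth0
10.1.57.0/24 dev eth0  proto kernel  scope link  src 10.1.57.3'

kubelet_routes='10.1.0.0/16 via 10.1.57.1 dev eth0
10.1.57.0/24 dev eth0  proto kernel  scope link  src 10.1.57.4'

check "cni script" "$cni_script_routes"
check "kubelet" "$kubelet_routes"
```

Inside a running container, the same check would be `ip route | has_default_route`.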

cheld commented 8 years ago

Ok got it. :))

In the first case, I compiled the CNI binaries from master. In the second case (hyperkube), I used the CNI binaries that are packed into the container, which come from some release of CNI (c864f0e1ea73719b8f4582402b0847064f9883b0). After upgrading the binaries in the container, everything works.
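A mismatch like this, where the host and the container ship different plugin builds, can be spotted by comparing the binaries byte for byte. This is a minimal sketch only: the helper name `same_binary` is hypothetical, and the paths in the commented example are assumptions about where the plugins live, not something confirmed in this thread.

```shell
# Hypothetical helper: succeed only if the two files have identical content.
same_binary() {
    cmp -s "$1" "$2"
}

# Possible use (both paths are assumptions): copy the plugin out of the
# kubelet container, then compare it with the one built from master.
# docker cp kubelet:/opt/cni/bin/flannel /tmp/flannel-from-container
# if same_binary /opt/cni/bin/flannel /tmp/flannel-from-container; then
#     echo "same build"
# else
#     echo "container ships a different flannel plugin binary"
# fi
```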