CiscoCloud / kubernetes-ansible

Install and configure Google Kubernetes on OpenStack
Apache License 2.0

Automated IP Assignment - Design #12

Open peterlamar opened 9 years ago

peterlamar commented 9 years ago

As a tenant, I can assign an IP automatically to services on CloudProvider (OpenStack) via cmd line so that services can easily be made external

Currently tenants must modify their /etc/hosts file or do other hacky workarounds to reach the guestbook example in Kubernetes when running outside of Google App Engine. It would be great to automate this and create a better user experience.

altvnk commented 9 years ago

Did you mean CloudProvider (OpenStack) integration?

peterlamar commented 9 years ago

Sure, updated

kenjones-cisco commented 9 years ago

You are referring to providing a public IP address to the kubernetes services themselves?

peterlamar commented 9 years ago

Indeed, let me know if you have other ideas

ghost commented 9 years ago

Maybe we should try to adopt OpenStack Magnum. It's going to become a native approach in OpenStack to provide containers to cloud users. Here is a video from the latest summit: https://www.openstack.org/summit/vancouver-2015/summit-videos/presentation/magnum-containers-as-a-service-for-openstack

kenjones-cisco commented 9 years ago

If the skydns add-on is enabled, then we would have to give skydns a real network block that is addressable. Without skydns, flannel (or whatever networking layer is used for Docker) would need a real network block that is addressable.

Services as defined by Kubernetes are good for creating a "virtual name" for a pod, but even using LoadBalancer mode for the service you still get a random port assignment. As such, I had to leverage the proxy-to-service approach (https://github.com/GoogleCloudPlatform/kubernetes/tree/v1.0.1/contrib/for-demos/proxy-to-service) to get a more constant port, which I could then provide to the OpenStack LoadBalancer so that I had a public IP address.

peterlamar commented 9 years ago

We have several engineers at Cisco developing Magnum whom we should sync with; this was actually suggested to me on Friday.

Solving this will likely guide us to the networking solution we would like to use. We did Calico for MI, but let's be open to others if there is a good reason.

kenjones-cisco commented 9 years ago

Sounds good!

ghost commented 9 years ago

With OpenStack we will use Neutron in any case. But we can choose plugins for Neutron such as OVS, Calico, OpenDaylight, etc.

peterlamar commented 9 years ago

This keeps coming back up. Are there any creative solutions that do not integrate with OpenStack? We can keep our OpenStack integration efforts going, but they will take a while regardless.

@ldejager suggested that we provide a sort of dynamic (DNS) registration service ourselves. For example, user X spins up our k8s solution; upon completion, terraform.py posts the information that would usually go into /etc/hosts to the registration service, and gets back (and prints out) a unique resolvable DNS name for the instance, e.g. k8s-master.X.cs.co, that points to the IP address of the master, which they can then reach.
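
A minimal sketch of what that registration call could look like, in Go for illustration only (the endpoint URL, payload fields, and response format are all hypothetical, not an existing service):

// register.go - illustrative sketch: post the master's address to a
// hypothetical registration service and print the DNS name it hands back.
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "log"
    "net/http"
)

type registration struct {
    Tenant string `json:"tenant"` // e.g. "X"
    Role   string `json:"role"`   // e.g. "k8s-master"
    IP     string `json:"ip"`     // the address that would have gone into /etc/hosts
}

type response struct {
    Name string `json:"name"` // e.g. "k8s-master.X.cs.co"
}

func main() {
    // terraform.py would make a call like this after provisioning completes.
    body, _ := json.Marshal(registration{Tenant: "X", Role: "k8s-master", IP: "10.1.12.10"})
    resp, err := http.Post("https://registry.example.com/v1/records", "application/json", bytes.NewReader(body))
    if err != nil {
        log.Fatalf("registration failed: %v", err)
    }
    defer resp.Body.Close()

    var r response
    if err := json.NewDecoder(resp.Body).Decode(&r); err != nil {
        log.Fatalf("decoding response failed: %v", err)
    }
    fmt.Println(r.Name) // unique resolvable name pointing at the master
}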

davidwalter0 commented 9 years ago

I'm not sure, but this sounds like a list of goals:

1. DNS / service discovery, which may be satisfied via the k8s DNS add-on plus adding the k8s cluster's DNS resolver to the client-side application.
2. Automatically exposing ports on an external host so there is a known endpoint address.
3. Address resolution via network block management.

(1) can be tested with the implementations available. For (2), exposing ports: assuming that (3) address resolution via network block management is working, via either a bridge [flat CIDR space] or a flannel implementation, the problem of exposing ports transparently is the same.

Currently there's an effort to add transparent proxying: use iptables for proxying instead of userspace #3760, and there's a contrib option to enable a bare-metal load-balancing solution without a provider-specific solution, the recently moved service load-balancer. That approach claims it will support cross-cluster load balancing.

Let's track the iptables solution, and test the load balancing solution in our environment.

An alternative implementation could be to write our own monitor that injects port forwarding to services, effectively using the default load balancing a normal service provides but forwarding the port from the public machine to the internal k8s service's ip:port. Either etcd or kube event discovery could be monitored to manage the creation and injection of the port forwards, and the service's default round-robin load balancing would still apply.
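
A rough sketch of that monitor, for illustration only: it polls the apiserver's /api/v1/services list (the insecure 8080 endpoint on the master is an assumption here) instead of watching etcd or kube events, and startForward is a stand-in for launching something like the simple-forwarder shown later in this thread.

// service-monitor.go - illustrative sketch: poll the apiserver for services
// and inject a port forward from the public host to each service's
// clusterIP:port. De-registration and error recovery are omitted.
package main

import (
    "encoding/json"
    "fmt"
    "log"
    "net/http"
    "time"
)

// Trimmed view of the /api/v1/services response.
type serviceList struct {
    Items []struct {
        Metadata struct {
            Name string `json:"name"`
        } `json:"metadata"`
        Spec struct {
            ClusterIP string `json:"clusterIP"`
            Ports     []struct {
                Port int `json:"port"`
            } `json:"ports"`
        } `json:"spec"`
    } `json:"items"`
}

// startForward stands in for launching a forwarder process or goroutine
// from the public address to the internal service address.
func startForward(public, backend string) {
    log.Printf("would forward %s -> %s", public, backend)
}

func main() {
    forwarded := map[string]bool{}
    for {
        resp, err := http.Get("http://10.1.12.10:8080/api/v1/services")
        if err != nil {
            log.Printf("apiserver unreachable: %v", err)
            time.Sleep(10 * time.Second)
            continue
        }
        var list serviceList
        if err := json.NewDecoder(resp.Body).Decode(&list); err != nil {
            log.Printf("decode failed: %v", err)
        }
        resp.Body.Close()

        for _, svc := range list.Items {
            for _, p := range svc.Spec.Ports {
                backend := fmt.Sprintf("%s:%d", svc.Spec.ClusterIP, p.Port)
                if !forwarded[backend] {
                    startForward(fmt.Sprintf("0.0.0.0:%d", p.Port), backend)
                    forwarded[backend] = true
                }
            }
        }
        time.Sleep(10 * time.Second)
    }
}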

3) A flat address block: with flannel, traffic is generally NATed, but we can invert the bridge to flatten the visibility of the containers. Flannel might look like this:

For this example, assume flannel manages a private subnet of 172.24.0.0/16 and the k8s nodes are:

node0: 10.1.12.10
node1: 10.1.12.11
node2: 10.1.12.12

etcd runs on node0.

cloudinit configuration for etcd2

#cloud-config

coreos:
  etcd2:
    advertise-client-urls: http://10.1.12.10:2379
    listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001
    listen-peer-urls: http://10.1.12.10:2380
    proxy: off

  units:
    - name: etcd2.service
      command: restart
      enable: true

flannel drop-in for cloudinit [systemd], applied to the flanneld.service unit:

  drop-ins:
    - name: 50-network-config.conf
      content: |
        [Service]
        ExecStartPre=-/usr/bin/etcdctl --peers=10.1.12.10:2379 set /coreos.com/network/config '{ "Network": "172.24.0.0/16" }'
        [Install]
        WantedBy=multi-user.target

$ for node in $(etcdctl ls --recursive /coreos.com/network); do echo ${node} $(etcdctl get ${node} 2>/dev/null); done
/coreos.com/network/config { "Network": "172.24.0.0/16" }
/coreos.com/network/subnets
/coreos.com/network/subnets/172.24.5.0-24 {"PublicIP":"10.1.12.11"}
/coreos.com/network/subnets/172.24.92.0-24 {"PublicIP":"10.1.12.10"}
/coreos.com/network/subnets/172.24.95.0-24 {"PublicIP":"10.1.12.12"}

These are directly routable via the host routing table:

$ ip r
default via 10.1.12.1 dev eth0  proto dhcp  src 10.1.12.10  metric 1024 
10.1.12.0/24 dev eth0  proto kernel  scope link  src 10.1.12.10 
10.1.12.1 dev eth0  proto dhcp  scope link  src 10.1.12.10  metric 1024 
172.24.0.0/16 dev flannel0  proto kernel  scope link  src 172.24.92.0 
172.24.92.0/24 dev docker0  proto kernel  scope link  src 172.24.92.1 

So flannel provides container-to-container and host-to-container connectivity via IP.

Bridge without overlay fabric might look like this:

The bridge can be a member of a CIDR block shared by all cluster members; for now let's say a /16 address space. This CIDR block is transparent, without NAT, to all other cluster members, including all of each member's containers.

         bridge0
            |
          bond0
        /     \
      eth0     eth1

In cloudinit format, using 10.10.0.0/16 with a gateway of 10.10.0.1, this host assigned 10.10.0.2, and the docker bridge configured with 10.10.2.0/24, the configuration might look like the following:

#cloud-config

coreos:
  units:

    - name: 10.static.netdev
      command: start
      content: |
        [NetDev]
        Name=bridge0
        Kind=bridge

    - name: 20.static.network
      command: start
      content: |
        [Match]
        Name=bridge0

        [Network]
        Address=10.10.0.2/16
        DNS=...
        DNS=...
        Gateway=...
        IPForward=yes

    - name: 50.static.network
      command: start
      content: |
        [Match]
        Name=eth0

        [Network]
        Bridge=bridge0

    - name: 60.static.network
      command: start
      content: |
        [Match]
        Name=eth1

        [Network]
        Bond=bond0

Then configure docker to use this same subnet:

    - name: docker.service
      command: start
      enable: true
      content: |
        [Unit]
        After=docker.socket
        Description=Docker Application Container Engine
        Documentation=http://docs.docker.io

        [Service]
        Restart=always
        Environment="DOCKER_OPT_BIP=-b=bridge0"
        Environment="DOCKER_OPT_MTU="
        Environment="DOCKER_OPT_CIDR=--fixed-cidr=10.10.2.1/24"
        Environment="DOCKER_OPTS=--host=unix:///var/run/docker.sock"
        ExecStart=/bin/bash -c "/usr/lib/coreos/dockerd \
                 --daemon \
                 ${DOCKER_OPTS} \
                 ${DOCKER_OPT_BIP} \
                 ${DOCKER_OPT_CIDR} \
                 ${DOCKER_OPT_MTU} \
                 ${DOCKER_OPT_IPMASQ} \
                 "
        [Install]
        WantedBy=multi-user.target

davidwalter0 commented 9 years ago

A sample implementation without auto discovery, forcing the service to the fixed address .88:8888 (clusterIP 172.24.254.88, port 8888), using a trivial forwarder and the flannel network overlay:

default sfs-svc k8s-app=sfs-svc k8s-app=sfs-rc 172.24.254.88 8888/TCP

With:

kubectl scale --replicas=1 rc/sfs-rc
for (( i=0; i<4; i++ )); do printf "%3d %s\n" "${i}" "$(curl --silent 10.1.12.10:8888|grep host)"; done
  0 <a href="host-coreos-alpha-00">host-coreos-alpha-00</a>
  1 <a href="host-coreos-alpha-00">host-coreos-alpha-00</a>
  2 <a href="host-coreos-alpha-00">host-coreos-alpha-00</a>
  3 <a href="host-coreos-alpha-00">host-coreos-alpha-00</a>

With:

kubectl scale --replicas=3  rc/sfs-rc
for (( i=0; i<3; i++ )); do printf "%3d %s\n" "${i}" "$(curl --silent 10.1.12.10:8888|grep host)"; done
  0 <a href="host-coreos-alpha-03">host-coreos-alpha-03</a>
  1 <a href="host-coreos-alpha-02">host-coreos-alpha-02</a>
  2 <a href="host-coreos-alpha-01">host-coreos-alpha-01</a>

And from a public ip with a running forwarder:

for (( i=0; i<3; i++ )); do printf "%3d %s\n" "${i}" "$(curl --silent 208.90.61.54:8888|grep host)"; done
  0 <a href="host-coreos-alpha-03">host-coreos-alpha-03</a>
  1 <a href="host-coreos-alpha-02">host-coreos-alpha-02</a>
  2 <a href="host-coreos-alpha-01">host-coreos-alpha-01</a>

The systemd service file

# /etc/systemd/system/forward-sfs.service
[Unit]
Description=%N port forward to k8s service on fixed flannel subnet address 172.24.254.88:8888
After=flanneld.service
Requires=flanneld.service

[Service]
# 172.24.254.88 is route-able from the flannel members
# /coreos.com/network/config { "Network": "172.24.0.0/16" }
ExecStart=/var/lib/ecmi/forward 10.1.12.10:8888 172.24.254.88:8888
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

The cluster state after the rescale looks like this:

NAME              LABELS                                   STATUS
coreos-alpha-00   kubernetes.io/hostname=coreos-alpha-00   Ready
coreos-alpha-01   kubernetes.io/hostname=coreos-alpha-01   Ready
coreos-alpha-02   kubernetes.io/hostname=coreos-alpha-02   Ready
coreos-alpha-03   kubernetes.io/hostname=coreos-alpha-03   Ready
NAMESPACE     CONTROLLER   CONTAINER(S)   IMAGE(S)                                SELECTOR                     REPLICAS
default       sfs-rc       sfs            simple-file-server                      k8s-app=sfs-rc,version=v1    3
kube-system   kube-ui-v1   kube-ui        gcr.io/google_containers/kube-ui:v1.1   k8s-app=kube-ui,version=v1   1
NAMESPACE     NAME               READY     STATUS    RESTARTS   AGE       NODE
default       sfs-rc-n4dvj       1/1       Running   0          5m        coreos-alpha-03
default       sfs-rc-o155f       1/1       Running   0          6m        coreos-alpha-01
default       sfs-rc-yvmz1       1/1       Running   0          5m        coreos-alpha-02
kube-system   kube-ui-v1-mlv2r   1/1       Running   0          8m        coreos-alpha-00
NAMESPACE     NAME         LABELS                                                                         SELECTOR          IP(S)           PORT(S)
default       kubernetes   component=apiserver,provider=kubernetes                                        <none>            172.24.254.1    443/TCP
default       sfs-svc      k8s-app=sfs-svc                                                                k8s-app=sfs-rc    172.24.254.88   8888/TCP
kube-system   kube-ui      k8s-app=kube-ui,kubernetes.io/cluster-service=true,kubernetes.io/name=KubeUI   k8s-app=kube-ui   172.24.254.19   80/TCP
NAMESPACE     NAME         ENDPOINTS
default       kubernetes   10.1.12.10:6443
default       sfs-svc      172.24.40.2:8888,172.24.49.5:8888,172.24.93.2:8888
kube-system   kube-ui      172.24.53.5:8080

And a single static cluster IP was used in the service:

apiVersion: v1
kind: Service
metadata:
  name: sfs-svc
  labels:
    k8s-app: sfs-svc
spec:
  selector:
    k8s-app: sfs-rc
  ports:
  - port: 8888
  clusterIP: 172.24.254.88

sfs is a simple file server; the served directory on each host contains a file named after the host, prefixed with "host-", e.g. host-coreos-alpha-01.
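
For reference, a server like that needs very little code; a minimal sketch in Go (the /data path and port 8888 are assumptions for illustration, not the actual image contents):

// simple-file-server.go - minimal sketch of a file server like sfs: serve a
// directory over HTTP on :8888 so the responding host can be identified by
// the host-<hostname> file dropped into the served directory.
package main

import (
    "log"
    "net/http"
)

func main() {
    // /data would be populated at startup with a file named host-<hostname>,
    // e.g. host-coreos-alpha-01.
    http.Handle("/", http.FileServer(http.Dir("/data")))
    log.Fatal(http.ListenAndServe(":8888", nil))
}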

Edit: After clearing the cluster and restarting with DNS, the cluster config appears to be working with full DNS discovery:

core@coreos-alpha-00 ~ $ kubectl exec -it busybox -- cat /etc/resolv.conf 
nameserver 172.24.254.53
nameserver 10.1.12.1
search default.svc.k8s.local svc.k8s.local k8s.local novalocal

core@coreos-alpha-00 ~ $ kubectl exec -it busybox -- nslookup sfs-svc
Server:    172.24.254.53
Address 1: 172.24.254.53

Name:      sfs-svc
Address 1: 172.24.254.88

[Screenshot: UI showing a partial list of guestbook cluster members, 2015-09-09 13:54]

[Screenshot: Guestbook, 2015-09-09 14:17]

A simplified replication controller using internal load balancing from kube-proxy and simple port forwarding:

apiVersion: v1
kind: ReplicationController
metadata:
  name: service-sfs-loadbalancer
  labels:
    app: service-sfs-loadbalancer
    version: v1
spec:
  replicas: 1
  selector:
    app: service-sfs-loadbalancer
    version: v1
  template:
    metadata:
      labels:
        app: service-sfs-loadbalancer
        version: v1
    spec:
      nodeSelector:
        role: master
      containers:
      - image: simple-forwarder:latest
        imagePullPolicy: IfNotPresent
        name: simple-forwarder
        ports:
        - containerPort: 8888
          hostPort: 8888
        resources: {}
        securityContext:
          privileged: true
        args:
        - "/simple-forwarder"
        - "0.0.0.0:8888"
        - "172.24.254.88:8888"

simple-forwarder build script

#!/bin/bash
# Build the simple-forwarder image and tag it with a version.
version=0.1
dir=$(dirname $(readlink -f ${0}))
cd ${dir}
# Generate a minimal Dockerfile that wraps the forwarder binary.
cat > Dockerfile <<EOF
FROM centos:latest
COPY simple-forwarder /simple-forwarder
CMD [ "/simple-forwarder" ]
EOF

if docker build --force-rm --rm --tag=simple-forwarder . ; then
   docker tag simple-forwarder:latest simple-forwarder:${version}
fi

simple-forwarder.go proof of concept

// simple-forwarder.go
package main

import (
    "io"
    "log"
    "net"
    "os"
)

// forward copies data in both directions between the accepted connection
// and a new connection to the backend address (os.Args[2]).
func forward(connection net.Conn) {
    client, err := net.Dial("tcp", os.Args[2])
    if err != nil {
        // Don't kill the whole forwarder when a single backend dial fails.
        log.Printf("Connection to backend failed: %v", err)
        connection.Close()
        return
    }
    log.Printf("Connected %v %v\n", connection.LocalAddr(), connection.RemoteAddr())
    go func() {
        defer client.Close()
        defer connection.Close()
        io.Copy(client, connection)
    }()

    go func() {
        defer client.Close()
        defer connection.Close()
        io.Copy(connection, client)
    }()
}

func main() {
    if len(os.Args) != 3 {
        log.Fatalf("Usage %s frontend-ip:port backend-ip:port\n", os.Args[0])
    }

    listener, err := net.Listen("tcp", os.Args[1])
    if err != nil {
        log.Fatalf("net.Listen(\"tcp\", %s) failed: %v", os.Args[1], err)
    }

    // Accept connections forever and hand each one off to forward().
    for {
        connection, err := listener.Accept()
        if err != nil {
            log.Fatalf("ERROR: failed to accept listener: %v", err)
        }
        log.Printf("Accepted connection %v %v\n", connection.LocalAddr(), connection.RemoteAddr())
        go forward(connection)
    }
}

davidwalter0 commented 9 years ago

guestbook front end

The prior sfs example uses a fixed, hardcoded IP address from the service pool to map the public address [which is assigned to the master node for this example] to the backend service.

For the guestbook front end, the mapping of the ip:port pair is instead managed dynamically via environment variables.

This introduces a k8s creation-sequence dependency: the environment variables GUESTBOOK_PORT_3000_TCP_ADDR and GUESTBOOK_SERVICE_PORT aren't available until after the guestbook service object is created.

kubectl create -f guestbook-fe.yaml

# guestbook-fe.yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: guestbook-fe
  labels:
    app: guestbook-fe
    version: v1
spec:
  replicas: 1
  selector:
    app: guestbook-fe
    version: v1
  template:
    metadata:
      labels:
        app: guestbook-fe
        version: v1
    spec:
      nodeSelector:
        role: master
      containers:
      - image: simple-forwarder:latest
        imagePullPolicy: IfNotPresent
        name: simple-forwarder
        ports:
        - containerPort: 3000
          hostPort: 3000
        resources: {}
        securityContext:
          privileged: true
        args:
        - "/bin/bash"
        - "-c"
        - "/simple-forwarder 0.0.0.0:3000 ${GUESTBOOK_PORT_3000_TCP_ADDR}:${GUESTBOOK_SERVICE_PORT}"

kubectl exec busybox -- nslookup guestbook.default
Server:    172.24.254.53
Address 1: 172.24.254.53

Name:      guestbook.default
Address 1: 172.24.254.30

Alternatively, using kube-dns to remove the sequence dependency, the args section could be replaced with a DNS reference to the service:

        args:      
        - "/simple-forwarder"
        - "0.0.0.0:3000"
        - "guestbook:3000"

Alex pointed out the proxy-to-service method from Google in the Kubernetes repo.

The gcr.io container uses socat, as it appears from the log:

kubectl logs guestbook-fe-pxy-svc-037q1
Running socat  TCP-LISTEN:3000,reuseaddr,fork TCP:guestbook.default:3000

Kubernetes example reverse proxy for DNS

The corresponding replication controller YAML for the guestbook, again assuming that the node with role=master holds the public IP address, might look like the following. Notice that this depends on kube-dns (or similar functionality) being active on the cluster, because it references the service by its DNS name, guestbook.default.

# based on
# https://github.com/kubernetes/contrib/tree/master/for-demos/proxy-to-service
apiVersion: v1
kind: ReplicationController
metadata:
  name: guestbook-fe-pxy-svc
  labels:
    app: guestbook-fe-pxy-svc
    version: v1
spec:
  replicas: 1
  selector:
    app: guestbook-fe-pxy-svc
    version: v1
  template:
    metadata:
      labels:
        app: guestbook-fe-pxy-svc
        version: v1
    spec:
      nodeSelector:
        role: master
      containers:
      - name: guestbook-fe-pxy-svc-tcp
        image: gcr.io/google_containers/proxy-to-service:v2
        imagePullPolicy: IfNotPresent
        args: [ "tcp", "3000", "guestbook.default" ]
        ports:
        - name: tcp
          protocol: TCP
          containerPort: 3000
          hostPort: 3000