
Cilium Star Wars Demo

sw_l3_l4_l7_policy.yaml still crashes #2

Closed drpaneas closed 6 years ago

drpaneas commented 6 years ago

I was following the very nice and interesting guide (https://cilium.readthedocs.io/en/v1.2/gettingstarted/gsg_starwars/), but instead of minikube I am testing Cilium on SUSE CaaSP. Everything seems fine so far, apart from the last part, which is related to L7.

Expected Behavior

$ kubectl exec tiefighter -- curl -s -XPUT deathstar.default.svc.cluster.local/v1/exhaust-port
Access denied

Actual Behavior

admin:~ # kubectl exec tiefighter -- curl -s -XPUT deathstar.default.svc.cluster.local/v1/exhaust-port
Panic: deathstar exploded

goroutine 1 [running]:
main.HandleGarbage(0x2080c3f50, 0x2, 0x4, 0x425c0, 0x5, 0xa)
        /code/src/github.com/empire/deathstar/
        temp/main.go:9 +0x64
main.main()
        /code/src/github.com/empire/deathstar/
        temp/main.go:5 +0x85

Debugging information below:

My cluster

In your example you are using minikube, which is a single-node cluster. I am using a multi-node cluster: 1 master and 2 workers.

admin:~ # kubectl get nodes
NAME       STATUS    ROLES     AGE       VERSION
master-0   Ready     master    4h        v1.10.7
worker-0   Ready     <none>    4h        v1.10.7
worker-1   Ready     <none>    4h        v1.10.7

Deploy the Demo Application

admin:~ # kubectl create -f https://raw.githubusercontent.com/cilium/cilium/v1.2/examples/minikube/http-sw-app.yaml
service "deathstar" created
deployment.extensions "deathstar" created
pod "tiefighter" created
pod "xwing" created

Each pod will go through several states until it reaches Running, at which point it is ready.

admin:~ # kubectl get pods,svc
NAME                             READY     STATUS    RESTARTS   AGE
pod/deathstar-5fc7c7795d-hrxmm   1/1       Running   0          17s
pod/deathstar-5fc7c7795d-t8s7g   1/1       Running   0          17s
pod/tiefighter                   1/1       Running   0          17s
pod/xwing                        1/1       Running   0          17s

NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
service/deathstar    ClusterIP   172.24.179.180   <none>        80/TCP    18s
service/kubernetes   ClusterIP   172.24.0.1       <none>        443/TCP   2h
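
As a side note, the pod state transitions mentioned above can also be followed live with kubectl's standard watch flag (nothing Cilium-specific):

admin:~ # kubectl get pods -w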

Each pod will be represented in Cilium as an Endpoint.

admin:~ # kubectl -n kube-system get pods -l k8s-app=cilium
NAME           READY     STATUS    RESTARTS   AGE
cilium-9lggm   1/1       Running   1          2h
cilium-tkkbd   1/1       Running   1          2h
cilium-x5t87   1/1       Running   1          2h
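
If more agent-side detail is useful, I can also attach the standard health summary from one of these agents, e.g.:

admin:~ # kubectl -n kube-system exec cilium-9lggm -- cilium status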

Since I have 3 nodes (1 master, 2 workers), I guess it is normal that there are 3 Cilium pods; please let me know if this setup is not expected. The endpoints can be listed by invoking the cilium tool inside the Cilium pods, so I made a bash array containing all of my Cilium pods. If I query only a single pod, the results differ from pod to pod (deathstar and xwing one time, only deathstar another time, or nothing at all), presumably because each agent only lists the endpoints local to its own node.

admin:~ # ciliumapp=(cilium-9lggm cilium-tkkbd cilium-x5t87)
admin:~ # for i in "${ciliumapp[@]}"; do kubectl -n kube-system exec $i -- cilium endpoint list | grep -B 1 class; done
15576      Disabled           Disabled          12409      container:org.openbuildservice.disturl='obs://build.suse.de/Devel:CASP:3.0:ControllerNode/images_container_base/db64c16e1f7904d9baf9a0b431996e0e-sles12sp3-kubernetes-node-image-pause'   f00d::ac10:400:0:3cd8   172.16.5.188   ready
                                                           k8s:class=xwing
--
53483      Disabled           Disabled          60106      container:org.openbuildservice.disturl='obs://build.suse.de/Devel:CASP:3.0:ControllerNode/images_container_base/db64c16e1f7904d9baf9a0b431996e0e-sles12sp3-kubernetes-node-image-pause'   f00d::ac10:400:0:d0eb   172.16.4.174   ready
                                                           k8s:class=deathstar
34332      Disabled           Disabled          5982       container:org.openbuildservice.disturl='obs://build.suse.de/Devel:CASP:3.0:ControllerNode/images_container_base/db64c16e1f7904d9baf9a0b431996e0e-sles12sp3-kubernetes-node-image-pause'   f00d::ac10:200:0:861c   172.16.3.79    ready
                                                           k8s:class=tiefighter
--
65111      Disabled           Disabled          60106      container:org.openbuildservice.disturl='obs://build.suse.de/Devel:CASP:3.0:ControllerNode/images_container_base/db64c16e1f7904d9baf9a0b431996e0e-sles12sp3-kubernetes-node-image-pause'   f00d::ac10:200:0:fe57   172.16.3.110   ready
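
A variant of the same loop that avoids hard-coding the pod names (same label selector as above, just resolved dynamically):

admin:~ # for i in $(kubectl -n kube-system get pods -l k8s-app=cilium -o jsonpath='{.items[*].metadata.name}'); do kubectl -n kube-system exec $i -- cilium endpoint list | grep -B 1 class; done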

Apply an L3/L4 Policy

admin:~ # kubectl create -f https://raw.githubusercontent.com/cilium/cilium/v1.2/examples/minikube/sw_l3_l4_policy.yaml
ciliumnetworkpolicy.cilium.io "rule1" created

admin:~ # kubectl get cnp
NAME      AGE
rule1     16s
admin:~ # kubectl describe cnp rule1
Name:         rule1
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  cilium.io/v2
Kind:         CiliumNetworkPolicy
Metadata:
  Cluster Name:
  Creation Timestamp:  2018-10-16T15:31:38Z
  Generation:          1
  Resource Version:    39910
  Self Link:           /apis/cilium.io/v2/namespaces/default/ciliumnetworkpolicies/rule1
  UID:                 9286dcf7-d158-11e8-8cc2-46a5b4e9ee65
Spec:
  Endpoint Selector:
    Match Labels:
      Any : Class:  deathstar
      Any : Org:    empire
  Ingress:
    From Endpoints:
      Match Labels:
        Any : Org:  empire
    To Ports:
      Ports:
        Port:      80
        Protocol:  TCP
Status:
  Nodes:
    Master - 0:
      Enforcing:              true
      Last Updated:           2018-10-16T15:31:38.113296001Z
      Local Policy Revision:  199
      Ok:                     true
    Worker - 0:
      Enforcing:              true
      Last Updated:           2018-10-16T15:31:39.472541253Z
      Local Policy Revision:  200
      Ok:                     true
    Worker - 1:
      Enforcing:              true
      Last Updated:           2018-10-16T15:31:39.436367245Z
      Local Policy Revision:  200
      Ok:                     true
Events:                       <none>
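
For reference, the rule shown by kubectl describe above corresponds roughly to the following manifest (reconstructed from that output; it should match what examples/minikube/sw_l3_l4_policy.yaml applies):

apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: rule1
spec:
  endpointSelector:
    matchLabels:
      org: empire
      class: deathstar
  ingress:
  - fromEndpoints:
    - matchLabels:
        org: empire
    toPorts:
    - ports:
      - port: "80"
        protocol: TCP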

Check Current Access

admin:~ # kubectl exec tiefighter -- curl -s -XPOST deathstar.default.svc.cluster.local/v1/request-landing
Ship landed

admin:~ # kubectl exec xwing -- curl -s -XPOST deathstar.default.svc.cluster.local/v1/request-landing
# Either press CTRL+C or leave it to time-out
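
The xwing request simply hangs because its traffic is dropped by the L3/L4 rule; adding a plain curl timeout avoids having to interrupt it by hand:

admin:~ # kubectl exec xwing -- curl -s --max-time 10 -XPOST deathstar.default.svc.cluster.local/v1/request-landing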

If I check the endpoint list again, ingress policy enforcement is now enabled for deathstar:

65111      Enabled            Disabled          60106      container:org.openbuildservice.disturl='obs://build.suse.de/Devel:CASP:3.0:ControllerNode/images_container_base/db64c16e1f7904d9baf9a0b431996e0e-sles12sp3-kubernetes-node-image-pause'   f00d::ac10:200:0:fe57   172.16.3.110   ready
                                                           k8s:class=deathstar
53483      Enabled            Disabled          60106      container:org.openbuildservice.disturl='obs://build.suse.de/Devel:CASP:3.0:ControllerNode/images_container_base/db64c16e1f7904d9baf9a0b431996e0e-sles12sp3-kubernetes-node-image-pause'   f00d::ac10:400:0:d0eb   172.16.4.174   ready
                                                           k8s:class=deathstar

Apply and Test HTTP-aware L7 Policy

admin:~ # kubectl exec tiefighter -- curl -s -XPUT deathstar.default.svc.cluster.local/v1/exhaust-port
Panic: deathstar exploded

goroutine 1 [running]:
main.HandleGarbage(0x2080c3f50, 0x2, 0x4, 0x425c0, 0x5, 0xa)
        /code/src/github.com/empire/deathstar/
        temp/main.go:9 +0x64
main.main()
        /code/src/github.com/empire/deathstar/
        temp/main.go:5 +0x85

The problem with that is that with 2 replicas, 'exploding' them twice leaves both containers down:

admin:~ # kubectl get pods
NAME                         READY     STATUS             RESTARTS   AGE
deathstar-5fc7c7795d-hrxmm   0/1       CrashLoopBackOff   3          1h
deathstar-5fc7c7795d-t8s7g   0/1       CrashLoopBackOff   3          1h
tiefighter                   1/1       Running            0          1h
xwing                        1/1       Running            0          1h

The fix would be to limit tiefighter to making only the POST /v1/request-landing API call, while disallowing all other calls (including PUT /v1/exhaust-port).

admin:~ # kubectl delete cnp rule1
ciliumnetworkpolicy.cilium.io "rule1" deleted

admin:~ # kubectl create -f https://raw.githubusercontent.com/cilium/cilium/v1.2/examples/minikube/sw_l3_l4_l7_policy.yaml
ciliumnetworkpolicy.cilium.io "rule1" created

admin:~ # kubectl describe cnp rule1
Name:         rule1
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  cilium.io/v2
Kind:         CiliumNetworkPolicy
Metadata:
  Cluster Name:
  Creation Timestamp:  2018-10-16T16:05:08Z
  Generation:          1
  Resource Version:    48243
  Self Link:           /apis/cilium.io/v2/namespaces/default/ciliumnetworkpolicies/rule1
  UID:                 40c3feb3-d15d-11e8-8cc2-46a5b4e9ee65
Spec:
  Endpoint Selector:
    Match Labels:
      Any : Class:  deathstar
      Any : Org:    empire
  Ingress:
    From Endpoints:
      Match Labels:
        Any : Org:  empire
    To Ports:
      Ports:
        Port:      80
        Protocol:  TCP
      Rules:
        Http:
          Method:  POST
          Path:    /v1/request-landing
Status:
  Nodes:
    Master - 0:
      Enforcing:              true
      Last Updated:           2018-10-16T16:05:08.375071533Z
      Local Policy Revision:  242
      Ok:                     true
    Worker - 0:
      Error:         context deadline exceeded
      Last Updated:  2018-10-16T16:06:41.430593693Z
      Ok:            true
Events:              <none>
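
For reference, the L7 rule shown above corresponds roughly to the same manifest as before, plus an HTTP section (reconstructed from the describe output; it should match what sw_l3_l4_l7_policy.yaml applies):

apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: rule1
spec:
  endpointSelector:
    matchLabels:
      org: empire
      class: deathstar
  ingress:
  - fromEndpoints:
    - matchLabels:
        org: empire
    toPorts:
    - ports:
      - port: "80"
        protocol: TCP
      rules:
        http:
        - method: "POST"
          path: "/v1/request-landing"
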
admin:~ # kubectl exec tiefighter -- curl -s -XPOST deathstar.default.svc.cluster.local/v1/request-landing
Ship landed

admin:~ # kubectl exec tiefighter -- curl -s -XPUT deathstar.default.svc.cluster.local/v1/exhaust-port
Panic: deathstar exploded

goroutine 1 [running]:
main.HandleGarbage(0x2080c3f50, 0x2, 0x4, 0x425c0, 0x5, 0xa)
        /code/src/github.com/empire/deathstar/
        temp/main.go:9 +0x64
main.main()
        /code/src/github.com/empire/deathstar/
        temp/main.go:5 +0x85

Do you know what is going wrong and why the L7 rule still lets the deathstar explode? I also notice the "context deadline exceeded" error for Worker - 0 in the describe output above; I am not sure whether it is related.
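
(Side note: my understanding is that the agent's L7 access log, e.g. via cilium monitor -t l7 on the node hosting the deathstar pods, should show whether the request ever reaches the proxy; I can attach that output if it helps.)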

drpaneas commented 6 years ago

I've also tried the instructions found in https://github.com/cilium/star-wars-demo/blob/master/README.md. Still the same result:

admin:~/star-wars-demo # kubectl create -f 01-deathstar.yaml -f 02-xwing.yaml
service "deathstar" created
deployment.extensions "deathstar" created
deployment.extensions "spaceship" created
deployment.extensions "xwing" created
admin:~/star-wars-demo # kubectl get pods
NAME                        READY     STATUS    RESTARTS   AGE
deathstar-99f54944f-5zbrj   1/1       Running   0          35s
deathstar-99f54944f-f5pgd   1/1       Running   0          35s
deathstar-99f54944f-lphk9   1/1       Running   0          35s
spaceship-d9f5db749-bt647   1/1       Running   0          35s
spaceship-d9f5db749-hs597   1/1       Running   0          35s
spaceship-d9f5db749-qmsqk   1/1       Running   0          35s
spaceship-d9f5db749-zdxbj   1/1       Running   0          35s
xwing-585b668b8d-nmblb      1/1       Running   0          35s
xwing-585b668b8d-sj8d9      1/1       Running   0          35s
xwing-585b668b8d-xkk7t      1/1       Running   0          35s
admin:~/star-wars-demo # kubectl exec -ti xwing-585b668b8d-nmblb -- curl -XGET deathstar.default.svc.cluster.local/v1/
{
    "name": "Death Star",
    "model": "DS-1 Orbital Battle Station",
    "manufacturer": "Imperial Department of Military Research, Sienar Fleet Systems",
    "cost_in_credits": "1000000000000",
    "length": "120000",
    "crew": "342953",
    "passengers": "843342",
    "cargo_capacity": "1000000000000",
    "hyperdrive_rating": "4.0",
    "starship_class": "Deep Space Mobile Battlestation",
    "api": [
        "GET   /v1",
        "GET   /v1/healthz",
        "POST  /v1/request-landing",
        "PUT   /v1/cargobay",
        "GET   /v1/hyper-matter-reactor/status",
        "PUT   /v1/exhaust-port"
    ]
}
admin:~/star-wars-demo # kubectl create -f policy/l7_policy.yaml
ciliumnetworkpolicy.cilium.io "deathstar-api-protection" created
admin:~/star-wars-demo # kubectl exec -ti xwing-585b668b8d-nmblb -- curl -XPUT deathstar.default.svc.cluster.local/v1/exhaust-port
Panic: deathstar exploded

goroutine 1 [running]:
main.HandleGarbage(0x2080c3f50, 0x2, 0x4, 0x425c0, 0x5, 0xa)
        /code/src/github.com/empire/deathstar/
        temp/main.go:9 +0x64
main.main()
        /code/src/github.com/empire/deathstar/
        temp/main.go:5 +0x85

nebril commented 6 years ago

It turned out to be a problem with the registry.opensuse.org/devel/caasp/kubic-container/container/kubic/cilium:1.2.1 container image (the cilium-envoy binary is missing from the container).
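
A quick way to confirm that (a sketch, assuming the binary would normally be on the agent's PATH) is to look for it inside one of the Cilium pods:

admin:~ # kubectl -n kube-system exec cilium-9lggm -- sh -c 'command -v cilium-envoy || echo "cilium-envoy not found"'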

vadorovsky commented 6 years ago

It seems that this L7 policy is not working because the openSUSE image doesn't contain cilium-envoy. So, basically, we need to enable Envoy support.

@drpaneas If you are curious, we are still struggling with packaging Envoy; here is the discussion with the upstream devs that should help us: https://github.com/envoyproxy/envoy/pull/4585

Let's close this issue and figure it out internally. Sorry for the noise!

nebril commented 6 years ago

@tgraf please close