cloudnativelabs / kube-router

Kube-router, a turnkey solution for Kubernetes networking.
https://kube-router.io
Apache License 2.0

networkpolicy is not working when the service proxy SNATs traffic #744

Closed HaveFun83 closed 4 years ago

HaveFun83 commented 5 years ago

Hi, we use kube-router to advertise the service and pod CIDRs via BGP. Now we want to limit access to the pods via network policy.
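
For reference, the advertisement is enabled roughly like the following sketch of the kube-router DaemonSet container args (the ASN and peer values here are placeholders, not our actual configuration):

      containers:
        - name: kube-router
          image: cloudnativelabs/kube-router
          args:
            - --run-router=true              # advertise routes over BGP
            - --run-firewall=true            # enforce network policies
            - --run-service-proxy=true       # IPVS-based service proxy
            - --advertise-cluster-ip=true    # announce service VIPs
            - --advertise-pod-cidr=true      # announce this node's pod CIDR
            - --cluster-asn=64512            # placeholder ASN
            - --peer-router-ips=172.17.88.1  # placeholder upstream peer
            - --peer-router-asns=64513       # placeholder peer ASN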

example deployment:

NAME            TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE     SELECTOR
service/nginx   ClusterIP   10.6.173.216   <none>        80/TCP    6d22h   run=nginx

NAME                           READY   STATUS    RESTARTS   AGE     IP            NODE                         NOMINATED NODE   READINESS GATES
pod/busybox-7ffc9fc479-gtkwx   1/1     Running   0          6d22h   10.6.132.17   k8s-worker-2   <none>           <none>
pod/nginx-7cdbd8cdc9-mwtcb     1/1     Running   0          6d22h   10.6.133.16   k8s-worker-3   <none>           <none>

Example policy

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: access-nginx
  namespace: np-demo
spec:
  podSelector:
    matchLabels:
      run: nginx
  policyTypes:
  - Ingress
  ingress:
  - from:
    - ipBlock:
        cidr: 172.17.88.0/24
    - podSelector:
        matchLabels:
          access: "true"

Default deny

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: default-deny-all
  namespace: np-demo
spec:
  podSelector: {}
  policyTypes:
  - Ingress 

The service VIP is announced via anycast from all nodes, but the policy only works when a client from 172.17.88.0/24 happens to hit the k8s-worker-3 node, which hosts the nginx pod. All other nodes forward the incoming traffic for the nginx service IP towards the nginx pod IP with their own node IP as the source, so it never matches the network policy rule because the source IP is completely different.

Maybe someone can give me a hint to resolve this issue?

aauren commented 4 years ago

Was your nginx pod set up with the DSR annotation by any chance?

aauren commented 4 years ago

Closing as stale.

aauren commented 4 years ago

@murali-reddy We recently had a message in slack that brought this issue back up. I'm able to reproduce it myself by applying a network policy to a pod and then bouncing traffic to it through another node.

Here is the setup, context, and tcpdump courtesy of nexus in Slack:

Environment context:

Client Host: 192.168.0.100
Node Without Pod: 192.168.122.250
Node With Pod: 192.168.122.167
Pod IP of Service: 10.122.1.21
Service Port: 8080

Traffic is sent from Client Host to Node Without Pod, where it gets SNAT'd through IPVS (the default behavior of IPVS) and forwarded to Node With Pod. On Node With Pod it gets denied by the network policy, since the source address is now the address of Node Without Pod instead of Client Host.

Deployment Setup:

---
apiVersion: v1
kind: Service
metadata:
  name: hello-hello-app
spec:
  type: NodePort
  ports:
    - port: 8080
      nodePort: 32000
      targetPort: http
      protocol: TCP
      name: http
  selector:
    app.kubernetes.io/name: hello-app
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-hello-app
  labels:
    app.kubernetes.io/name: hello-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: hello-app
  template:
    metadata:
      labels:
        app.kubernetes.io/name: hello-app
    spec:
      nodeSelector:
        kubernetes.io/hostname: k8s-2
      containers:
        - name: hello-app
          image: "nextsux/hello-app:1"
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: "troublemaker"
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: hello-app
  policyTypes:
  - Ingress
  ingress:
  - from:
    - ipBlock:
        cidr: 192.168.0.100/32

tcpdump from Node Without Pod

23:38:57.185836 IP 192.168.0.100.49392 > 192.168.122.250.32000: Flags [S], seq 1938619889, win 64240, options [mss 1460,sackOK,TS val 2486647264 ecr 0,nop,wscale 7], length 0
23:38:57.185906 IP 192.168.122.250.55937 > 10.122.1.21.8080: Flags [S], seq 1938619889, win 64240, options [mss 1460,sackOK,TS val 2486647264 ecr 0,nop,wscale 7], length 0

tcpdump from Node With Pod

23:38:57.221024 IP 192.168.122.250.55937 > 10.122.1.21.8080: Flags [S], seq 1938619889, win 64240, options [mss 1460,sackOK,TS val 2486647264 ecr 0,nop,wscale 7], length 0

I would imagine that this is a common problem for all k8s network frameworks. Do you happen to have any knowledge of how Calico or others address this?

murali-reddy commented 4 years ago

I would imagine that this is a common problem for all k8s network frameworks. Do you happen to have any knowledge of how Calico or others address this?

@aauren Whether it's kube-proxy or kube-router acting as the service proxy, when an external client accesses the service, the traffic is SNAT'ed to ensure symmetric routing (i.e. the return traffic goes back through the same node).

Please see https://kubernetes.io/docs/tutorials/services/source-ip/#source-ip-for-services-with-type-nodeport It's an inherent problem.

One can use services with externalTrafficPolicy=Local set to retain the source IP so that network policies can be enforced. Direct Server Return (DSR) is another option where the client IP is retained and network policies can be enforced.
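
For example, for the hello-app service above the first option would look roughly like this (a minimal sketch; only externalTrafficPolicy changes, and the DSR alternative shown in the comment uses the kube-router service annotation and is only illustrative):

apiVersion: v1
kind: Service
metadata:
  name: hello-hello-app
  # Alternative: DSR via the kube-router service annotation, e.g.
  # annotations:
  #   kube-router.io/service.dsr: tunnel
spec:
  type: NodePort
  # Only nodes that host a ready endpoint answer for the NodePort/VIP,
  # so the client source IP is preserved and ipBlock policies can match it.
  externalTrafficPolicy: Local
  ports:
    - port: 8080
      nodePort: 32000
      targetPort: http
      protocol: TCP
      name: http
  selector:
    app.kubernetes.io/name: hello-app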

aauren commented 4 years ago

@murali-reddy That makes sense. Given that the k8s documentation describes this as a pitfall of proxied service traffic, it seems to me that this is just an accepted problem upstream.

Two things it would be worth getting your opinion on:

  1. Do you think there is any place where it would be appropriate to mention this in our documentation, with a link to the upstream reference?
  2. If I remember correctly, kube-router already keeps an ipset with all of the node IPs in it. At a logical level it would be pretty easy for kube-router to allow traffic from this ipset via a kube-router annotation on a network policy or service. Since there is no easy way to reference node IPs in a network policy without manually specifying them all (see the sketch after this list), it would be a lot easier for kube-router to perform this work than for users to do it themselves. Would you be against exposing that functionality in this way?
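
For context, the manual workaround today would be something like the following sketch, reusing the addresses from the reproduction above (every node IP has to be listed by hand and kept in sync as nodes come and go):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: troublemaker
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: hello-app
  policyTypes:
  - Ingress
  ingress:
  - from:
    - ipBlock:
        cidr: 192.168.0.100/32     # the real client
    - ipBlock:
        cidr: 192.168.122.250/32   # node without the pod, seen as source after SNAT
    - ipBlock:
        cidr: 192.168.122.167/32   # node with the pod, for completeness

That bookkeeping is exactly what the proposed annotation would automate.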
murali-reddy commented 4 years ago

Do you think that there is any place where it would be appropriate to mention this in our documentation with a link to the upstream reference?

Agree. That should be documented.

kube-router already keeps an ipset with all of the node IPs in it. At a logical level it would be pretty easy for kube-router to allow traffic from this ipset via a kube-router annotation on a network policy or service.

I am afraid that would give nodes (e.g. a compromised node) unrestricted access to the pod, which is not desirable. In general, the problem of preserving the source IP is not specific to Kubernetes. AFAIK there is no one-size-fits-all solution.

In the case of Kubernetes, setting externalTrafficPolicy=Local for all the frontend services (those that receive north-south traffic) seems to be the common practice. See e.g. https://github.com/kubernetes/enhancements/issues/27

nextsux commented 4 years ago

I've just tried with Calico and I can confirm Calico has the same issue, @aauren.