cloudnativelabs / kube-router

Kube-router, a turnkey solution for Kubernetes networking.
https://kube-router.io
Apache License 2.0

NetworkPolicy ipBlock does not match the real client IP address #1199

Closed yuchunyun closed 2 years ago

yuchunyun commented 2 years ago

Problem: a NetworkPolicy ipBlock rule set to the real client IP address does not work. I deployed the whoami service and set externalIPs for it. When I access the externalIP from outside the k8s cluster, the pod log shows the following:

Hostname: whoami-586fd9cddd-x7wjc
IP: 127.0.0.1
IP: 10.244.6.196
IP: 192.168.120.37
RemoteAddr: 192.168.110.15:48386   #This is my k8s node address
GET / HTTP/1.1
Host: 192.168.120.37   #This is my externalIP
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.55 Safari/537.36 Edg/96.0.1054.34
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate
Accept-Language: zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6
Cache-Control: max-age=0
Connection: keep-alive
Upgrade-Insecure-Requests: 1

After setting kube-router.io/service.dsr: tunnel, the log appears as follows:

Hostname: whoami-586fd9cddd-x7wjc
IP: 127.0.0.1
IP: 10.244.6.196
IP: 192.168.120.37
RemoteAddr: 192.168.83.148:59349   # real client address
GET / HTTP/1.1
Host: 192.168.120.37
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.55 Safari/537.36 Edg/96.0.1054.34
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate
Accept-Language: zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6
Cache-Control: no-cache
Connection: keep-alive
Pragma: no-cache
Upgrade-Insecure-Requests: 1

Here is my NetworkPolicy:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: access-whoami
spec:
  podSelector:
    matchLabels:
      run: whoami
  ingress:
  - from:
    - ipBlock:
        cidr: 192.168.83.0/24     # does not work!
    - ipBlock:
        cidr: 192.168.110.0/24    # works

System Information (please complete the following information):

aauren commented 2 years ago

Hmm... I'm not sure what you mean by "# works" and "# does not work"; the situation works fine for me when I reproduce this locally. Were you expecting the traffic to be blocked and it wasn't? Were you expecting the traffic to be unblocked and it wasn't?

You probably need to include more information in your bug report about what you were expecting to happen and what actually happened, along with logs and the options you run kube-router with.

All of the fields in our issue template are important, and in order to actually resolve issues we need all of them. However, the information you provided covers only about 20% of them.

Keep in mind that with DSR you almost always want to combine it with kube-router.io/service.local: "true" or internalTrafficPolicy: Local if the remote IP is of interest to you, otherwise your service may be proxied via another node and you'll end up with another node's IP rather than the real IP of the remote node.
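
Roughly, a sketch of what that combination might look like, reusing the whoami name, label, and externalIP from the log above (port 80 is an assumption based on the HTTP output, and the comments describe the annotations as covered in kube-router's service documentation):

apiVersion: v1
kind: Service
metadata:
  name: whoami
  annotations:
    kube-router.io/service.dsr: tunnel      # DSR: tunnel the original client packet to the pod
    kube-router.io/service.local: "true"    # only proxy/advertise from nodes running a whoami pod
spec:
  externalIPs:
  - 192.168.120.37
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    run: whoami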

yuchunyun commented 2 years ago

Sorry, my English is poor. The kube-router args are like this:

...   
     args:
        - --run-router=true
        - --run-firewall=true
        - --run-service-proxy=true
        - --bgp-graceful-restart=true
        - --advertise-external-ip=true
        - --service-external-ip-range=192.168.120.0/24
        - --cluster-asn=64611
        - --peer-router-ips=202.173.8.36,202.173.8.37
        - --peer-router-multihop-ttl=5
        - --peer-router-asns=64600,64600
        - --kubeconfig=/var/lib/kube-router/kubeconfig
        - --metrics-path=/metrics
        - --metrics-port=8080
        - --v=5

My test svc is defined as follows:

apiVersion: v1
kind: Service
metadata:
  annotations:
    kube-router.io/service.dsr: tunnel
  name: whoami
  namespace: default
spec:
  externalIPs:
  - 192.168.120.37
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    run: whoami

Then a client accesses it, and the client IP in the pod log is the real address of my client (it's great). Then I add a NetworkPolicy ingress rule to allow my client address to access the svc, like this:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: access-whoami
spec:
  podSelector:
    matchLabels:
      run: whoami
  ingress:
  - from:
    - ipBlock:
        cidr: 192.168.83.0/24    # this is my client's subnet

but my client cannot access it properly

Failed connect to 192.168.120.37:80; Connection timed out

Only when configured like this can the client access it properly:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: access-whoami
spec:
  podSelector:
    matchLabels:
      run: whoami
  ingress:
  - from:
    - ipBlock:
        cidr: 192.168.110.0/24  # this is my nodes' subnet

yuchunyun commented 2 years ago

I observed the dropped packets by running tcpdump -i nflog:100 -nnnn and found two problems. Please help me see whether this is normal:

14:43:03.000708 IP 192.168.110.15.19005 > 10.244.6.196.80: Flags [S], seq 4158493724, win 29200, options [mss 1460,sackOK,TS val 659845462 ecr 0,nop,wscale 9], length 0
14:43:22.071803 IP 192.168.110.15 > 10.244.6.195: IP 192.168.83.148.47874 > 192.168.120.37.80: Flags [S], seq 2531701233, win 29200, options [mss 1460,sackOK,TS val 659864542 ecr 0,nop,wscale 9], length 0 (ipip-proto-4)

Q1: Whether or not I use DSR (tunnel) mode, the source IP in tcpdump is 192.168.110.15. Is this why NetworkPolicy.spec.ingress.from.ipBlock must be set to the k8s node address, not the real client address?

Q2: In DSR (tunnel) mode, NetworkPolicy.spec.ingress.ports cannot be set (I have tested it and it's true). As shown in tcpdump, the destination port appears to be different from normal mode?

aauren commented 2 years ago

My best guess at this point is that your service isn't declared as a local service. If you aren't using a local service, there is a chance that your request will ingress on one node and be proxied to another node that contains the service pod. When this happens the L3 header is rewritten, and the new source IP would be seen as a Kubernetes node.

Can you try ensuring that your service is a local service via internalTrafficPolicy: Local and see if you have the same results?

yuchunyun commented 2 years ago

Yes, I tried to set internalTrafficPolicy: Local for the svc, but still got the same result.

I see this VIP 192.168.120.37 advertised by all my worker nodes to the upstream BGP peer, and IPVS entries for this VIP on all my worker nodes. When external requests are load-balanced to nodes that are not running the pod, the source IP is seen as that node's address.

I think that when internalTrafficPolicy: Local is set, not all nodes should advertise the VIP to their upstream BGP peers; only the nodes running the pod should advertise the VIP and have IPVS entries, just like MetalLB does.

aauren commented 2 years ago

Yes, this is the way kube-router should function. If you have spec.externalTrafficPolicy: Local, then you should only see the nodes with an active service pod running advertising the external IP to the upstream BGP peer.

I'm having a hard time understanding what could be going wrong here. I know that multiple users have made heavy use of this feature across the last 2-3 years of kube-router versions, and there has never been an issue with the BGP announcement functionality that I'm aware of.

Can you show the following?

yuchunyun commented 2 years ago

service definition: kubectl get svc whoami -o yaml

apiVersion: v1
kind: Service
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"whoami","namespace":"default"},"spec":{"externalIPs":["192.168.120.37"],"internalTrafficPolicy":"Local","ports":[{"port":80,"protocol":"TCP","targetPort":80}],"selector":{"run":"whoami"}}}
  creationTimestamp: "2021-12-09T02:39:44Z"
  name: whoami
  namespace: default
  resourceVersion: "62814793"
  uid: 198259b7-daaf-4095-906c-ec7985302314
spec:
  clusterIP: 10.96.0.205
  clusterIPs:
  - 10.96.0.205
  externalIPs:
  - 192.168.120.37
  internalTrafficPolicy: Local
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    run: whoami
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

pods selected: kubectl get pods -l run=whoami -o wide

NAME                      READY   STATUS    RESTARTS   AGE    IP            NODE              NOMINATED NODE   READINESS GATES
whoami-586fd9cddd-8hgqq   1/1     Running   0          110m   10.244.7.39   zdns-yxz-k8s-18   <none>           <none>
whoami-586fd9cddd-dfs2z   1/1     Running   0          110m   10.244.7.40   zdns-yxz-k8s-18   <none>           <none>

k8s nodes: kubectl -n kube-system get pods -o wide | grep kube-route

kube-router-4dfzn                         1/1     Running   0             40h   192.168.110.15   zdns-yxz-k8s-15   <none>           <none>
kube-router-88pqp                         1/1     Running   0             40h   192.168.110.16   zdns-yxz-k8s-16   <none>           <none>
kube-router-bpbh4                         1/1     Running   0             40h   192.168.110.17   zdns-yxz-k8s-17   <none>           <none>
kube-router-g6qdl                         1/1     Running   0             40h   192.168.110.11   zdns-yxz-k8s-11   <none>           <none>
kube-router-ghpp7                         1/1     Running   0             40h   192.168.110.14   zdns-yxz-k8s-14   <none>           <none>
kube-router-qgg7w                         1/1     Running   0             40h   192.168.110.18   zdns-yxz-k8s-18   <none>           <none>
kube-router-rsvp6                         1/1     Running   0             40h   192.168.110.13   zdns-yxz-k8s-13   <none>           <none>
kube-router-x5bt9                         1/1     Running   0             40h   192.168.110.12   zdns-yxz-k8s-12   <none>           <none>

gobgp output on node zdns-yxz-k8s-18:

   Network              Next Hop             AS_PATH              Age        Attrs
*> 192.168.120.37/32    192.168.110.18                            00:02:52   [{Origin: i}]

gobgp output on nodes zdns-yxz-k8s-15 / zdns-yxz-k8s-17:

   Network              Next Hop             AS_PATH              Age        Attrs
*> 192.168.120.37/32    192.168.110.15                            00:00:33   [{Origin: i}]
   Network              Next Hop             AS_PATH              Age        Attrs
*> 192.168.120.37/32    192.168.110.17                            00:00:35   [{Origin: i}]

The RIB on my upstream router:

inet.0: 54 destinations, 134 routes (54 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

192.168.120.37/32  *[BGP/170] 01:55:43, localpref 100, from 192.168.110.18
                      AS path: 64611 I, validation-state: unverified
                      to 192.168.110.14 via vlan.6
                    > to 192.168.110.15 via vlan.6
                      to 192.168.110.16 via vlan.6
                      to 192.168.110.17 via vlan.6
                      to 192.168.110.18 via vlan.6
                    [BGP/170] 01:55:43, localpref 100
                      AS path: 64611 I, validation-state: unverified
                    > to 192.168.110.14 via vlan.6
                    [BGP/170] 01:55:43, localpref 100
                      AS path: 64611 I, validation-state: unverified
                    > to 192.168.110.15 via vlan.6
                    [BGP/170] 01:55:43, localpref 100
                      AS path: 64611 I, validation-state: unverified
                    > to 192.168.110.16 via vlan.6
                    [BGP/170] 01:55:43, localpref 100
                      AS path: 64611 I, validation-state: unverified
                    > to 192.168.110.17 via vlan.6

aauren commented 2 years ago

Can you change internalTrafficPolicy to externalTrafficPolicy? See https://kubernetes.io/docs/concepts/services-networking/service/#external-traffic-policy

kube-router only pays attention to the external policy when deciding how to advertise BGP VIPs

yuchunyun commented 2 years ago

Thanks, and apologies for my carelessness. The ideal state is restored after setting type: LoadBalancer and externalTrafficPolicy: Local.
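
For anyone following along, a minimal sketch of the resulting Service, assuming only type and externalTrafficPolicy changed relative to the definition posted earlier in this thread:

apiVersion: v1
kind: Service
metadata:
  name: whoami
  namespace: default
spec:
  type: LoadBalancer              # externalTrafficPolicy is honoured for NodePort/LoadBalancer services
  externalTrafficPolicy: Local    # only nodes with a ready whoami pod advertise the VIP over BGP
  externalIPs:
  - 192.168.120.37
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    run: whoami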

yuchunyun commented 2 years ago

I have one more question:

If externalTrafficPolicy is not set and only kube-router.io/service.dsr: tunnel is set, must NetworkPolicy.spec.ingress.from.ipBlock be set to the k8s node address rather than the real client address?

aauren commented 2 years ago

No worries! I sent you down that path a couple of comments back when I accidentally mixed up the internal and external policy. Sorry for my carelessness as well. 😅

So, just double checking, did this resolve the issue you were experiencing with network policy and obtaining the source IP from inside the pod?

yuchunyun commented 2 years ago

> So, just double checking, did this resolve the issue you were experiencing with network policy and obtaining the source IP from inside the pod?

Setting externalTrafficPolicy: Local has resolved the issue. I just had another question, as mentioned above.

murali-reddy commented 2 years ago

> If externalTrafficPolicy is not set and only kube-router.io/service.dsr: tunnel is set, must NetworkPolicy.spec.ingress.from.ipBlock be set to the k8s node address rather than the real client address?

While this will allow traffic to the services marked with kube-router.io/service.dsr: tunnel to be whitelisted, it will not be able to enforce policies based on the real client IP address; i.e. any client accessing a service marked with kube-router.io/service.dsr: tunnel will be permitted (since we have added an exception to allow traffic from the nodes).

Unfortunately I cannot recommend this as a solution. If you want to enforce network policies based on the real client IP address, your best bet is to use services that are marked externalTrafficPolicy: Local and do not need DSR.

This is an issue with how DSR is implemented in kube-router. Traffic is tunneled into the pod, so we miss the opportunity to perform proper network policy enforcement: when enforcement is done on the node, it is done on the encapsulated packet, which carries a different IP address.

I will document this limitation of the current DSR implementation and potentially look for a solution.
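
To make that tradeoff concrete, this is roughly what the node-subnet workaround looks like with the CIDR and labels from this thread (the policy name is illustrative). Because the outer source of the IPIP-encapsulated DSR traffic is a node address, the rule effectively admits every external client of the DSR service:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-whoami-dsr-via-nodes
spec:
  podSelector:
    matchLabels:
      run: whoami
  ingress:
  - from:
    - ipBlock:
        cidr: 192.168.110.0/24   # node subnet: outer source IP of the encapsulated DSR traffic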

yuchunyun commented 2 years ago

@murali-reddy
Thanks for your explanation. I look forward to kube-router getting even better.

tuananh170489 commented 2 years ago

Great! I've been researching DSR as well, and luckily I found this issue. So, will I be able to use --advertise-external-ip=true without BGP peering?