cilium / cilium

eBPF-based Networking, Security, and Observability
https://cilium.io
Apache License 2.0

IPv6 Neighbor solicitation packets dropped when network policy is enabled #16285

Open · sdmodi opened 3 years ago

sdmodi commented 3 years ago

I am running a dual stack cluster with the following network policy:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"networking.k8s.io/v1","kind":"NetworkPolicy","metadata":{"annotations":{},"name":"allow-access-1","namespace":"default"},"spec":{"ingress":[{"from":[{"podSelector":{"matchLabels":{"app":"client-allow"}}}]}],"podSelector":{"matchLabels":{"name":"nginx"}}}}
  creationTimestamp: "2021-05-24T20:16:44Z"
  generation: 1
  name: allow-access-1
  namespace: default
  resourceVersion: "17776"
  uid: 1f00fa80-5016-4640-ba0b-3a861bd2b2aa
spec:
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: client-allow
  podSelector:
    matchLabels:
      name: nginx
  policyTypes:
  - Ingress

I have a client pod 'client-allow-64c5464587-kszrk' which should be allowed to talk to the nginx service and its corresponding backend pods. When I don't apply the NetworkPolicy, everything works. When I do apply it, IPv6 traffic breaks while IPv4 continues to work. When I look at cilium monitor output related to the backend nginx pod, I see the following:

root@gke-cos-dpv2-ex-5-default-pool-3b578bb7-9jsc:/home/cilium# cilium monitor --related-to 163
Listening for events on 2 CPUs with 64x4096 of shared memory
Press Ctrl-C to quit
level=info msg="Initializing dissection cache..." subsys=monitor
Policy verdict log: flow 0x0 local EP ID 163, remote ID 2, proto 58, ingress, action deny, match none, fe80::605c:c8ff:fe5c:5e2 -> ff02::1:ff00:11 NeighborSolicitation
xx drop (Policy denied) flow 0x0 to endpoint 163, identity 2->5031: fe80::605c:c8ff:fe5c:5e2 -> ff02::1:ff00:11 NeighborSolicitation
Policy verdict log: flow 0x0 local EP ID 163, remote ID 2, proto 58, ingress, action deny, match none, fe80::605c:c8ff:fe5c:5e2 -> ff02::1:ff00:11 NeighborSolicitation
xx drop (Policy denied) flow 0x0 to endpoint 163, identity 2->5031: fe80::605c:c8ff:fe5c:5e2 -> ff02::1:ff00:11 NeighborSolicitation
Policy verdict log: flow 0x0 local EP ID 163, remote ID 2, proto 58, ingress, action deny, match none, fe80::605c:c8ff:fe5c:5e2 -> ff02::1:ff00:11 NeighborSolicitation
xx drop (Policy denied) flow 0x0 to endpoint 163, identity 2->5031: fe80::605c:c8ff:fe5c:5e2 -> ff02::1:ff00:11 NeighborSolicitation
Policy verdict log: flow 0x0 local EP ID 163, remote ID 2, proto 58, ingress, action deny, match none, fe80::605c:c8ff:fe5c:5e2 -> ff02::1:ff00:11 NeighborSolicitation
xx drop (Policy denied) flow 0x0 to endpoint 163, identity 2->5031: fe80::605c:c8ff:fe5c:5e2 -> ff02::1:ff00:11 NeighborSolicitation
Policy verdict log: flow 0x0 local EP ID 163, remote ID 2, proto 58, ingress, action deny, match none, fe80::605c:c8ff:fe5c:5e2 -> ff02::1:ff00:11 NeighborSolicitation
xx drop (Policy denied) flow 0x0 to endpoint 163, identity 2->5031: fe80::605c:c8ff:fe5c:5e2 -> ff02::1:ff00:11 NeighborSolicitation
Policy verdict log: flow 0x0 local EP ID 163, remote ID 2, proto 58, ingress, action deny, match none, fe80::605c:c8ff:fe5c:5e2 -> ff02::1:ff00:11 NeighborSolicitation
xx drop (Policy denied) flow 0x0 to endpoint 163, identity 2->5031: fe80::605c:c8ff:fe5c:5e2 -> ff02::1:ff00:11 NeighborSolicitation

Also note that if the backend pod is in the neighbor cache of the client pod, the traffic is permitted.

sdmodi commented 3 years ago

Here is the system dump cilium-sysdump-20210524-132109.zip

aanm commented 3 years ago

I believe it is working as expected: fe80::605c:c8ff:fe5c:5e2 is considered traffic from world, and since it is not accepted by the policy, it is dropped.

pchaigno commented 3 years ago

(FYI, we're tracking adding support for explicit ICMP policy rules at https://github.com/cilium/cilium/issues/14609.)

aanm commented 3 years ago

(FYI, we're tracking adding support for explicit ICMP policy rules at #14609.)

I was looking for that GH issue but I couldn't find it. Thanks!

Weil0ng commented 3 years ago

Curious, why does IPv4 work in this case?

pchaigno commented 3 years ago

Because IPv4 doesn't rely on ICMP for neighbor discovery? We let ARP packets go through.

Weil0ng commented 3 years ago

Then should we explicitly allow NDP to go through (even if not all of ICMP), since it is the "counterpart" of ARP for IPv6?

pchaigno commented 3 years ago

Yes, that would make sense to me.

sdmodi commented 3 years ago

What is the workaround for this problem? Is it to explicitly allow ICMP traffic? In other words, what should customers do to get IPv6 policies to work despite NDP packets flowing around?

pchaigno commented 3 years ago

Did you try allowing connections from the link-local address via fromCIDRs?
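
(For reference, a minimal sketch, assuming the nginx pod labels from the report, of what such an allow could look like as a CiliumNetworkPolicy using fromCIDR; the policy name is illustrative and this is untested:)

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-link-local-ndp     # illustrative name, not from the report
spec:
  endpointSelector:
    matchLabels:
      name: nginx
  ingress:
  - fromEndpoints:               # keep the existing pod-to-pod allow
    - matchLabels:
        app: client-allow
  - fromCIDR:                    # additionally allow IPv6 link-local sources (NDP)
    - fe80::/10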

sdmodi commented 3 years ago

No. But this is so difficult from a customer point of view. Will it work if I allow all fe80 traffic? I am trying this out on GKE, but I am curious to know how users of IPv6 network policy typically handle this. Is the expectation that every time you enable network policy with Cilium you have to explicitly allow link-local addresses? Or is it that when people use Cilium, there usually isn't any NDP traffic on the network?

I am going to allow fe80::/10 using fromCIDRs to see if this works. Thanks.

Weil0ng commented 3 years ago

> Is the expectation that every time you enable network policy with Cilium you have to explicitly allow link-local addresses?

IIUC, we need to fix this by allowing this traffic by default; I don't think there's a network policy API that controls ICMP traffic anyway?

What @pchaigno suggested (explicitly allowing link-local addresses) could be an immediate workaround before the fix is in.

oblazek commented 3 years ago

I have just tried the workaround on our setup (bare-metal, on-prem) and it only partially worked for me. What I am seeing now is another problem (but it is the last one) -- https://github.com/cilium/cilium/blob/master/bpf/lib/icmp6.h#L397

xx drop (Unknown L3 target address) flow 0x0 to endpoint 0, identity 0->0: fe80::d863:1aff:fee8:e84b -> fe80::34f8:21ff:fe1f:22b9 NeighborSolicitation

If I let this be forwarded out of the container, everything works (changing #L397 to return 0;).

Interested to see what results @sdmodi gets...?

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

sdmodi commented 3 years ago

I haven't had an opportunity to test this, as my test setup ran into some other dependencies. However, this is definitely an issue we need to fix and keep open.

sdmodi commented 3 years ago

Allowing fe80::/10 in the policy did not work for me (or maybe I am doing something wrong). I have the following two network policies:

sdmodi@sdmodi-desk:/google/src/cloud/sdmodi/dual-stack-with-ccm/google3$ kubectl get networkpolicy
NAME               POD-SELECTOR   AGE
allow-access-1     name=nginx     6m54s
allow-link-local   name=nginx     26s

Definitions are as follows:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-access-1
spec:
  podSelector:
    matchLabels:
      name: nginx
  ingress:
    - from:
      - ipBlock:
          cidr: fe80::/10
      - podSelector:
          matchLabels:
            app: client-allow
---
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-link-local
spec:
  podSelector:
    matchLabels:
      name: nginx
  ingress:
    - from:
      - ipBlock:
          cidr: fe80::/10

In allow-access-1 I have tried adding fe80::/10 as an allowed ipBlock. I have also explicitly added another network policy that allows fe80::/10 without any other selectors.

I have a dual stack service:

sdmodi@sdmodi-desk:/google/src/cloud/sdmodi/dual-stack-with-ccm/google3$ kubectl get service nginx-dual -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    cloud.google.com/neg: '{"ingress":true}'
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"nginx-dual","namespace":"default"},"spec":{"ipFamilies":["IPv4","IPv6"],"ports":[{"port":8080,"protocol":"TCP","targetPort":80}],"selector":{"name":"nginx"},"type":"ClusterIP"}}
  creationTimestamp: "2021-09-22T18:45:23Z"
  name: nginx-dual
  namespace: default
  resourceVersion: "1974"
  uid: 2609988f-b86f-4396-a6ed-c8c0fbd8b608
spec:
  clusterIP: 10.153.142.240
  clusterIPs:
  - 10.153.142.240
  - 2600:2d00:0:4:7e9c:941d:4ddd:727c
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  - IPv6
  ipFamilyPolicy: RequireDualStack
  ports:
  - port: 8080
    protocol: TCP
    targetPort: 80
  selector:
    name: nginx
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

From the pod that is allowed traffic, I see the IPv4 service as accessible. The IPv6 service is denied:

sdmodi@sdmodi-desk:/google/src/cloud/sdmodi/dual-stack-with-ccm/google3$ kubectl exec -it client-allow-64c5464587-pxvbm -- /bin/bash
cnb@client-allow-64c5464587-pxvbm:/$ curl 10.153.142.240:8080
<!DOCTYPE html>
<html>
<head>
<title>Kubernetes IPv6 nginx</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx on <span style="color:  #C70039">IPv6</span> Kubernetes!</h1>
<p>Pod: nginx-controller-n2dj2</p>
</body>
</html>
cnb@client-allow-64c5464587-pxvbm:/$ curl [2600:2d00:0:4:7e9c:941d:4ddd:727c]:8080
^C

Without the network policy the IPv6 service is accessible:

sdmodi@sdmodi-desk:/google/src/cloud/sdmodi/dual-stack-with-ccm/google3$ kubectl delete networkpolicy allow-link-local
networkpolicy.networking.k8s.io "allow-link-local" deleted
sdmodi@sdmodi-desk:/google/src/cloud/sdmodi/dual-stack-with-ccm/google3$ kubectl delete networkpolicy allow-access-1
networkpolicy.networking.k8s.io "allow-access-1" deleted
sdmodi@sdmodi-desk:/google/src/cloud/sdmodi/dual-stack-with-ccm/google3$ kubectl exec -it client-allow-64c5464587-pxvbm -- /bin/bash
cnb@client-allow-64c5464587-pxvbm:/$ curl [2600:2d00:0:4:7e9c:941d:4ddd:727c]:8080
<!DOCTYPE html>
<html>
<head>
<title>Kubernetes IPv6 nginx</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx on <span style="color:  #C70039">IPv6</span> Kubernetes!</h1>
<p>Pod: nginx-controller-q2fqj</p>
</body>
</html>
cnb@client-allow-64c5464587-pxvbm:/$

netcelli-tux commented 1 year ago

We are running an EKS cluster with Cilium in chaining mode, with policy enforcement mode set to always, and IPv6-only. We are experiencing the same issue. Some pods in the cluster have timeout issues:

error retrieving resource lock kube-system/cert-manager-controller: Get "https://[fda2:400f:d07d::1]:443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cert-manager-controller": dial tcp [fda2:400f:d07d::1]:443: i/o timeout
Readiness probe failed: Get "http://[2a05:d018:1846:c304:cb52::c]:3000/api/healthcheck": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

Hubble does not report any dropped traffic related to kube-apiserver, but it shows dropped ingress for ICMPv6 NeighborSolicitation (see the log below). NeighborSolicitation replaces ARP for IPv6 and is required to resolve network-layer addresses to link-layer addresses; Cilium does not allow this traffic by default. We do have a CiliumClusterwideNetworkPolicy to allow ingress from link-local addresses (fe80::/64) and multicast addresses (ff02::2); a rough sketch of that kind of policy follows the log. I found that PR https://github.com/cilium/cilium/pull/18522 suggests doing so, but the network policy has no effect.

{"flow":{"time":"2023-06-25T14:46:00.196178739Z","verdict":"DROPPED","drop_reason":133,"ethernet":{"source":"3a:14:c4:fc:ec:57","destination":"33:33:ff:00:00:0b"},"IP":{"source":"fe80::3814:c4ff:fefc:ec57","destination":"ff02::1:ff00:b","ipVersion":"IPv6"},"l4":{"ICMPv6":{"type":135}},"source":{"identity":16777240},"destination":{"identity":16777241},"Type":"L3_L4","node_name":"ip-10-101-103-62.eu-west-1.compute.internal","event_type":{"type":5},"traffic_direction":"INGRESS","drop_reason_desc":"POLICY_DENIED","Summary":"ICMPv6 NeighborSolicitation"},"node_name":"ip-10-101-103-62.eu-west-1.compute.internal","time":"2023-06-25T14:46:00.196178739Z"}
{"flow":{"time":"2023-06-25T14:46:00.196193397Z","verdict":"DROPPED","drop_reason":133,"ethernet":{"source":"3a:14:c4:fc:ec:57","destination":"33:33:ff:00:00:0b"},"IP":{"source":"fe80::3814:c4ff:fefc:ec57","destination":"ff02::1:ff00:b","ipVersion":"IPv6"},"l4":{"ICMPv6":{"type":135}},"source":{"identity":16777240},"destination":{"identity":42339,"labels":["k8s:app.kubernetes.io/component=api","k8s:app.kubernetes.io/managed-by=Helm","k8s:app.kubernetes.io/part-of=backend","k8s:batch.kubernetes.io/controller-uid=58790e18-58e2-4e3b-b265-d33f8f20c017","k8s:controller-uid=58790e18-58e2-4e3b-b265-d33f8f20c017","k8s:io.cilium.k8s.namespace.labels.argocd.argoproj.io/instance=applications","k8s:io.cilium.k8s.policy.cluster=default","k8s:io.cilium.k8s.policy.serviceaccount=default"]},"Type":"L3_L4","node_name":"ip-10-101-103-62.eu-west-1.compute.internal","event_type":{"type":1,"sub_type":133},"traffic_direction":"INGRESS","drop_reason_desc":"POLICY_DENIED","Summary":"ICMPv6 NeighborSolicitation"},"node_name":"ip-10-101-103-62.eu-west-1.compute.internal","time":"2023-06-25T14:46:00.196193397Z"}
{"flow":{"time":"2023-06-25T14:46:00.420187181Z","verdict":"DROPPED","drop_reason":133,"ethernet":{"source":"1e:ec:78:35:63:d8","destination":"33:33:ff:00:00:0f"},"IP":{"source":"fe80::1cec:78ff:fe35:63d8","destination":"ff02::1:ff00:f","ipVersion":"IPv6"},"l4":{"ICMPv6":{"type":135}},"source":{"identity":16777240},"destination":{"identity":45711,"labels":["k8s:app.kubernetes.io/component=controller","k8s:app.kubernetes.io/instance=cert-manager","k8s:app.kubernetes.io/managed-by=Helm","k8s:app.kubernetes.io/name=cert-manager","k8s:app.kubernetes.io/version=v1.12.1","k8s:app=cert-manager","k8s:helm.sh/chart=cert-manager-v1.12.1","k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=cert-manager","k8s:io.cilium.k8s.policy.cluster=default","k8s:io.cilium.k8s.policy.serviceaccount=cert-manager","k8s:io.kubernetes.pod.namespace=cert-manager"]},"Type":"L3_L4","node_name":"ip-10-101-103-62.eu-west-1.compute.internal","event_type":{"type":1,"sub_type":133},"traffic_direction":"INGRESS","drop_reason_desc":"POLICY_DENIED","Summary":"ICMPv6 NeighborSolicitation"},"node_name":"ip-10-101-103-62.eu-west-1.compute.internal","time":"2023-06-25T14:46:00.420187181Z"}
{"flow":{"time":"2023-06-25T14:46:00.708220348Z","verdict":"DROPPED","drop_reason":133,"ethernet":{"source":"8a:4e:5f:ca:f0:00","destination":"33:33:ff:00:00:13"},"IP":{"source":"fe80::884e:5fff:feca:f000","destination":"ff02::1:ff00:13","ipVersion":"IPv6"},"l4":{"ICMPv6":{"type":135}},"source":{"identity":16777240},"destination":{"identity":16777241},"Type":"L3_L4","node_name":"ip-10-101-103-62.eu-west-1.compute.internal","event_type":{"type":5},"traffic_direction":"INGRESS","drop_reason_desc":"POLICY_DENIED","Summary":"ICMPv6 NeighborSolicitation"},"node_name":"ip-10-101-103-62.eu-west-1.compute.internal","time":"2023-06-25T14:46:00.708220348Z"}
{"flow":{"time":"2023-06-25T14:46:00.708233741Z","verdict":"DROPPED","drop_reason":133,"ethernet":{"source":"8a:4e:5f:ca:f0:00","destination":"33:33:ff:00:00:13"},"IP":{"source":"fe80::884e:5fff:feca:f000","destination":"ff02::1:ff00:13","ipVersion":"IPv6"},"l4":{"ICMPv6":{"type":135}},"source":{"identity":16777240},"destination":{"identity":36616,"labels":["k8s:app.kubernetes.io/instance=external-secrets","k8s:app.kubernetes.io/name=external-secrets","k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=external-secrets","k8s:io.cilium.k8s.policy.cluster=default","k8s:io.cilium.k8s.policy.serviceaccount=external-secret","k8s:io.kubernetes.pod.namespace=external-secrets"]},"Type":"L3_L4","node_name":"ip-10-101-103-62.eu-west-1.compute.internal","event_type":{"type":1,"sub_type":133},"traffic_direction":"INGRESS","drop_reason_desc":"POLICY_DENIED","Summary":"ICMPv6 NeighborSolicitation"},"node_name":"ip-10-101-103-62.eu-west-1.compute.internal","time":"2023-06-25T14:46:00.708233741Z"}
dcasier commented 11 months ago

I am testing a Cilium implementation on an infrastructure with OVN (kube-ovn), IPv6-only, and I see that the neighbor solicitations (ICMPv6) are blocked:

kubectl exec -n kube-system cilium-6xrmb -- hubble observe --since 3m --pod default/mariadb-nextcloud-0
Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init), install-cni-binaries (init)
Dec  9 11:14:18.150: default/mariadb-nextcloud-0 (ID:32140) <> ff02::1:ff90:a117 (world) Unknown L3 target address DROPPED (ICMPv6 NeighborSolicitation)
xx drop (Unknown L3 target address) flow 0x0 to endpoint 0, ifindex 61, file icmp6.h:341, , identity unknown->unknown: 2001:X:X:e0ce:604a:f87:0:36 -> ff02::1:ff00:39 NeighborSolicitation

I saw no improvement following this link: https://docs.cilium.io/en/stable/security/policy/language/#example-icmp-icmpv6

Did you get it to work with a particular policy?

dcasier commented 11 months ago

> I have just tried the workaround on our setup (bare-metal, on-prem) and it only partially worked for me. What I am seeing now is another problem (but it is the last one) -- https://github.com/cilium/cilium/blob/master/bpf/lib/icmp6.h#L397
>
> xx drop (Unknown L3 target address) flow 0x0 to endpoint 0, identity 0->0: fe80::d863:1aff:fee8:e84b -> fe80::34f8:21ff:fe1f:22b9 NeighborSolicitation
>
> If I let this be forwarded out of the container, everything works (changing #L397 to return 0;).
>
> Interested to see what results @sdmodi gets...?

(The line has since moved: https://github.com/cilium/cilium/blob/ffbd7af823a35baddbfed9d72ec296bc5f0a12e0/bpf/lib/icmp6.h#L328)

As I understand it, it checks whether the destination is the router or whether the destination exists in ENDPOINTS_MAP (which seems to be the lxc map). I don't really see how link-local (IPv6) addresses could end up in the endpoints map. Is there a neighbor table somewhere?

netcelli-tux commented 11 months ago

> I am testing a Cilium implementation on an infrastructure with OVN (kube-ovn), IPv6-only, and I see that the neighbor solicitations (ICMPv6) are blocked:
>
> kubectl exec -n kube-system cilium-6xrmb -- hubble observe --since 3m --pod default/mariadb-nextcloud-0
> Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init), install-cni-binaries (init)
> Dec  9 11:14:18.150: default/mariadb-nextcloud-0 (ID:32140) <> ff02::1:ff90:a117 (world) Unknown L3 target address DROPPED (ICMPv6 NeighborSolicitation)
> xx drop (Unknown L3 target address) flow 0x0 to endpoint 0, ifindex 61, file icmp6.h:341, , identity unknown->unknown: 2001:X:X:e0ce:604a:f87:0:36 -> ff02::1:ff00:39 NeighborSolicitation
>
> I saw no improvement following this link: https://docs.cilium.io/en/stable/security/policy/language/#example-icmp-icmpv6
>
> Did you get it to work with a particular policy?

No, I couldn't. At the moment we have put the EKS with Cilium/IPv6 project on hold. Neighbor solicitation requests should be allowed by default, since they replace ARP requests, but that isn't the case.

lorenzo-biava commented 11 months ago

Encountered what really looks like the same issue. For now this workaround seems to be working for us (AKS 1.27, Cilium, overlay, dual-stack). I would not call it the definitive way (nor production-ready), but perhaps a temporary patch to make some progress if people are experimenting.

EDIT: Actually this policy has so many side effects on pods that are not explicitly using NetworkPolicies that it is probably not a good idea after all.

apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: cilium-ipv6-icmp-workaround
spec:
  endpointSelector: {}
  ingress:
  - icmps:
    - fields:
      - type: 135 # Neighbor Solicitation
        family: IPv6
      - type: 136 # Neighbor Advertisement
        family: IPv6
  egress:
  - icmps:
    - fields:
      - type: 135 # Neighbor Solicitation
        family: IPv6
      - type: 136 # Neighbor Advertisement
        family: IPv6

tamilmani1989 commented 11 months ago

Yes, a CCNP would not be ideal. It won't look nice, but one option could be to add this as a namespace/pod-scoped policy wherever it is required (see the sketch below). Wondering if there is any alternative solution for this @amitmavgupta @mathpl
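
(A sketch of what a namespace/pod-scoped variant of the ICMPv6 allow might look like, reusing the type 135/136 fields from the clusterwide policy above; the name, namespace, and selector are illustrative:)

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy        # namespaced, unlike the clusterwide policy above
metadata:
  name: allow-ndp                # illustrative name
  namespace: default             # illustrative namespace
spec:
  endpointSelector:
    matchLabels:
      name: nginx                # scope the allow to the affected pods only
  ingress:
  - icmps:
    - fields:
      - type: 135                # Neighbor Solicitation
        family: IPv6
      - type: 136                # Neighbor Advertisement
        family: IPv6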

tamilmani1989 commented 11 months ago

This issue seems to be fixed as of 1.14.4 at least, and no longer requires an explicit ICMP allow.