sdmodi opened 3 years ago
Here is the system dump: cilium-sysdump-20210524-132109.zip
I believe it is working as expected: fe80::605c:c8ff:fe5c:5e2 is considered traffic from world, and since it is not accepted by the policy, it is dropped.
(FYI, we're tracking adding support for explicit ICMP policy rules at https://github.com/cilium/cilium/issues/14609.)
I was looking for that GH issue but I couldn't find it. Thanks!
Curious: why does IPv4 work in this case?
Because IPv4 doesn't rely on ICMP for neighbor discovery? We let ARP packets go through.
Then should we explicitly allow NDP to go through (even if not all of ICMP), since it is the counterpart of ARP for IPv6?
Yes, that would make sense to me.
What is the workaround to this problem? Is it to explicitly allow ICMP traffic? In other words, what should customers do to get IPv6 policies to work despite having NDP packets flowing around?
Did you try allowing connections from the link-local address via fromCIDRs?
No. But this is so difficult from a customer point of view. Will it work if I allow all fe80 traffic? I am trying this out on GKE, but I am curious to know how users of IPv6 network policy typically do this. Is the expectation that every time you enable network policy with Cilium you have to explicitly allow link-local addresses? Or is it that when people use Cilium, there usually isn't any NDP traffic on the network?
I am going to allow fe80::/10 using fromCIDRs to see if this works. Thanks.
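For reference, such a policy could look roughly like the following (a sketch using Cilium's fromCIDR; the policy name and endpointSelector label are placeholders, not taken from this cluster):

# Sketch only: allow ingress from IPv6 link-local source addresses via fromCIDR.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-link-local-ingress   # placeholder name
spec:
  endpointSelector:
    matchLabels:
      name: nginx                  # placeholder label for the target pods
  ingress:
  - fromCIDR:
    - fe80::/10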
Is the expectation that every time you enable network policy with cilium that you have to explicitly allow link local addresses?
IIUC, we need to fix this by allowing ICMP traffic by default; I don't think there's a network policy that controls ICMP traffic anyway?
What @pchaigno suggested (explicitly allowing link-local addresses) can be an immediate workaround for this before the fix is in?
I have just tried the workaround on our setup (bare-metal, on-prem) and it partially worked for me. What I am seeing is another problem (but it is the last one): https://github.com/cilium/cilium/blob/master/bpf/lib/icmp6.h#L397
xx drop (Unknown L3 target address) flow 0x0 to endpoint 0, identity 0->0: fe80::d863:1aff:fee8:e84b -> fe80::34f8:21ff:fe1f:22b9 NeighborSolicitation
If I let this be forwarded out of the container, everything works (#L397 -> return 0;).
Interested to see what results @sdmodi gets...?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
I haven't had an opportunity to test this as my test setup ran into some other dependencies. However, this is definitely an issue we need to fix and keep open.
Allowing fe80::/10 in the policy did not work for me.
Allowing fe80 does not work (or maybe I am doing something wrong). I have the following two network policies:
sdmodi@sdmodi-desk:/google/src/cloud/sdmodi/dual-stack-with-ccm/google3$ kubectl get networkpolicy
NAME               POD-SELECTOR   AGE
allow-access-1     name=nginx     6m54s
allow-link-local   name=nginx     26s
Definitions are as follows:
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-access-1
spec:
  podSelector:
    matchLabels:
      name: nginx
  ingress:
  - from:
    - ipBlock:
        cidr: fe80::/10
    - podSelector:
        matchLabels:
          app: client-allow
---
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-link-local
spec:
  podSelector:
    matchLabels:
      name: nginx
  ingress:
  - from:
    - ipBlock:
        cidr: fe80::/10
In allow-access-1 I have tried adding fe80::/10 as an allowed CIDR. I have also explicitly added another network policy (allow-link-local) that allows fe80::/10 without any other selectors.
I have a dual stack service:
sdmodi@sdmodi-desk:/google/src/cloud/sdmodi/dual-stack-with-ccm/google3$ kubectl get service nginx-dual -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    cloud.google.com/neg: '{"ingress":true}'
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"nginx-dual","namespace":"default"},"spec":{"ipFamilies":["IPv4","IPv6"],"ports":[{"port":8080,"protocol":"TCP","targetPort":80}],"selector":{"name":"nginx"},"type":"ClusterIP"}}
  creationTimestamp: "2021-09-22T18:45:23Z"
  name: nginx-dual
  namespace: default
  resourceVersion: "1974"
  uid: 2609988f-b86f-4396-a6ed-c8c0fbd8b608
spec:
  clusterIP: 10.153.142.240
  clusterIPs:
  - 10.153.142.240
  - 2600:2d00:0:4:7e9c:941d:4ddd:727c
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  - IPv6
  ipFamilyPolicy: RequireDualStack
  ports:
  - port: 8080
    protocol: TCP
    targetPort: 80
  selector:
    name: nginx
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
From the pod that is allowed traffic, I see the IPv4 service as accessible. The IPv6 service is denied:
sdmodi@sdmodi-desk:/google/src/cloud/sdmodi/dual-stack-with-ccm/google3$ kubectl exec -it client-allow-64c5464587-pxvbm -- /bin/bash
cnb@client-allow-64c5464587-pxvbm:/$ curl 10.153.142.240:8080
<!DOCTYPE html>
<html>
<head>
<title>Kubernetes IPv6 nginx</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to nginx on <span style="color: #C70039">IPv6</span> Kubernetes!</h1>
<p>Pod: nginx-controller-n2dj2</p>
</body>
</html>
cnb@client-allow-64c5464587-pxvbm:/$ curl [2600:2d00:0:4:7e9c:941d:4ddd:727c]:8080
^C
Without the network policy the IPv6 service is accessible:
sdmodi@sdmodi-desk:/google/src/cloud/sdmodi/dual-stack-with-ccm/google3$ kubectl delete networkpolicy allow-link-local
networkpolicy.networking.k8s.io "allow-link-local" deleted
sdmodi@sdmodi-desk:/google/src/cloud/sdmodi/dual-stack-with-ccm/google3$ kubectl delete networkpolicy allow-access-1
networkpolicy.networking.k8s.io "allow-access-1" deleted
sdmodi@sdmodi-desk:/google/src/cloud/sdmodi/dual-stack-with-ccm/google3$ kubectl exec -it client-allow-64c5464587-pxvbm -- /bin/bash
cnb@client-allow-64c5464587-pxvbm:/$ curl [2600:2d00:0:4:7e9c:941d:4ddd:727c]:8080
<!DOCTYPE html>
<html>
<head>
<title>Kubernetes IPv6 nginx</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to nginx on <span style="color: #C70039">IPv6</span> Kubernetes!</h1>
<p>Pod: nginx-controller-q2fqj</p>
</body>
</html>
cnb@client-allow-64c5464587-pxvbm:/$
We are running an IPv6-only EKS cluster with Cilium in chaining mode and policy enforcement mode set to always. We are experiencing the same issue. Some pods in the cluster have timeout issues:
error retrieving resource lock kube-system/cert-manager-controller: Get "https://[fda2:400f:d07d::1]:443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cert-manager-controller": dial tcp [fda2:400f:d07d::1]:443: i/o timeout
Readiness probe failed: Get "http://[2a05:d018:1846:c304:cb52::c]:3000/api/healthcheck": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Hubble does not report any dropped traffic related to kube-apiserver, but it shows dropped ingress for ICMPv6 NeighborSolicitation (see the logs below). NeighborSolicitation replaces ARP for IPv6 and is required to resolve network-layer addresses to link-layer addresses. Cilium does not allow this traffic by default, but we do have a CiliumClusterwideNetworkPolicy to allow ingress from link-local addresses (fe80::/64) and multicast addresses (ff02::2).
I found that PR https://github.com/cilium/cilium/pull/18522 says to do so, but the network policy has no effect.
{"flow":{"time":"2023-06-25T14:46:00.196178739Z","verdict":"DROPPED","drop_reason":133,"ethernet":{"source":"3a:14:c4:fc:ec:57","destination":"33:33:ff:00:00:0b"},"IP":{"source":"fe80::3814:c4ff:fefc:ec57","destination":"ff02::1:ff00:b","ipVersion":"IPv6"},"l4":{"ICMPv6":{"type":135}},"source":{"identity":16777240},"destination":{"identity":16777241},"Type":"L3_L4","node_name":"ip-10-101-103-62.eu-west-1.compute.internal","event_type":{"type":5},"traffic_direction":"INGRESS","drop_reason_desc":"POLICY_DENIED","Summary":"ICMPv6 NeighborSolicitation"},"node_name":"ip-10-101-103-62.eu-west-1.compute.internal","time":"2023-06-25T14:46:00.196178739Z"}
{"flow":{"time":"2023-06-25T14:46:00.196193397Z","verdict":"DROPPED","drop_reason":133,"ethernet":{"source":"3a:14:c4:fc:ec:57","destination":"33:33:ff:00:00:0b"},"IP":{"source":"fe80::3814:c4ff:fefc:ec57","destination":"ff02::1:ff00:b","ipVersion":"IPv6"},"l4":{"ICMPv6":{"type":135}},"source":{"identity":16777240},"destination":{"identity":42339,"labels":["k8s:app.kubernetes.io/component=api","k8s:app.kubernetes.io/managed-by=Helm","k8s:app.kubernetes.io/part-of=backend","k8s:batch.kubernetes.io/controller-uid=58790e18-58e2-4e3b-b265-d33f8f20c017","k8s:controller-uid=58790e18-58e2-4e3b-b265-d33f8f20c017","k8s:io.cilium.k8s.namespace.labels.argocd.argoproj.io/instance=applications","k8s:io.cilium.k8s.policy.cluster=default","k8s:io.cilium.k8s.policy.serviceaccount=default"]},"Type":"L3_L4","node_name":"ip-10-101-103-62.eu-west-1.compute.internal","event_type":{"type":1,"sub_type":133},"traffic_direction":"INGRESS","drop_reason_desc":"POLICY_DENIED","Summary":"ICMPv6 NeighborSolicitation"},"node_name":"ip-10-101-103-62.eu-west-1.compute.internal","time":"2023-06-25T14:46:00.196193397Z"}
{"flow":{"time":"2023-06-25T14:46:00.420187181Z","verdict":"DROPPED","drop_reason":133,"ethernet":{"source":"1e:ec:78:35:63:d8","destination":"33:33:ff:00:00:0f"},"IP":{"source":"fe80::1cec:78ff:fe35:63d8","destination":"ff02::1:ff00:f","ipVersion":"IPv6"},"l4":{"ICMPv6":{"type":135}},"source":{"identity":16777240},"destination":{"identity":45711,"labels":["k8s:app.kubernetes.io/component=controller","k8s:app.kubernetes.io/instance=cert-manager","k8s:app.kubernetes.io/managed-by=Helm","k8s:app.kubernetes.io/name=cert-manager","k8s:app.kubernetes.io/version=v1.12.1","k8s:app=cert-manager","k8s:helm.sh/chart=cert-manager-v1.12.1","k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=cert-manager","k8s:io.cilium.k8s.policy.cluster=default","k8s:io.cilium.k8s.policy.serviceaccount=cert-manager","k8s:io.kubernetes.pod.namespace=cert-manager"]},"Type":"L3_L4","node_name":"ip-10-101-103-62.eu-west-1.compute.internal","event_type":{"type":1,"sub_type":133},"traffic_direction":"INGRESS","drop_reason_desc":"POLICY_DENIED","Summary":"ICMPv6 NeighborSolicitation"},"node_name":"ip-10-101-103-62.eu-west-1.compute.internal","time":"2023-06-25T14:46:00.420187181Z"}
{"flow":{"time":"2023-06-25T14:46:00.708220348Z","verdict":"DROPPED","drop_reason":133,"ethernet":{"source":"8a:4e:5f:ca:f0:00","destination":"33:33:ff:00:00:13"},"IP":{"source":"fe80::884e:5fff:feca:f000","destination":"ff02::1:ff00:13","ipVersion":"IPv6"},"l4":{"ICMPv6":{"type":135}},"source":{"identity":16777240},"destination":{"identity":16777241},"Type":"L3_L4","node_name":"ip-10-101-103-62.eu-west-1.compute.internal","event_type":{"type":5},"traffic_direction":"INGRESS","drop_reason_desc":"POLICY_DENIED","Summary":"ICMPv6 NeighborSolicitation"},"node_name":"ip-10-101-103-62.eu-west-1.compute.internal","time":"2023-06-25T14:46:00.708220348Z"}
{"flow":{"time":"2023-06-25T14:46:00.708233741Z","verdict":"DROPPED","drop_reason":133,"ethernet":{"source":"8a:4e:5f:ca:f0:00","destination":"33:33:ff:00:00:13"},"IP":{"source":"fe80::884e:5fff:feca:f000","destination":"ff02::1:ff00:13","ipVersion":"IPv6"},"l4":{"ICMPv6":{"type":135}},"source":{"identity":16777240},"destination":{"identity":36616,"labels":["k8s:app.kubernetes.io/instance=external-secrets","k8s:app.kubernetes.io/name=external-secrets","k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=external-secrets","k8s:io.cilium.k8s.policy.cluster=default","k8s:io.cilium.k8s.policy.serviceaccount=external-secret","k8s:io.kubernetes.pod.namespace=external-secrets"]},"Type":"L3_L4","node_name":"ip-10-101-103-62.eu-west-1.compute.internal","event_type":{"type":1,"sub_type":133},"traffic_direction":"INGRESS","drop_reason_desc":"POLICY_DENIED","Summary":"ICMPv6 NeighborSolicitation"},"node_name":"ip-10-101-103-62.eu-west-1.compute.internal","time":"2023-06-25T14:46:00.708233741Z"}
I am testing a Cilium deployment on an infrastructure with OVN (kube-ovn), IPv6-only. I see that the neighbor solicitations (ICMPv6) are blocked:
kubectl exec -n kube-system cilium-6xrmb -- hubble observe --since 3m --pod default/mariadb-nextcloud-0
Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init), install-cni-binaries (init)
Dec 9 11:14:18.150: default/mariadb-nextcloud-0 (ID:32140) <> ff02::1:ff90:a117 (world) Unknown L3 target address DROPPED (ICMPv6 NeighborSolicitation)
xx drop (Unknown L3 target address) flow 0x0 to endpoint 0, ifindex 61, file icmp6.h:341, , identity unknown->unknown: 2001:X:X:e0ce:604a:f87:0:36 -> ff02::1:ff00:39 NeighborSolicitation
I see no improvement after following this link: https://docs.cilium.io/en/stable/security/policy/language/#example-icmp-icmpv6
Did you get it to work with a particular policy?
I have just tried the workaround on our setup (bare-metal, on-prem) and it partially worked for me. What I am seeing is another problem (but it is the last one): https://github.com/cilium/cilium/blob/master/bpf/lib/icmp6.h#L397
xx drop (Unknown L3 target address) flow 0x0 to endpoint 0, identity 0->0: fe80::d863:1aff:fee8:e84b -> fe80::34f8:21ff:fe1f:22b9 NeighborSolicitation
If I let this be forwarded out of the container, everything works (#L397 -> return 0;).
Interested to see what results @sdmodi gets...?
(The line has changed: https://github.com/cilium/cilium/blob/ffbd7af823a35baddbfed9d72ec296bc5f0a12e0/bpf/lib/icmp6.h#L328)
As I understand it, it checks whether the destination is the router or whether the destination exists in ENDPOINTS_MAP (which seems to be the lxc map). I don't really see how local (IPv6) addresses could be in the endpoints map. Is there a neighbor table somewhere?
I am testing a Cilium deployment on an infrastructure with OVN (kube-ovn), IPv6-only. I see that the neighbor solicitations (ICMPv6) are blocked:
kubectl exec -n kube-system cilium-6xrmb -- hubble observe --since 3m --pod default/mariadb-nextcloud-0
Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init), install-cni-binaries (init)
Dec 9 11:14:18.150: default/mariadb-nextcloud-0 (ID:32140) <> ff02::1:ff90:a117 (world) Unknown L3 target address DROPPED (ICMPv6 NeighborSolicitation)
xx drop (Unknown L3 target address) flow 0x0 to endpoint 0, ifindex 61, file icmp6.h:341, , identity unknown->unknown: 2001:X:X:e0ce:604a:f87:0:36 -> ff02::1:ff00:39 NeighborSolicitation
I see no improvement after following this link: https://docs.cilium.io/en/stable/security/policy/language/#example-icmp-icmpv6
Did you get it to work with a particular policy?
No, I couldn't. For now we have put the EKS with Cilium/IPv6 project on hold. Neighbor solicitation requests should be allowed by default since they replace ARP requests, but that isn't the case.
Encountered what really looks like the same issue. For now this workaround seems to be working for us (AKS 1.27, Cilium, Overlay, DualStack). I would not call it the definitive way (nor production-ready), but perhaps a temporary patch to make some progress if people are experimenting.
EDIT: Actually this policy has so many other side-effects on pods that are not explicitly using NetPols, so probably not a good idea after all.
apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: cilium-ipv6-icmp-workaround
spec:
  endpointSelector: {}
  ingress:
  - icmps:
    - fields:
      - type: 135 # Neighbor Solicitation
        family: IPv6
      - type: 136 # Neighbor Advertisement
        family: IPv6
  egress:
  - icmps:
    - fields:
      - type: 135 # Neighbor Solicitation
        family: IPv6
      - type: 136 # Neighbor Advertisement
        family: IPv6
Yes, a CCNP would not be ideal. It won't look nice, but one option could be to add it as a namespace/pod-scoped policy wherever it is required (see the sketch below). Wondering if there is any alternative solution for this @amitmavgupta @mathpl
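A namespaced variant of the workaround above might look roughly like this (a sketch; the policy name, namespace, and selector label are placeholders):

# Sketch only: same ICMPv6 NDP allowance as the cluster-wide policy above,
# but scoped to a single namespace and pod selector.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-ipv6-ndp        # placeholder name
  namespace: my-namespace     # placeholder namespace
spec:
  endpointSelector:
    matchLabels:
      app: my-app             # placeholder label
  ingress:
  - icmps:
    - fields:
      - type: 135 # Neighbor Solicitation
        family: IPv6
      - type: 136 # Neighbor Advertisement
        family: IPv6
  egress:
  - icmps:
    - fields:
      - type: 135 # Neighbor Solicitation
        family: IPv6
      - type: 136 # Neighbor Advertisement
        family: IPv6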
This issue seems to be fixed as of 1.14.4 at least, and doesn't require an explicit ICMP allow.
I am running a dual-stack cluster with a network policy applied.
I have a client pod 'client-allow-64c5464587-kszrk' which should be allowed to talk to the nginx service and corresponding backend pods. When I don't apply the network policy, everything works. When I do apply the network policy, the IPv6 traffic breaks; IPv4 continues to work. When I look at cilium monitor output for the backend nginx pod, I see the traffic being dropped.
Also note that if the backend pod is in the neighbor cache of the client pod, the traffic is permitted.