Open softwareplumber opened 2 months ago
Oh, I missed the sudo firewall-cmd --permanent --add-masquerade for the worker node. This was done IRL; it was just a cut-and-paste error when submitting the issue.
Edited the flannel config map to enable nftables and restarted the daemonset - no change in behavior. Could this be related to https://bugzilla.redhat.com/show_bug.cgi?id=2029211 ?
Clearly you need to enable the DNS traffic for your pods on the firewall.
Well, that's a little like saying 'yeah, fix the problem by fixing the problem.' I've enabled the flannel ports on each node's firewall. However, the node firewall doesn't even see the cni0 or the flannel.1 interface. AFAIK the flannel and/or kube-proxy pods and/or the CNI are supposed to take care of writing appropriate rules into iptables/nftables to handle this. Possibly all I need to do is add the cni0 interface to the public zone so that the default forwarding behavior for firewalld returns more or less to what it was in EL8. I don't know. What I do know is that I can't find a flannel/firewalld HOWTO anywhere that references any required firewall configuration beyond that listed above.
DNS isn't managed by Flannel anyway. We don't want to enforce any allow rules on the firewall, because users can implement them in different ways and may block some types of traffic by choice. Flannel uses UDP traffic when you are using a VXLAN tunnel. Other ports configured by Kubernetes services aren't managed by Flannel; they only use the Flannel network to communicate.
Yes. I know. Kubernetes DNS is a Kubernetes service just like any other. If the Kubernetes DNS is blocked then that means the pod network is comprehensively borked. Your contention is that every new service deployed on the cluster should require custom configuration on the node firewall? No, that would be stupid. The firewall should concern itself with traffic passing into/out of the node. The installation instructions for the networking plugin should include information on how to configure that plugin to permit unimpeded traffic between Kubernetes pods and services. Based on the notes I took at the time, the firewall configuration above worked perfectly well on RH8. I believe something has changed. Whatever new configuration step is required may be perfectly obvious to your typical full-time K8S admin. But it isn't obvious to me.
There is something about the firewall in the documentation: it says to make sure that the firewall allows network communication between the pods. The exact commands depend on the firewall software in use, so we didn't want to be specific about that.
Yes. And the exact words are:
When using udp backend, flannel uses UDP port 8285 for sending encapsulated packets.
When using vxlan backend, kernel uses UDP port 8472 for sending encapsulated packets.
Make sure that your firewall rules allow this traffic for all hosts participating in the overlay network.
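For firewalld, the two port rules quoted above might be applied roughly as follows. This is only a sketch: it assumes the default zone is the one that handles node-to-node traffic, the names open_flannel_port and run are hypothetical helpers, and by default the script only prints the commands (set APPLY=1 to actually execute them).

```shell
#!/bin/sh
# Dry-run wrapper (hypothetical helper): prints the command unless APPLY=1.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Open the UDP port for the flannel backend in use.
# Port numbers are the ones quoted from the flannel docs above.
open_flannel_port() {
  case "$1" in
    udp)   port=8285 ;;
    vxlan) port=8472 ;;
    *)     echo "unknown backend: $1" >&2; return 1 ;;
  esac
  run sudo firewall-cmd --permanent --add-port="${port}/udp"
  run sudo firewall-cmd --reload   # --permanent rules apply only after a reload
}

open_flannel_port vxlan   # would need to be run on every node in the overlay
```

As the rest of the thread shows, opening these ports covers node-to-node encapsulated traffic only; it does not by itself address packets forwarded between flannel.1 and cni0.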
Done the above. But also:
Make sure that your firewall rules allow traffic from pod network cidr visit your kubernetes master node.
I guess this could be the secret sauce, but it is very vague.
So try looking at other sources (like I literally spent all weekend doing)
As I understand it, you want to keep the firewall enabled, so in your case you could add the IPs of the pods to the whitelist:
firewall-cmd --permanent --zone=public --add-source=10.244.0.0/16
This is an example command using Flannel's default CIDR; I don't know whether you configured something different. You could still have issues if you want to expose a service externally.
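Note that a --permanent change like the one above only takes effect after a reload. A minimal sketch of the full sequence, assuming the default Flannel CIDR and the public zone from the example; allow_pod_cidr and run are hypothetical helper names, and the script prints the commands unless APPLY=1 is set. Whether this is sufficient on EL9 was not confirmed in this thread.

```shell
#!/bin/sh
# Dry-run wrapper (hypothetical helper): prints the command unless APPLY=1.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# $1 = pod network CIDR (assumed default 10.244.0.0/16), $2 = firewalld zone
allow_pod_cidr() {
  cidr="${1:-10.244.0.0/16}"
  zone="${2:-public}"
  run sudo firewall-cmd --permanent --zone="$zone" --add-source="$cidr"
  run sudo firewall-cmd --reload                        # activate the permanent rule
  run sudo firewall-cmd --zone="$zone" --list-sources   # check that the CIDR is listed
}

allow_pod_cidr 10.244.0.0/16 public
```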
Expected Behavior
After configuring firewall per available docs, traffic will be routed between pods and services
Current Behavior
Kubernetes nodes start with no obvious errors. Subnet allocation is seen in the flannel pod logs. However, launching a test container and attempting to resolve a DNS name shows 'no route to host', although the container IP address and subnet appear correct. CoreDNS logs show no errors, and the CoreDNS pods also seem to have appropriate IP and subnet allocations.
Stopping the firewalld service on the nodes fixes the problem. Activating logging for rejected packets, I saw something unexpected:
Sep 15 09:10:14 kubeadm1.galifrey.net kernel: filter_FWD_public_REJECT: IN=flannel.1 OUT=cni0 MAC=56:91:4a:53:80:9b:7a:76:66:99:cf:da:08:00 SRC=10.244.1.0 DST=10.244.0.2 LEN=69 T>
I.e. the firewall on the node is blocking packets on the flannel network.
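To make the relevant fields of that log line easier to see, here is a small sketch that pulls them out. The log_field function is a hypothetical helper; the line is copied from the report above, including its truncated tail.

```shell
#!/bin/sh
# Rejected-packet line as posted in the issue (truncated in the original).
line='Sep 15 09:10:14 kubeadm1.galifrey.net kernel: filter_FWD_public_REJECT: IN=flannel.1 OUT=cni0 MAC=56:91:4a:53:80:9b:7a:76:66:99:cf:da:08:00 SRC=10.244.1.0 DST=10.244.0.2 LEN=69 T>'

# Print the value of the first KEY=value token in a log line.
# $1 = key, $2 = log line
log_field() {
  printf '%s\n' "$2" | tr ' ' '\n' | sed -n "s/^$1=//p" | head -n 1
}

for key in IN OUT SRC DST; do
  echo "$key=$(log_field "$key" "$line")"
done
```

Both interfaces and both addresses belong to the pod network, and the chain name suggests the packet was rejected in the forward path of the public zone rather than on the way into or out of the node.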
Possible Solution
This may be something that's just obvious to someone properly steeped in Kubernetes networking lore. This is the first time I've installed on an EL9 variant (Alma 9.4). The default firewalld back end appears to be nftables, whereas flannel is still using iptables. I'm wondering if the two need to use the same back end. If so, is it safe to just change the flannel config yaml and reapply it? And if that is required, I think the docs could use updating for sure. Or maybe I just overlooked something?
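One way to check which back ends are in play on a node, as a rough sketch. It assumes the stock config path /etc/firewalld/firewalld.conf and an iptables 1.8.x binary that reports its variant in the version string; iptables_backend is a hypothetical helper, and the version number in the comment is only illustrative. It reports the host side only, since the flannel container may ship its own iptables binary.

```shell
#!/bin/sh
# Classify the variant from the output of `iptables --version`,
# e.g. "iptables v1.8.8 (nf_tables)" or "iptables v1.8.8 (legacy)".
iptables_backend() {
  case "$1" in
    *nf_tables*) echo nftables ;;
    *legacy*)    echo legacy ;;
    *)           echo unknown ;;
  esac
}

# firewalld's back end is set by the FirewallBackend key in firewalld.conf.
conf=/etc/firewalld/firewalld.conf
if [ -r "$conf" ]; then
  grep '^FirewallBackend' "$conf"
fi

# Host iptables variant, if the binary is installed.
if command -v iptables >/dev/null 2>&1; then
  echo "iptables backend: $(iptables_backend "$(iptables --version)")"
fi
```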
Steps to Reproduce (for bugs)
Install a Kubernetes control plane node and a single worker node on Alma 9.4 with the firewall enabled, following the recipe here: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/ and apply firewall rules:
Control plane node:
Worker node
flannel.xml
Your Environment