firewalld blocks traffic on flannel network installed via kubeadm

softwareplumber commented 2 months ago

Expected Behavior

After configuring firewall per available docs, traffic will be routed between pods and services

Current Behavior

Kubernetes nodes start with no obvious errors. Subnet allocation seen in flannel pod logs. However launching a test container and attempting to resolve a dns address shows 'no route to host', although the container IP address and subnet appear correct. CoreDNS logs show no errors, coredns pods also seem to have appropriate IP and subnet allocations.

Stopping the firewalld service on the nodes fixes the problem. Activating logging for rejected packets, I saw something unexpected:

Sep 15 09:10:14 kubeadm1.galifrey.net kernel: filter_FWD_public_REJECT: IN=flannel.1 OUT=cni0 MAC=56:91:4a:53:80:9b:7a:76:66:99:cf:da:08:00 SRC=10.244.1.0 DST=10.244.0.2 LEN=69 T>

E.g. the firewall on the node is blocking packets on the flannel network.

Possible Solution

This may be something that's just obvious to someone properly steeped in Kubernetes networking lore. This is the first time I've installed on an EL9 variant (Alma 9.4). The default firewalld back end appears to be nftables, where flannel is still using iptables. I'm wondering if the two need to use the same back end. If this is the case, is it safe to just change the flannel config yaml and reapply it? If this is the case I think the docs could use updating for sure. Or maybe I just overlooked something?

Steps to Reproduce (for bugs)

Install a kubernetes control plane node and a single worker node on Alma 9.4 with firewall enabled following recipe here: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/ and apply firewall rules:

Control plane node:

sudo firewall-cmd --permanent --add-service kube-control-plane
sudo firewall-cmd --permanent --add-service kube-control-plane-secure
sudo firewall-cmd --permanent --add-service kubelet
sudo firewall-cmd --permanent --new-service-from-file=flannel.xml --name=flannel
sudo firewall-cmd --permanent --add-service flannel 
sudo firewall-cmd --permanent --add-masquerade
sudo firewall-cmd --reload

Worker node

sudo firewall-cmd --permanent --add-service kube-worker
sudo firewall-cmd --permanent --add-service etcd-client
sudo firewall-cmd --permanent --new-service-from-file=flannel.xml --name=flannel
sudo firewall-cmd --permanent --add-service flannel 
sudo firewall-cmd --reload

flannel.xml

<?xml version="1.0" encoding="utf-8"?>
<service>
  <short>flannel VLAN</short>
  <description>flannel virtual network for kubernetes</description>
  <port port="8472" protocol="udp"/>
  <port port="8285" protocol="udp"/>
</service>

Your Environment

Flannel version: 0.25.6
Backend used (e.g. vxlan or udp): vxlan
Kubernetes version (if used): 1.30.5
Operating System and version: Alma 9.4

softwareplumber commented 2 months ago

Oh, missed the sudo firewall-cmd --permanent --add-masquerade for the worker node. This was done in IRL just a cut-and-paste issue submitting the issue.

softwareplumber commented 2 months ago

Edited the flannel config map to enable nftables and restarted the daemonset - no change in behavior. Could this be related to https://bugzilla.redhat.com/show_bug.cgi?id=2029211 ?

rbrtbnfgl commented 2 months ago

Clearly you need to enable the DNS traffic for your pods on the firewall.

softwareplumber commented 2 months ago

Well, that's a little like saying 'yeah, fix the problem by fixing the problem.' I've enabled the flannel ports on each node's firewall. However the node firewall doesn't even see the cni0 or the flannel.1 interface. AFAIK the flannel and/or kube-proxy pods and/or the CNI are supposed to take care of writing appropriate rules into iptables/ntfables to handle this. Possibly all I need to do is add the cni0 interface to the public zone so that the default forwarding behavior for firewalld returns more or less to what it was in EL8. I don't know. What I do know is that I can't find a flannnel/firewalld HOWTO anywhere that references any required firewall configuration beyond that listed above.

rbrtbnfgl commented 2 months ago

The DNS isn't managed by Flannel anyway. We don't want to enforce any allow rules on the firewall that can be implemented in different ways by the users and blocks some types of traffic by choice. Flannel uses UDP traffic in case you are using VXLan tunnel. Other ports configured by Kubernetes services aren't manged by Flannel they are only using Flannel network to communicate.

softwareplumber commented 2 months ago

Yes. I know. Kubernetes DNS is a kubernetes service just like any other. If the Kubernetes DNS is blocked then that means the pod network is comprehensively borked. Your contention is that every new service deployed on the cluster should require custom configuration on the node firewall? No,that would be stupid. The firewall should concern itself with traffic passing into/out of the node. The installation instructions for the networking plugin should include information on how to configure that plugin to permit unimpeded traffic between kubernetes pods and services. Based on the notes I took at the time, the firewall configuration above worked perfectly well on RH8. I believe something has changed. Whatever new configuration step is required may be perfectly obvious to your typical full-time K8S admin. But it isn't obvious to me.

rbrtbnfgl commented 2 months ago

There are something related to the firewall on the documentation regarding to be sure that the firewall allow the networking communication between the pods. Exact steps with the commands to do so it depends on the software used to do the firewall we didn't want to be specific on that.

softwareplumber commented 2 months ago

Yes. And the exact words are:

When using udp backend, flannel uses UDP port 8285 for sending encapsulated packets.

When using vxlan backend, kernel uses UDP port 8472 for sending encapsulated packets.

Make sure that your firewall rules allow this traffic for all hosts participating in the overlay network.

Done the above. But also:

Make sure that your firewall rules allow traffic from pod network cidr visit your kubernetes master node.

I guess this could be the secret sauce, but it is very vague.

So try looking at other sources (like I literally spent all weekend doing)

https://stackoverflow.com/questions/60708270/how-can-i-use-flannel-without-disabing-firewalld-kubernetes

doesn't mention anything other than the steps already taken. There are dozens and dozens of blogs and articles out there that reference setting up a simple kubernetes cluster with flannel. Unfortunately most of them appear to date from the EL8 era. Either all these blogs and articles missed the exact same critical step, or something has changed.

rbrtbnfgl commented 2 months ago

On your case you could:

Disable the firewall
Enable the traffic to the ServiceCIDR and ClusterCIDR

As I understand you want the firewall so you could add the IPs of the pods to the whitelist

firewall-cmd --permanent --zone=public --add-source=10.244.0.0/16

This is an example command using the default CIDR used by Flannel I don't know if you configured something different. But you could have issue if you want to expose service to the external.

flannel-io / flannel