flannel-io / flannel

flannel is a network fabric for containers, designed for Kubernetes

Flannel v0.24.3 constantly recreates IPtables rules on all nodes #1913

Closed d-dimitrov-georgiev closed 6 months ago

d-dimitrov-georgiev commented 6 months ago

When provisioning a cluster with kubeadm (Kubernetes v1.27.11) and flannel v0.24.3, the logs of all flannel pods show the following:

E0318 09:30:12.024155       1 iptables.go:427] Failed to bootstrap IPTables: failed to apply partial iptables-restore unable to run iptables-restore (, ): exit status 4
I0318 09:30:12.036865       1 iptables.go:503] Some iptables rules are missing; deleting and recreating rules
E0318 09:30:12.084574       1 iptables.go:440] Failed to ensure iptables rules: error setting up rules: failed to apply partial iptables-restore unable to run iptables-restore (, ): exit status 4

This does not occur with flannel v0.24.2.

Expected Behavior

Flannel should not recreate the iptables rules every 5 seconds.

Current Behavior

Flannel deletes and recreates the iptables rules every 5 seconds:

I0318 09:47:50.417361       1 iptables.go:503] Some iptables rules are missing; deleting and recreating rules
I0318 09:47:50.453148       1 iptables.go:358] trying to run iptables-restore < map[nat:[[-A POSTROUTING -m comment --comment flanneld masq -j FLANNEL-POSTRTG] [-A FLANNEL-POSTRTG -m mark --mark 0x4000/0x4000 -m comment --comment flanneld masq -j RETURN] [-A FLANNEL-POSTRTG -s ::/0 -d ::/0 -m comment --comment flanneld masq -j RETURN] [-A FLANNEL-POSTRTG -s ::/0 -d ::/0 -m comment --comment flanneld masq -j RETURN] [-A FLANNEL-POSTRTG ! -s ::/0 -d ::/0 -m comment --comment flanneld masq -j RETURN] [-A FLANNEL-POSTRTG -s ::/0 ! -d ff00::/8 -m comment --comment flanneld masq -j MASQUERADE --random-fully] [-A FLANNEL-POSTRTG ! -s ::/0 -d ::/0 -m comment --comment flanneld masq -j MASQUERADE --random-fully]]]
I0318 09:47:50.453466       1 iptables_restore.go:86] trying to run with payload *nat
-A POSTROUTING -m comment --comment "flanneld masq" -j FLANNEL-POSTRTG
-A FLANNEL-POSTRTG -m mark --mark 0x4000/0x4000 -m comment --comment "flanneld masq" -j RETURN
-A FLANNEL-POSTRTG -s ::/0 -d ::/0 -m comment --comment "flanneld masq" -j RETURN
-A FLANNEL-POSTRTG -s ::/0 -d ::/0 -m comment --comment "flanneld masq" -j RETURN
-A FLANNEL-POSTRTG ! -s ::/0 -d ::/0 -m comment --comment "flanneld masq" -j RETURN
-A FLANNEL-POSTRTG -s ::/0 ! -d ff00::/8 -m comment --comment "flanneld masq" -j MASQUERADE --random-fully
-A FLANNEL-POSTRTG ! -s ::/0 -d ::/0 -m comment --comment "flanneld masq" -j MASQUERADE --random-fully
COMMIT
E0318 09:47:50.472052       1 iptables.go:440] Failed to ensure iptables rules: error setting up rules: failed to apply partial iptables-restore unable to run iptables-restore (, ): exit status 4

I0318 09:47:55.481039       1 iptables.go:503] Some iptables rules are missing; deleting and recreating rules
I0318 09:47:55.532094       1 iptables.go:358] trying to run iptables-restore < map[nat:[[-A POSTROUTING -m comment --comment flanneld masq -j FLANNEL-POSTRTG] [-A FLANNEL-POSTRTG -m mark --mark 0x4000/0x4000 -m comment --comment flanneld masq -j RETURN] [-A FLANNEL-POSTRTG -s ::/0 -d ::/0 -m comment --comment flanneld masq -j RETURN] [-A FLANNEL-POSTRTG -s ::/0 -d ::/0 -m comment --comment flanneld masq -j RETURN] [-A FLANNEL-POSTRTG ! -s ::/0 -d ::/0 -m comment --comment flanneld masq -j RETURN] [-A FLANNEL-POSTRTG -s ::/0 ! -d ff00::/8 -m comment --comment flanneld masq -j MASQUERADE --random-fully] [-A FLANNEL-POSTRTG ! -s ::/0 -d ::/0 -m comment --comment flanneld masq -j MASQUERADE --random-fully]]]
I0318 09:47:55.532298       1 iptables_restore.go:86] trying to run with payload *nat
-A POSTROUTING -m comment --comment "flanneld masq" -j FLANNEL-POSTRTG
-A FLANNEL-POSTRTG -m mark --mark 0x4000/0x4000 -m comment --comment "flanneld masq" -j RETURN
-A FLANNEL-POSTRTG -s ::/0 -d ::/0 -m comment --comment "flanneld masq" -j RETURN
-A FLANNEL-POSTRTG -s ::/0 -d ::/0 -m comment --comment "flanneld masq" -j RETURN
-A FLANNEL-POSTRTG ! -s ::/0 -d ::/0 -m comment --comment "flanneld masq" -j RETURN
-A FLANNEL-POSTRTG -s ::/0 ! -d ff00::/8 -m comment --comment "flanneld masq" -j MASQUERADE --random-fully
-A FLANNEL-POSTRTG ! -s ::/0 -d ::/0 -m comment --comment "flanneld masq" -j MASQUERADE --random-fully
COMMIT
E0318 09:47:55.556468       1 iptables.go:440] Failed to ensure iptables rules: error setting up rules: failed to apply partial iptables-restore unable to run iptables-restore (, ): exit status 4
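
The 5-second cadence can be read off the timestamps of the two "Some iptables rules are missing" lines above. Here is a minimal Go sketch of that arithmetic, assuming the usual klog header layout (severity and date, then time of day with microseconds). The helper name `klogTime` is hypothetical and not part of flannel.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// klogTime extracts the time-of-day field from a klog line such as
// "I0318 09:47:50.417361       1 iptables.go:503] ...".
// Hypothetical helper: it ignores the date and assumes both lines
// were logged on the same day.
func klogTime(line string) (time.Time, error) {
	fields := strings.Fields(line)
	if len(fields) < 2 {
		return time.Time{}, fmt.Errorf("not a klog line: %q", line)
	}
	return time.Parse("15:04:05.000000", fields[1])
}

func main() {
	// The two "rules are missing" lines copied from the log excerpt above.
	first := "I0318 09:47:50.417361       1 iptables.go:503] Some iptables rules are missing; deleting and recreating rules"
	second := "I0318 09:47:55.481039       1 iptables.go:503] Some iptables rules are missing; deleting and recreating rules"

	t1, err := klogTime(first)
	if err != nil {
		panic(err)
	}
	t2, err := klogTime(second)
	if err != nil {
		panic(err)
	}

	// Elapsed time between two consecutive recreate attempts.
	fmt.Println(t2.Sub(t1)) // prints 5.063678s
}
```

The interval is roughly five seconds plus the time spent on the failed iptables-restore attempt, which is consistent with a fixed 5-second resync loop.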

Steps to Reproduce (for bugs)

  1. Deploy a Kubernetes 1.27.11 cluster using kubeadm on Debian, for example as described in this gist. Nothing out of the ordinary: kubeadm init and kubeadm join. The cluster has 1 control-plane node and 2 worker nodes.
  2. Deploy kube-flannel - kubectl apply -f https://github.com/flannel-io/flannel/releases/download/v0.24.3/kube-flannel.yml
  3. Wait for the pods to start - while [ $(kubectl get pods -n kube-flannel | grep -v 'Running' | wc -l) -gt 1 ]; do sleep 10; done
  4. Get the logs from any of the pods - kubectl logs $(kubectl get pods -n kube-flannel --no-headers | head -n 1 | awk '{print $1}') -n kube-flannel

Context

I was unable to find anything "wrong" with the network: pod-to-pod communication works, services work, and DNS resolution works. I assume the constant rule recreation may be putting some pressure on iptables.

Your Environment

* Etcd version: 3.5.7-0
* Kubernetes version (if used): 1.27.11
* Operating System and version:

cat /etc/os-release

PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/

iptables --version

iptables v1.8.7 (nf_tables)

lsmod

Module Size Used by
xt_statistic 16384 5
veth 32768 0
vxlan 81920 0
ip6_udp_tunnel 16384 1 vxlan
udp_tunnel 20480 1 vxlan
xt_nat 16384 10
xt_mark 16384 4
ipt_REJECT 16384 0
nf_reject_ipv4 16384 1 ipt_REJECT
xt_tcpudp 20480 10
xt_comment 16384 88
xt_conntrack 16384 20
nft_chain_nat 16384 6
xt_MASQUERADE 20480 5
nf_nat 57344 3 xt_nat,nft_chain_nat,xt_MASQUERADE
nf_conntrack_netlink 57344 0
nf_conntrack 176128 5 xt_conntrack,nf_nat,xt_nat,nf_conntrack_netlink,xt_MASQUERADE
nf_defrag_ipv6 24576 1 nf_conntrack
nf_defrag_ipv4 16384 1 nf_conntrack
xfrm_user 45056 1
xfrm_algo 16384 1 xfrm_user
nft_counter 16384 116
xt_addrtype 16384 4
nft_compat 20480 146
nf_tables 274432 286 nft_compat,nft_counter,nft_chain_nat
libcrc32c 16384 3 nf_conntrack,nf_nat,nf_tables
nfnetlink 20480 4 nft_compat,nf_conntrack_netlink,nf_tables
dm_mod 163840 0
intel_rapl_msr 20480 0
intel_rapl_common 28672 1 intel_rapl_msr
ghash_clmulni_intel 16384 0
aesni_intel 372736 0
nls_ascii 16384 1
libaes 16384 1 aesni_intel
nls_cp437 20480 1
crypto_simd 16384 1 aesni_intel
vfat 20480 1
cryptd 24576 2 crypto_simd,ghash_clmulni_intel
cirrus 16384 0
fat 86016 1 vfat
glue_helper 16384 1 aesni_intel
iTCO_wdt 16384 1
intel_pmc_bxt 16384 1 iTCO_wdt
rapl 20480 0
drm_kms_helper 278528 3 cirrus
iTCO_vendor_support 16384 1 iTCO_wdt
evdev 28672 2
serio_raw 20480 0
joydev 28672 0
watchdog 32768 1 iTCO_wdt
virtio_balloon 24576 0
cec 61440 1 drm_kms_helper
qemu_fw_cfg 20480 0
button 24576 0
br_netfilter 32768 0
bridge 262144 1 br_netfilter
stp 16384 1 bridge
llc 16384 2 bridge,stp
overlay 147456 16
drm 634880 3 drm_kms_helper,cirrus
fuse 167936 1
configfs 57344 1
ip_tables 36864 0
x_tables 53248 11 xt_conntrack,xt_statistic,nft_compat,xt_tcpudp,xt_addrtype,xt_nat,xt_comment,ipt_REJECT,ip_tables,xt_MASQUERADE,xt_mark
autofs4 53248 2
ext4 942080 1
crc16 16384 1 ext4
mbcache 16384 1 ext4
jbd2 151552 1 ext4
crc32c_generic 16384 0
hid_generic 16384 0
usbhid 65536 0
hid 151552 2 usbhid,hid_generic
xhci_pci 24576 0
virtio_blk 20480 3
xhci_hcd 307200 1 xhci_pci
ahci 45056 0
virtio_net 61440 0
libahci 49152 1 ahci
net_failover 24576 1 virtio_net
crct10dif_pclmul 16384 0
crct10dif_common 16384 1 crct10dif_pclmul
libata 299008 2 libahci,ahci
failover 16384 1 net_failover
crc32_pclmul 16384 0
crc32c_intel 24576 3
scsi_mod 270336 1 libata
psmouse 184320 0
usbcore 331776 3 xhci_hcd,usbhid,xhci_pci
i2c_i801 32768 0
i2c_smbus 20480 1 i2c_i801
lpc_ich 28672 0
virtio_pci 28672 0
usb_common 16384 2 xhci_hcd,usbcore
virtio_ring 36864 4 virtio_balloon,virtio_pci,virtio_blk,virtio_net
virtio 16384 4 virtio_balloon,virtio_pci,virtio_blk,virtio_net

thomasferrandiz commented 6 months ago

It looks like the same issue as https://github.com/flannel-io/flannel/issues/1906. It should be fixed by https://github.com/flannel-io/flannel/pull/1914

manuelbuil commented 6 months ago

We should also stop using infinite loops and, for example, migrate to https://pkg.go.dev/k8s.io/apimachinery/pkg/util/wait with a defined timeout, so that the process crashes after a while if something is not working as expected. Otherwise, it is hard to detect that things are not working well.

rbrtbnfgl commented 6 months ago

Closing this issue; it is fixed in the latest release.