flannel-io / flannel

flannel is a network fabric for containers, designed for Kubernetes
Apache License 2.0

Question about iptables rule updates #1965

Open busishe opened 1 month ago

busishe commented 1 month ago

Expected Behavior

I use Flannel to manage the allocation of cluster IPs in my k8s cluster. I found that after a pod dies and its cluster IP no longer exists in the cluster, the IP can still be reached with ping and telnet. Checking the iptables rules on the node showed that the node still had a POSTROUTING chain rule for this IP, which apparently was never deleted. As a result, my microservices were registered incorrectly (symptom: the old cluster IP remained in the nacos registry center).

Current Behavior

Possible Solution

Steps to Reproduce (for bugs)

Context

On the k8s master: kubectl get pods --all-namespaces -o wide | grep 172.25.163

[image]

The IP 172.25.163.7 does not exist, but it can still be reached with ping and telnet.

[image]

On the k8s node (flannel IP: 172.25.163.0): iptables -t nat -L -n -v

[image]
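The comparison described above (pod IPs known to the cluster versus IPs still referenced by nat rules on the node) can be sketched in Python. This is only an illustration: the function names `host_ips_in_rules` and `stale_pod_ips` are hypothetical, and `SAMPLE_RULES` is made-up text in the style of `iptables -t nat -S` output, not the rules from the screenshots. In practice the rule text would come from the node and the live pod IPs from the kubectl output.

```python
import ipaddress
import re

# Matches an IPv4 address with an optional /prefix, e.g. 10.0.0.1 or 10.0.0.0/24.
_IPV4 = re.compile(r"(?<![\d.])(\d{1,3}(?:\.\d{1,3}){3})(?:/(\d{1,2}))?(?![\d./])")


def host_ips_in_rules(rules_text):
    """Return the single-host IPv4 addresses mentioned in nat rule text."""
    hosts = set()
    for address, prefix in _IPV4.findall(rules_text):
        if prefix not in ("", "32"):
            continue  # a subnet such as 172.25.128.0/17, not a single host
        try:
            hosts.add(ipaddress.ip_address(address))
        except ValueError:
            continue  # looked like an address but is not valid IPv4
    return hosts


def stale_pod_ips(rules_text, live_pod_ips, pod_network):
    """List host IPs that rules reference inside pod_network but no live pod owns."""
    network = ipaddress.ip_network(pod_network)
    live = {ipaddress.ip_address(ip) for ip in live_pod_ips}
    stale = [ip for ip in host_ips_in_rules(rules_text)
             if ip in network and ip not in live]
    return [str(ip) for ip in sorted(stale)]


# Hypothetical rule text for illustration only (ports and the second IP are invented).
SAMPLE_RULES = """\
-A POSTROUTING -s 172.25.128.0/17 -d 172.25.128.0/17 -j RETURN
-A POSTROUTING -s 172.25.163.7/32 -d 172.25.163.7/32 -p tcp -m tcp --dport 8848 -j MASQUERADE
-A POSTROUTING -s 172.25.163.5/32 -d 172.25.163.5/32 -p tcp -m tcp --dport 8080 -j MASQUERADE
"""

if __name__ == "__main__":
    # Assume 172.25.163.5 is the only pod IP that kubectl still reports.
    print(stale_pod_ips(SAMPLE_RULES, ["172.25.163.5"], "172.25.128.0/17"))
```

The sketch only flags candidates. It does not say which component created a rule, and addresses that belong to node interfaces rather than pods would need to be added to the live set.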

My etcd config looks like this: /coreos.com/network/config {"Network":"172.25.128.0/17","Backend":{"Type":"vxlan","Directrouting":false}}
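As a quick sanity check on the addresses in this report, the following Python sketch parses the config above and tests whether 172.25.163.7 falls inside the cluster network and inside the node's subnet. Two things are assumptions rather than facts from the report: that 172.25.163.0 is the base of the node's subnet, and that the subnet length is 24 (the value Flannel uses when `SubnetLen` is not set in the config). The function name `locate_ip` is hypothetical.

```python
import ipaddress
import json

# The network config exactly as posted in the issue.
CONFIG = '{"Network":"172.25.128.0/17","Backend":{"Type":"vxlan","Directrouting":false}}'


def locate_ip(ip, node_subnet_base, config_json=CONFIG, subnet_len=24):
    """Report where an IP sits relative to the cluster network and a node subnet.

    subnet_len=24 is an assumption: the config does not set SubnetLen.
    """
    network = ipaddress.ip_network(json.loads(config_json)["Network"])
    node_subnet = ipaddress.ip_network(f"{node_subnet_base}/{subnet_len}")
    address = ipaddress.ip_address(ip)
    return {
        "in_cluster_network": address in network,
        "in_node_subnet": address in node_subnet,
        "node_subnet_in_network": node_subnet.subnet_of(network),
    }


if __name__ == "__main__":
    print(locate_ip("172.25.163.7", "172.25.163.0"))
```

Under those assumptions all three checks come out true, so the stale address would belong to the subnet of the node whose rules were inspected.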

I don't know whether this behavior is normal, but this non-existent IP appears in the nacos registry.

Your Environment

rbrtbnfgl commented 1 month ago

Hi. How are you creating the pods? Are you exposing any services from the pods? I saw that you are using an old Flannel and Kubernetes version. I don't recall whether the MASQUERADE process was different in that old version, but I am sure that recent versions of Flannel use a single rule based on the Pods CIDR of that node.

busishe commented 1 month ago

Thanks for the reply. I create the service from the nacos pod, not from the service pods in the picture. I'm sure that the svc IP range differs from the cluster IP range. I'm not a network engineer, so I don't really understand how iptables and the MASQUERADE process work. Our team used an older Flannel version for the past 3 years (I'm not sure whether it was 0.7.x), and this problem never happened. Additionally, I found that the problem occurs when a pod restarts abnormally, e.g. after an OOM. When the problem occurs, restarting the pod manually makes the registry info correct again.
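To correlate stale registry entries with abnormal restarts, one option is to check which containers were last terminated by the OOM killer. The Python sketch below reads the `status.containerStatuses[].lastState.terminated.reason` field of a pod object, as returned by `kubectl get pod <name> -o json`. The function name `oom_killed_containers` and the sample pod are hypothetical and only for illustration.

```python
def oom_killed_containers(pod):
    """Return names of containers whose previous termination was an OOM kill."""
    names = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        # lastState is empty ({}) for containers that never restarted.
        terminated = status.get("lastState", {}).get("terminated") or {}
        if terminated.get("reason") == "OOMKilled":
            names.append(status["name"])
    return names


# Hypothetical, heavily trimmed pod object for illustration only.
SAMPLE_POD = {
    "status": {
        "containerStatuses": [
            {"name": "app", "restartCount": 1,
             "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}},
            {"name": "sidecar", "restartCount": 0, "lastState": {}},
        ]
    }
}

if __name__ == "__main__":
    print(oom_killed_containers(SAMPLE_POD))
```

With real data, the pod object could be loaded with `json.load` from the kubectl output and passed to the function unchanged.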

busishe commented 1 month ago

If a pod goes down, is Flannel expected to reclaim the pod's cluster IP and delete the rule in iptables?

rbrtbnfgl commented 1 month ago

Flannel shouldn't create any MASQUERADE rule for the pod.

busishe commented 1 month ago

Well, which process creates the rules?

rbrtbnfgl commented 1 month ago

I know how Flannel creates the rules in the latest versions. 0.11.0 is 5 years old, so I am not aware of how the rules were managed then; this bug has probably been fixed in a later version.

busishe commented 1 month ago

Thank you. We will try running a newer Flannel version in our testing environment and see whether similar behavior appears.