cloudnativelabs / kube-router

Kube-router, a turnkey solution for Kubernetes networking.
https://kube-router.io
Apache License 2.0

kube-router-hairpin in nat table cannot be updated in real time #1196

Closed z461948964 closed 2 years ago

z461948964 commented 3 years ago

When I delete a pod and it is pulled up again, the KUBE-ROUTER-HAIRPIN chain in the nat table deletes the entries for the original pod and svc, but does not generate rules for the new pod and svc; new rules only appear after running "iptables -t nat -F" and restarting the service with "systemctl restart kube-router". We would like the KUBE-ROUTER-HAIRPIN chain to be updated dynamically in real time.

kubernetes:1.22.4 kube-router:1.3.2 --hairpin-mode=true hairpinMode:true

aauren commented 3 years ago

I'm sorry, but unless you can translate your issue to English we won't be able to help you and will close the issue.

z461948964 commented 3 years ago

When I delete a pod and pull it up again, the KUBE-ROUTER-HAIRPIN chain in the nat table deletes the original pod and SVC entries, but new rules for the pod and SVC are not generated. New rules only appear after "iptables -t nat -F && systemctl restart kube-router", but we hope the KUBE-ROUTER-HAIRPIN chain can be updated dynamically.

kubernetes:1.22.4 kube-router:1.3.2 --hairpin-mode=true hairpinMode:true
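
For anyone trying to reproduce this, a rough sequence along these lines shows the behavior; the nginx Deployment and Service names are placeholders, not taken from the report:

    # Placeholder names: a Deployment "nginx" exposed by a ClusterIP Service "nginx-svc"
    kubectl delete pod -l app=nginx              # the Deployment recreates the pod with a new IP
    kubectl get pods -l app=nginx -o wide        # note the new pod IP
    iptables -t nat -L KUBE-ROUTER-HAIRPIN -n    # old pod/SVC entries are gone, no new ones appear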

aauren commented 3 years ago

Can you post your log files for kube-router?

It would also be helpful to see your exact steps to replicate this issue along with the state of the KUBE-ROUTER-HAIRPIN chain at each of the steps.
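
Roughly, something like the following run on the node at each step would capture that state; it assumes kube-router runs as a systemd service, as the restart command above suggests:

    # Snapshot of the hairpin chain, with counters and rule numbers
    iptables -t nat -S KUBE-ROUTER-HAIRPIN
    iptables -t nat -L KUBE-ROUTER-HAIRPIN -n -v --line-numbers

    # kube-router logs (unit name "kube-router" is an assumption)
    journalctl -u kube-router --since "10 minutes ago"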

z461948964 commented 3 years ago

[Screenshot 1] [Screenshot 2]

z461948964 commented 3 years ago

[Screenshot 3] [Screenshot 4] When I delete the pod and it restarts, curl to the ClusterIP times out from inside the new nginx pod. Observing node01, after about 20 to 60 seconds the original entry is deleted, but no new rule is generated.
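
One way to watch this window is to curl the Service ClusterIP from inside the new pod while watching the chain on node01; the pod name and ClusterIP below are placeholders:

    # Terminal 1 on node01: watch the hairpin chain while the pod is recreated
    watch -n 2 'iptables -t nat -L KUBE-ROUTER-HAIRPIN -n --line-numbers'

    # Terminal 2: curl the ClusterIP from inside the new nginx pod
    kubectl exec -it <new-nginx-pod> -- curl --max-time 5 http://<service-cluster-ip>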

z461948964 commented 3 years ago

[Screenshot 5] When I execute "iptables -t nat -F && systemctl restart kube-router", I can see that a new rule is generated immediately, and then curl to the ClusterIP works. In fact, I have observed that rules in the KUBE-ROUTER-HAIRPIN chain are only ever removed and never added, unless you first clear the nat rules and then restart the kube-router service.
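
For reference, the workaround described above in one place; note that flushing the entire nat table is disruptive, since it also removes every other NAT rule on the node until the components that own them resync:

    # Heavy-handed workaround: flush the nat table and restart kube-router
    iptables -t nat -F && systemctl restart kube-router

    # Confirm the chain was repopulated
    iptables -t nat -L KUBE-ROUTER-HAIRPIN -n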

z461948964 commented 3 years ago

kube-router.log (attached) - this is the kube-router log.

aauren commented 3 years ago

@z461948964 Thanks for taking the time to translate your issue and for reporting. After spending some time I was able to reproduce the issue you described and found several bugs.

I've created #1200 to help fix these issues, can you please test it out in your environment and see if it resolves the issue you were experiencing?

z461948964 commented 3 years ago

I have rebuilt the kube-router binary from the fix_hairpinning branch, and now new rules are generated shortly after the pod restarts.
Thank you very much for your help. I like kube-router very much. I wish you a happy life.
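
For anyone else wanting to test the fix, the general shape is to build kube-router from the PR branch and swap the binary; fetching the pull request ref works on GitHub, but the build command and install path below are assumptions, so follow the project's own build docs if they differ:

    # Assumed build/install steps for testing PR #1200; adjust paths to your setup
    git clone https://github.com/cloudnativelabs/kube-router.git && cd kube-router
    git fetch origin pull/1200/head:fix_hairpinning && git checkout fix_hairpinning
    go build -o kube-router ./cmd/kube-router       # build command is an assumption
    systemctl stop kube-router
    install -m 0755 kube-router /usr/local/bin/kube-router   # install path is an assumption
    systemctl start kube-router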

z461948964 commented 2 years ago

[WeChat image 1] [WeChat image 2]

Sorry, I think there is something wrong with how the KUBE-ROUTER-HAIRPIN chain is generated.

As shown in the screenshots, the pod is on node01. I think the two KUBE-ROUTER-HAIRPIN rules are only necessary on the node where the pod is located. For example, on the master01 node, because of the rules in the upper red box, the first rule in the lower red box will never be matched; and the second rule in the lower red box will never be matched either, because the ClusterIP is DNATed on the node where the pod runs.

Of course, adding these two rules on nodes other than the pod's node does no harm, but as the number of services grows, these rules keep accumulating, which is not very friendly.
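
A quick way to see the growth being described is to count the chain's rules on a node that hosts no endpoints for these services and compare it with node01, which does:

    # On a node with no local endpoints: the hairpin rules still accumulate here
    iptables -t nat -S KUBE-ROUTER-HAIRPIN | wc -l

    # Compare with the node that actually runs the pod (node01 in the screenshots)
    iptables -t nat -S KUBE-ROUTER-HAIRPIN | wc -l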

aauren commented 2 years ago

@z461948964 - #1208 should reduce the number of hairpin rules significantly by forcing it to only consider pods with local endpoints. Let me know if that resolves the situation you're seeing.
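
One way to check that after picking up the change is to compare which nodes host endpoints for a service with where the hairpin rules actually appear; the service name is a placeholder:

    # Which nodes host endpoints for the service?
    kubectl get endpoints nginx-svc -o wide

    # On each node: hairpin rules should now exist only where a local endpoint runs
    iptables -t nat -S KUBE-ROUTER-HAIRPIN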

z461948964 commented 2 years ago

Yes, I tested with the aauren:hairpin_on_node_local branch, and it solved my problem. Thank you!