cloudnativelabs / kube-router

Kube-router, a turnkey solution for Kubernetes networking.
https://kube-router.io
Apache License 2.0

Getting RST on long-lived connections #1577

Closed: vladimirtiukhtin closed this issue 9 months ago

vladimirtiukhtin commented 10 months ago

What happened? I don't know exactly, but from time to time my apps receive an RST from the kernel of the node they are running on. It looks very much like https://unix.stackexchange.com/questions/572276/l4-balancing-using-ipvs-drop-rst-packets-failover, but there is no IP switching or anything similar here.

The app completes the TCP handshake and works normally... time passes... then the app sends a [P.] (PSH/ACK) segment and immediately gets an RST from the kernel. I have a feeling that some syncing/re-syncing mechanism in kube-router flushes the state table (does IPVS even have a state table?).
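IPVS does keep a per-connection table with idle timeouts (the default for established TCP is 900 s), so one plausible reading of this symptom is an entry expiring rather than being flushed. The toy model below is a hypothetical sketch, not kube-router or kernel code. It assumes that an expired entry is simply gone, that only a SYN may create a new entry, and that a mid-stream packet which matches no entry falls through to the node's own TCP stack, which answers with an RST. The names ConnTable, packet and the backend "pod-a" are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class ConnTable:
    """Hypothetical IPVS-like connection table (illustration only)."""

    tcp_timeout: float = 900.0  # assumed idle timeout for established TCP, in seconds
    entries: dict = field(default_factory=dict)  # 4-tuple -> (backend, last_seen)

    def expire(self, now: float) -> None:
        """Drop entries that have been idle for longer than the timeout."""
        stale = [k for k, (_, seen) in self.entries.items() if now - seen > self.tcp_timeout]
        for k in stale:
            del self.entries[k]

    def packet(self, key: tuple, syn: bool, now: float, backend: str = "pod-a") -> str:
        """Return what happens to one client packet arriving at time `now`."""
        self.expire(now)
        if key in self.entries:
            chosen, _ = self.entries[key]
            self.entries[key] = (chosen, now)  # every packet refreshes the idle timer
            return f"forward to {chosen}"
        if syn:
            self.entries[key] = (backend, now)  # new connection: pick a backend
            return f"forward to {backend}"
        # Mid-stream packet with no entry: it is not load-balanced, so the
        # local stack sees a segment for a socket it does not own and resets it.
        return "RST"


conn = ("10.0.0.5", 40000, "10.96.0.10", 5432)
table = ConnTable()
print(table.packet(conn, syn=True, now=0))      # forward to pod-a
print(table.packet(conn, syn=False, now=60))    # forward to pod-a
print(table.packet(conn, syn=False, now=1200))  # RST (idle for 1140 s > 900 s)
```

Under these assumptions the RST depends only on how long the connection sat idle, which would match "time passes here" without any resync in kube-router.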

What did you expect to happen? Nothing like that, just a working socket.

Screenshots / Architecture Diagrams / Network Topologies: [image: Screenshot from 2023-11-28 17-39-33]


vladimirtiukhtin commented 10 months ago

Setting sysctl net.ipv4.vs.sloppy_tcp=1 seems to effectively mask the problem, but it is not available for IPv6. Was https://github.com/cloudnativelabs/kube-router/issues/911 ever implemented? I would set the timeout to 24 hours and call it a solution.
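The kernel's sloppy_tcp setting lets IPVS create a connection entry from any TCP packet, not just a SYN. The sketch below is a hypothetical model, not kernel code, of why that would mask rather than fix the problem. It assumes the entry for a long-lived connection has already expired, and the function name and backend names are illustrative only. With sloppy_tcp off, the orphaned packet is reset locally. With it on, the packet is rescheduled, which only helps if the scheduler happens to pick the backend that actually holds the connection.

```python
def orphan_packet_outcome(sloppy_tcp: bool, scheduled: str, original: str) -> str:
    """Outcome for a mid-stream packet whose IPVS-style entry has expired.

    `scheduled` is the backend the scheduler would pick now (hypothetical);
    `original` is the backend holding the server side of the connection.
    """
    if not sloppy_tcp:
        # Only a SYN may create an entry, so the packet is not balanced
        # and the node's own TCP stack answers with an RST.
        return "RST from local stack"
    if scheduled == original:
        # A fresh entry happens to point back at the right pod.
        return "connection continues"
    # The new entry points at a pod that never saw the handshake,
    # so that pod rejects the segment instead.
    return f"RST from {scheduled}"


print(orphan_packet_outcome(False, "pod-a", "pod-a"))  # RST from local stack
print(orphan_packet_outcome(True, "pod-a", "pod-a"))   # connection continues
print(orphan_packet_outcome(True, "pod-b", "pod-a"))   # RST from pod-b
```

With a single backend, or a scheduler that maps a client consistently to the same pod, the last case would not arise, which may be why the workaround appears to work.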

aauren commented 9 months ago

Can you please add your reproduction scenario to this issue? There isn't really enough information here for me to act on.

It may be that #911 would resolve the issue that you're seeing, but without more details and concrete steps to reproduce this, I can't be sure.

In general, since we now have two issues related to timeout flexibility, we should probably expose the option. Given that, I'll re-open #911 and submit a PR to make it configurable. But without more information, I can't be sure that this will resolve what you're experiencing.

aauren commented 9 months ago

If you want to try adjusting IPVS timeouts, you can do so with the kube-router-git PR-1590 image: https://hub.docker.com/layers/cloudnativelabs/kube-router-git/PR-1590/images/sha256-4912ca11f4dc6fa41a73c9549e8191f744cb30e170332bd90f0e3e2d563eebbf?context=explore
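Before picking a timeout value, it may help to check whether it would actually cover the connection's idle periods. The helper below is hypothetical and not part of kube-router. It assumes, as IPVS does for established connections, that every packet refreshes the entry's idle timer, so only the longest gap between packets matters. The 86400 s value is the "24 hours" suggested earlier in the thread.

```python
from __future__ import annotations


def longest_idle_gap(timestamps: list[float]) -> float:
    """Largest gap in seconds between consecutive packets of one connection."""
    ts = sorted(timestamps)
    return max((b - a for a, b in zip(ts, ts[1:])), default=0.0)


def survives(timestamps: list[float], tcp_timeout_s: float) -> bool:
    """True if an idle timeout of `tcp_timeout_s` never expires the entry."""
    # Every packet refreshes the timer, so only the longest gap matters.
    return longest_idle_gap(timestamps) <= tcp_timeout_s


packets = [0, 60, 1200, 1300]          # hypothetical packet times, in seconds
print(longest_idle_gap(packets))       # 1140
print(survives(packets, 900))          # False: the 1140 s gap exceeds 900 s
print(survives(packets, 24 * 60 * 60)) # True: 86400 s covers it
```

A larger timeout trades this protection for more stale entries kept in the table, so a value just above the real longest idle gap is probably preferable to an arbitrarily large one.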

See #1590 for more information on the exposed options.