Arkatufus opened 1 week ago
Problem isolated to the Ceen.Httpd package. Socket usage was stable when Ceen was replaced with Kestrel.
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 758
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 760
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 753
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 741
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 754
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 760
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 760
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 749
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 742
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 752
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 760
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 760
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 750
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 742
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 752
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 760
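The counting done by netstat.ps1 above can be approximated with a short script like the one below (a sketch only - the actual netstat.ps1 in the repro sample may differ). It counts connection lines from `netstat -an`-style output whose local or foreign endpoint uses a port in the 15885-16000 range:

```python
import re

# Hypothetical approximation of the repro's netstat.ps1; the real script differs.
PORT_RANGE = range(15885, 16001)

def count_connections(netstat_output: str) -> int:
    """Count TCP lines whose local or foreign port falls in PORT_RANGE."""
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 5 or not fields[0].lower().startswith("tcp"):
            continue
        # Fields 3 and 4 (0-indexed) are the local and foreign addresses,
        # e.g. "10.0.0.1:15900"; the port is the digits after the last ':' or '.'.
        for addr in fields[3:5]:
            m = re.search(r"[:.](\d+)$", addr)
            if m and int(m.group(1)) in PORT_RANGE:
                count += 1
                break  # count each connection line at most once
    return count

sample = """\
tcp 0 0 10.0.0.1:15900 10.0.0.2:443 TIME_WAIT
tcp 0 0 10.0.0.1:5000 10.0.0.2:80 ESTABLISHED
tcp 0 0 10.0.0.1:6000 10.0.0.2:16000 TIME_WAIT
"""
print(count_connections(sample))  # -> 2
```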
Update here since I've been investigating this using our test lab - we can reproduce this issue on Linux too, and it's making me think that the problem might be related to how aggressively we try to re-connect during cluster formation.
Only ~20 active TCP connections per node, which makes sense - most of these are Akka.Remote, an OTLP exporter, and maybe a few others.
~1100 active TCP connections per node. This looks like hyper-aggressive retries, not some kind of TCP handling issue.
Another piece of evidence in favor of the "aggressive retries" theory: look at the step function of active TCP connections when cluster formation does occur:
The oldest nodes have significantly more open TCP connections than the newer nodes that Kubernetes started later in the deployment. This looks more like a "Thundering Herd" problem than a resource leak.
We did some more work on this over the weekend and captured more data from more experiments - the problem is definitely caused by how frequently Akka.Management's cluster bootstrapper is HTTP-polling its peers:
The key setting at play here is akka.management.cluster.bootstrap.contact-point.probe-interval, which defaults to 1s. If we increase it to 5s, we see a much smaller number of concurrent TCP connections.
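For anyone who wants to try this mitigation, the HOCON change looks roughly like this (a sketch - double-check the key against your Akka.Management version's reference config):

```hocon
akka.management.cluster.bootstrap.contact-point {
  # Default is 1s; raising this to 5s dramatically reduced the number of
  # concurrent TCP connections in our lab runs.
  probe-interval = 5s
}
```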
akka.management.cluster.bootstrap.contact-point.probe-interval = 1s
Running a 22-node cluster using Akka.Discovery.KubernetesApi, we see the following end-to-end cluster formation times with akka.management.cluster.bootstrap.contact-point.probe-interval = 1s, the default. We also have a hard 20-nodes-must-be-up requirement configured for Cluster.Bootstrap, so cluster formation can't occur until the 20th node has come online.
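The 20-nodes-must-be-up requirement corresponds to a setting like the following (a sketch using the required-contact-point-nr option; verify the exact key for your Akka.Management version):

```hocon
akka.management.cluster.bootstrap.contact-point-discovery {
  # Cluster formation is blocked until at least this many contact points
  # have been discovered and probed successfully.
  required-contact-point-nr = 20
}
```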
On average it takes about 30s for a cluster to fully form - this is mostly the time it takes Kubernetes to spin up all of the pods. The oldest nodes in the cluster have a longer average and the youngest ones a shorter one, which is why you see this time distribution.
akka.management.cluster.bootstrap.contact-point.probe-interval = 5s
Same exact environment / reproduction sample as before, just with the probing interval set to 5s:
The cluster never forms, and this is apparently due to a bug in the logic around "timing out" the freshness of a node's last healthy check-in - the timeout we use for this is configured completely independently of the polling interval, and that is a bug.
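To illustrate why the independent timeout breaks things (hypothetical numbers and names - this is not the actual Akka.Management implementation): if the freshness timeout stays fixed while the probe interval grows past it, every contact point's last check-in looks permanently stale, so formation can never proceed.

```python
# Hypothetical sketch of the "stale check-in" bug described above;
# the real Akka.Management logic, defaults, and setting names differ.
def is_contact_point_fresh(last_checkin_age_s: float, stale_timeout_s: float) -> bool:
    """A contact point counts as healthy only if its last check-in is recent enough."""
    return last_checkin_age_s <= stale_timeout_s

STALE_TIMEOUT_S = 3.0  # configured independently of the probe interval (the bug)

# With probe-interval = 1s, the last check-in is at most ~1s old: always fresh.
assert is_contact_point_fresh(1.0, STALE_TIMEOUT_S)

# With probe-interval = 5s, the last check-in is routinely ~5s old: always
# considered stale, so the cluster can never form.
assert not is_contact_point_fresh(5.0, STALE_TIMEOUT_S)
```

The fix sketched here would be to derive the freshness timeout from the configured probe interval rather than from an independent constant.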
@Arkatufus already identified this issue and is preparing a fix for it. If that fix works, then the port exhaustion problems can be addressed by just increasing the probing interval. We are going to test this in our lab and confirm before making concrete recommendations to affected users. Just wanted to post an update to let everyone know that this is being urgently addressed.
One other setting that can alleviate major stressors that contribute to this port exhaustion problem:
Set that to false
and this will also significantly reduce the amount of TCP traffic. I'll put up some data supporting that in the next day or so as well. Changing this setting can, in theory, open the possibility of a split brain forming, but IMHO that should be quite rare in practice.
Should we just change the default polling interval to 5s? That should help put this issue to bed.
It has been observed that Cluster.Bootstrap can cause network socket port exhaustion: if Cluster.Bootstrap fails to form a cluster immediately, the TCP protocol holds each closed socket's port open in the TIME_WAIT linger state.
This has been observed especially in conjunction with Akka.Discovery.Azure.