10 minutes is a long time to hold onto a dead connection, especially for nodes with high connection churn (e.g. load balancers).
We're still looking into the root cause of why we occasionally miss `tcp_close` events on nodes, but this change will relieve much of the resulting memory pressure.
On one node that we've tracked with high memory usage, this will bring down the stored connections from ~45k connections to ~10k.
The downside is that if we want to track live connections that an application has abandoned (e.g. leaked connections), we will only have two minutes' worth of data on them.
For now this is fine since we're not doing anything around leaked connection tracking. We can revisit once we've identified the bug.
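
To illustrate the trade-off, here is a minimal sketch of time-based eviction, not the actual implementation. The `ConnectionTable` class, its method names, and the 120-second TTL are hypothetical, chosen only to mirror the two-minute window described above. A connection whose close event is missed stays in the table until it has gone unseen for longer than the TTL, so a shorter TTL caps how many dead entries can pile up.

```python
import time


class ConnectionTable:
    """Toy connection store that evicts entries not seen within ttl_seconds.

    Hypothetical illustration only; names and structure are assumptions.
    """

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self._last_seen = {}  # connection id -> timestamp of last activity

    def touch(self, conn_id, now=None):
        """Record activity on a connection (insert or refresh)."""
        self._last_seen[conn_id] = time.monotonic() if now is None else now

    def close(self, conn_id):
        """Remove a connection when its close event is observed."""
        self._last_seen.pop(conn_id, None)

    def evict_expired(self, now=None):
        """Drop connections idle for longer than the TTL; return how many."""
        now = time.monotonic() if now is None else now
        expired = [
            conn_id
            for conn_id, seen in self._last_seen.items()
            if now - seen > self.ttl_seconds
        ]
        for conn_id in expired:
            del self._last_seen[conn_id]
        return len(expired)

    def __len__(self):
        return len(self._last_seen)


# Usage: "a" never receives a close event, so only the TTL can remove it.
table = ConnectionTable(ttl_seconds=120)
table.touch("a", now=0)
table.touch("b", now=100)
removed = table.evict_expired(now=130)  # "a" has been idle 130s, "b" only 30s
```

With a 600-second TTL the same call would remove nothing, which is why the longer timeout lets dead connections accumulate on high-churn nodes.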