We have found an issue with a customer where the number of open connections/file descriptors held by the `vector` process grew so high that it could no longer allocate sockets for any other connections on that server.
This happened because connections were being closed by some external factor (usually a firewall killing long-standing connections, or other network issues). Since vector never received a FIN/RST or any other response on those sockets, it kept the connections open indefinitely.
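To gauge how close a process is to running out of descriptors, its open file descriptor count can be compared with its soft limit. The helper below is a minimal sketch that assumes a Linux `/proc` filesystem; the function name `fd_usage` is hypothetical and not part of vector or cos-proxy.

```shell
#!/usr/bin/env bash
# Print "<open fds> <soft limit>" for a PID, using /proc (Linux only).
# fd_usage is a hypothetical helper name used for illustration.
fd_usage() {
    local pid="$1"
    local open limit
    # Each entry in /proc/<pid>/fd is one open descriptor (sockets included).
    open=$(ls "/proc/$pid/fd" | wc -l)
    # The "Max open files" row of /proc/<pid>/limits has the soft limit in column 4.
    limit=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
    echo "$open $limit"
}

# Example invocation (assumes a single vector process; run as root or the owning user):
#   fd_usage "$(pidof vector)"
```

If the first number keeps growing towards the second while the set of connected units stays the same, stale sockets are a likely cause.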
To Reproduce
I have only tested and verified this by running cos-proxy on LXD, as that was similar to the customer environment.
1. Do a regular deployment of some applications related to COS/COS-proxy, where COS-proxy is running in an LXD container.
2. On the container, check which other units/applications are connected to vector with `ss -ntpm | grep vector`.
3. DROP all connections towards port 5066 for a couple of minutes: `iptables -A INPUT -p tcp --dport 5066 -j DROP`
4. Wait 5 minutes so that the connections are closed on the application side.
5. Once you have confirmed that the connections have been dropped on the application side, remove the iptables rule and let the applications connect again: `iptables -D INPUT 1` (this assumes the DROP rule is rule 1 in the INPUT chain; `iptables -D INPUT -p tcp --dport 5066 -j DROP` removes it by specification regardless of its position).
6. `ss -ntpm | grep vector` should now show the old connections alongside the new ones, and the old ones are never dropped.
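To make the before/after comparison in the steps above easier to read, the `ss` output can be summarised per peer address. This is only a sketch: `conns_per_peer` is a hypothetical helper, the defaults (`vector`, port 5066) are taken from the commands above, and it assumes the usual `ss -nt` column order (state, queues, local address, peer address, process).

```shell
#!/usr/bin/env bash
# Count established connections per peer IP for one process and local port.
# Reads `ss -ntp` (or `ss -ntpm`) output on stdin, so a saved capture works too.
# conns_per_peer is a hypothetical helper name used for illustration.
conns_per_peer() {
    local proc="${1:-vector}" port="${2:-5066}"
    awk -v proc="\"$proc\"" -v port=":$port" '
        # Keep ESTAB sockets owned by the process whose local address ends in the port.
        $1 == "ESTAB" && index($0, proc) && $4 ~ (port "$") {
            peer = $5
            sub(/:[0-9]+$/, "", peer)   # drop the ephemeral source port
            count[peer]++
        }
        END { for (p in count) printf "%s %d\n", p, count[p] }
    ' | sort
}

# Example invocation on the container (run as root so process names are visible):
#   ss -ntp | conns_per_peer vector 5066
```

Running it before adding the DROP rule and again after removing it should show the per-peer count increasing for the units that reconnected, since the stale sockets remain listed.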
I have fixed this by adding the [keepalive](https://vector.dev/docs/reference/configuration/sources/socket/#keepalive) option under the logstash definition and setting a timeout there.
Short fix incoming.
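One way to check whether a keepalive setting has taken effect is to look at the socket timers that `ss -o` reports: sockets with TCP keepalive enabled normally show a `timer:(keepalive,...)` entry. The sketch below counts established sockets of a process that lack that entry. `missing_keepalive` is a hypothetical helper name, and a socket with data in flight may briefly show a different timer, so treat a small non-zero result with caution.

```shell
#!/usr/bin/env bash
# Count ESTAB sockets of a process that show no keepalive timer.
# Reads `ss -ntpo` output on stdin.
# missing_keepalive is a hypothetical helper name used for illustration.
missing_keepalive() {
    local proc="${1:-vector}"
    awk -v proc="\"$proc\"" '
        # A keepalive-enabled socket is reported with "timer:(keepalive,...)".
        $1 == "ESTAB" && index($0, proc) && !index($0, "timer:(keepalive") { n++ }
        END { print n + 0 }
    '
}

# Example invocation on the container (run as root so process names are visible):
#   ss -ntpo | missing_keepalive vector
```

With the fix applied and vector restarted, the expectation is that this count drops to zero or close to it for the logstash listener.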
Environment
I only tested this with COS-proxy running in an LXD container. The rest of the deployment doesn't really matter, as long as there are some applications related to COS and logstash.