elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

Filebeat gets stuck on one unhealthy Logstash #40544

Open henrikno opened 4 weeks ago

henrikno commented 4 weeks ago

We have a pool of Logstashes behind a load balancer handling requests from a fleet of Beats. We've noticed that if a single Logstash is unhealthy but still listening on the beats port, Beats will connect to it, try to write data, and wait forever. Since we use ttl to rebalance connections across the Logstashes, eventually every Beat connects to the unhealthy instance and gets stuck. Removing that instance from the load balancer doesn't seem to help, since the connection is already established. Restarting either Logstash or Beats after removing it from the load balancer does help.

Please include configurations and logs if available.

output:
  logstash:
    hosts: loadbalancer
    ttl: 300s
    loadbalance: true
    worker: 5
    pipelining: 0

For confirmed bugs, please report:

Set up Filebeat pointing to two Logstashes. See it connect to one of them. Add iptables rules to block traffic on the Logstash port. Filebeat seems to wait forever for a response. (In our case the Logstashes are behind a load balancer, but I'm not 100% sure that's required to reproduce.) I expected it to time out and reconnect to the healthy Logstash.

It looks like go-lumber supports setting a read timeout, but I'm not sure whether it's currently set or whether it's set and not working. https://github.com/elastic/go-lumber/blob/main/client/v2/client.go#L247

wandergeek commented 2 weeks ago

This is likely an instance of https://github.com/elastic/go-lumber/issues/35