elastic / logstash

Logstash - transport and process your logs, events, or other data
https://www.elastic.co/products/logstash

Logstash hangs up after elasticsearch was down (with kafka input) #9421

Closed yfoelling closed 6 years ago

yfoelling commented 6 years ago

More info: With a Kafka input and an Elasticsearch output, Logstash stops processing if Elasticsearch is unreachable for a while. The process keeps running, but it doesn't process any new data.

Logstash is running in Kubernetes, so I have to manually restart it every time ES becomes unreachable. IMHO it should either stop the service or wait for the cluster to come back up. A rough sketch of the pipeline is shown below.
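For reference, the pipeline looks roughly like this (a minimal sketch; broker, topic, host, and index names are placeholders, not my actual config):

```
input {
  kafka {
    bootstrap_servers => "kafka:9092"        # placeholder broker
    topics            => ["app-logs"]        # placeholder topic
    group_id          => "logstash"
    consumer_threads  => 4                   # matches the four consumer threads in the log below
  }
}

output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]   # placeholder host
    index => "logs-%{+YYYY.MM.dd}"
  }
}
```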

Log output while ES is down (for some hours in my case):

2018-04-23T16:58:39.510378721Z [WARN ] 2018-04-23 16:58:39.510 [Ruby-0-Thread-11@[main]>worker1: /usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:385] elasticsearch - UNEXPECTED POOL ERROR {:e=>#}
2018-04-23T16:58:39.510487776Z [ERROR] 2018-04-23 16:58:39.510 [Ruby-0-Thread-11@[main]>worker1: /usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:385] elasticsearch - Attempted to send a bulk request to elasticsearch, but no there are no living connections in the connection pool. Perhaps Elasticsearch is unreachable or down? {:error_message=>"No Available connections", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError", :will_retry_in_seconds=>64}
2018-04-23T16:58:42.951478403Z [WARN ] 2018-04-23 16:58:42.951 [Ruby-0-Thread-8: /usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-output-elasticsearch-9.1.1-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:232] elasticsearch - Attempted to resurrect connection to dead ES instance, but got an error. {:url=>"SERVER", :error_type=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError, :error=>"Elasticsearch Unreachable: [SERVER][Manticore::SocketException] Connection refused (Connection refused)"}

Log output at the time ES is up again:


2018-04-23T16:59:42.326249582Z Exception in thread "Ruby-0-Thread-23: /usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-input-kafka-8.0.6/lib/logstash/inputs/kafka.rb:241" Exception in thread "Ruby-0-Thread-22: /usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-input-kafka-8.0.6/lib/logstash/inputs/kafka.rb:241" org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
2018-04-23T16:59:42.326318585Z  at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java:721)
2018-04-23T16:59:42.326342549Z  at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java:599)
2018-04-23T16:59:42.326350452Z  at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(org/apache/kafka/clients/consumer/KafkaConsumer.java:1203)
2018-04-23T16:59:42.326358068Z  at java.lang.reflect.Method.invoke(java/lang/reflect/Method.java:498)
2018-04-23T16:59:42.326363546Z  at org.jruby.javasupport.JavaMethod.invokeDirectWithExceptionHandling(org/jruby/javasupport/JavaMethod.java:438)
2018-04-23T16:59:42.326455383Z  at org.jruby.javasupport.JavaMethod.invokeDirect(org/jruby/javasupport/JavaMethod.java:302)
2018-04-23T16:59:42.32646587Z   at RUBY.block in thread_runner(/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-input-kafka-8.0.6/lib/logstash/inputs/kafka.rb:269)
2018-04-23T16:59:42.326472091Z  at org.jruby.RubyProc.call(org/jruby/RubyProc.java:289)
2018-04-23T16:59:42.326477217Z  at org.jruby.RubyProc.call(org/jruby/RubyProc.java:246)
2018-04-23T16:59:42.326483081Z  at java.lang.Thread.run(java/lang/Thread.java:748)
2018-04-23T16:59:42.326607591Z org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
2018-04-23T16:59:42.326619456Z  at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java:721)
2018-04-23T16:59:42.326680519Z  at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java:599)
2018-04-23T16:59:42.326690954Z  at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(org/apache/kafka/clients/consumer/KafkaConsumer.java:1203)
2018-04-23T16:59:42.326696757Z  at java.lang.reflect.Method.invoke(java/lang/reflect/Method.java:498)
2018-04-23T16:59:42.326703114Z  at org.jruby.javasupport.JavaMethod.invokeDirectWithExceptionHandling(org/jruby/javasupport/JavaMethod.java:438)
2018-04-23T16:59:42.326749471Z  at org.jruby.javasupport.JavaMethod.invokeDirect(org/jruby/javasupport/JavaMethod.java:302)
2018-04-23T16:59:42.326760403Z  at RUBY.block in thread_runner(/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-input-kafka-8.0.6/lib/logstash/inputs/kafka.rb:269)
2018-04-23T16:59:42.326801054Z  at org.jruby.RubyProc.call(org/jruby/RubyProc.java:289)
2018-04-23T16:59:42.326889432Z  at org.jruby.RubyProc.call(org/jruby/RubyProc.java:246)
2018-04-23T16:59:42.326901776Z  at java.lang.Thread.run(java/lang/Thread.java:748)
2018-04-23T16:59:42.33038824Z Exception in thread "Ruby-0-Thread-20: /usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-input-kafka-8.0.6/lib/logstash/inputs/kafka.rb:241" Exception in thread "Ruby-0-Thread-21: /usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-input-kafka-8.0.6/lib/logstash/inputs/kafka.rb:241" org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
2018-04-23T16:59:42.330428219Z  at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java:721)
2018-04-23T16:59:42.330513757Z  at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java:599)
2018-04-23T16:59:42.33053427Z   at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(org/apache/kafka/clients/consumer/KafkaConsumer.java:1203)
2018-04-23T16:59:42.330542261Z  at java.lang.reflect.Method.invoke(java/lang/reflect/Method.java:498)
2018-04-23T16:59:42.330548441Z  at org.jruby.javasupport.JavaMethod.invokeDirectWithExceptionHandling(org/jruby/javasupport/JavaMethod.java:438)
2018-04-23T16:59:42.330554574Z  at org.jruby.javasupport.JavaMethod.invokeDirect(org/jruby/javasupport/JavaMethod.java:302)
2018-04-23T16:59:42.33056064Z   at RUBY.block in thread_runner(/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-input-kafka-8.0.6/lib/logstash/inputs/kafka.rb:269)
2018-04-23T16:59:42.330566733Z  at org.jruby.RubyProc.call(org/jruby/RubyProc.java:289)
2018-04-23T16:59:42.330572125Z  at org.jruby.RubyProc.call(org/jruby/RubyProc.java:246)
2018-04-23T16:59:42.330677833Z  at java.lang.Thread.run(java/lang/Thread.java:748)
2018-04-23T16:59:42.330688725Z org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
2018-04-23T16:59:42.330696963Z  at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java:721)
2018-04-23T16:59:42.330702453Z  at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java:599)
2018-04-23T16:59:42.330713048Z  at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(org/apache/kafka/clients/consumer/KafkaConsumer.java:1203)
2018-04-23T16:59:42.330718522Z  at java.lang.reflect.Method.invoke(java/lang/reflect/Method.java:498)
2018-04-23T16:59:42.330724478Z  at org.jruby.javasupport.JavaMethod.invokeDirectWithExceptionHandling(org/jruby/javasupport/JavaMethod.java:438)
2018-04-23T16:59:42.331021615Z  at org.jruby.javasupport.JavaMethod.invokeDirect(org/jruby/javasupport/JavaMethod.java:302)
2018-04-23T16:59:42.331034813Z  at RUBY.block in thread_runner(/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-input-kafka-8.0.6/lib/logstash/inputs/kafka.rb:269)
2018-04-23T16:59:42.331041498Z  at org.jruby.RubyProc.call(org/jruby/RubyProc.java:289)
2018-04-23T16:59:42.331046719Z  at org.jruby.RubyProc.call(org/jruby/RubyProc.java:246)
2018-04-23T16:59:42.331051815Z  at java.lang.Thread.run(java/lang/Thread.java:748)
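The CommitFailedException itself points at the consumer poll settings. A hedged sketch of how the kafka input could be tuned (values are made up, and whether max_poll_interval_ms is exposed depends on the installed plugin version):

```
input {
  kafka {
    # ... bootstrap_servers / topics / group_id as above ...

    # Fetch fewer records per poll, so a blocked elasticsearch output
    # keeps the worker away from poll() for less time per batch.
    max_poll_records     => "100"

    # Allow more time before the group coordinator considers the
    # consumer dead and rebalances the partitions.
    session_timeout_ms   => "60000"

    # max_poll_interval_ms is the setting the exception refers to;
    # uncomment if the installed plugin version exposes it.
    # max_poll_interval_ms => "600000"
  }
}
```

This only papers over the rebalance, though; the underlying problem is that the pipeline neither exits nor recovers cleanly once ES comes back.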
andrewvc commented 6 years ago

These GitHub issues are only for bugs in Logstash core, and this looks to be an issue with the Kafka input. I'm moving this issue there and closing it here. You can track it here: https://github.com/logstash-plugins/logstash-input-kafka/issues/265