elastic / logstash

Logstash - transport and process your logs, events, or other data
https://www.elastic.co/products/logstash
Other
75 stars 3.5k forks source link

Logstash fails to shutdown after FATAL when `pipeline.unsafe_shutdown: true` #10573

Open yaauie opened 5 years ago

yaauie commented 5 years ago

When Logstash is run with either the --force-shutdown flag or pipeline.unsafe_shutdown: true in logstash.yml, there are some situations where the process can live on even after the shutdown watcher emits its FATAL message indicating that it is "Forcefully quitting Logstash".

[2019-03-18T14:50:43,794][WARN ][org.logstash.execution.ShutdownWatcherExt] {"inflight_count"=>0, "stalling_threads_info"=>{"other"=>[{"thread_id"=>27, "name"=>"[test]>worker0", "current_call"=>"[...]/vendor/bundle/jruby/2.3.0/gems/logstash-output-elasticsearch-9.3.0-java/lib/logstash/outputs/elasticsearch/common.rb:36:in `sleep'"}]}}
[2019-03-18T14:50:43,798][FATAL][org.logstash.execution.ShutdownWatcherExt] Forcefully quitting Logstash ...
Exception in thread "Ruby-0-Thread-17: :1" org.jruby.exceptions.RaiseException: (SystemExit) exit
    at org.logstash.execution.ShutdownWatcherExt.force_exit(org/logstash/execution/ShutdownWatcherExt.java:197)
    at org.logstash.execution.ShutdownWatcherExt.start(org/logstash/execution/ShutdownWatcherExt.java:166)

I'm still in the process of putting together a minimal reproduction, and will update this ticket when I can do so.

The initial report involved pipelines managed by Kibana's Logstash pipeline management with a logstash-output-elasticsearch that was failing to connect to its configured host (related: https://github.com/logstash-plugins/logstash-output-elasticsearch/issues/852).

natefox commented 5 years ago

I was going to make this same bug report yesterday! The issue I'm seeing is if you run with --config.reload.automatic, a new, fixed config will never get loaded.

Easy reproduction of this. My config:

input { http { port => 8443 } }
output {
  stdout { codec => rubydebug { metadata => true } }
  elasticsearch { hosts => [ "some.invalid.hostname.here:9200" ] }
}

And then I just do

curl -i -XPUT -H 'content-type: application/json' localhost:8443 -d '{"data": "mydatahere"}'

It'll show the object in the logs but when you change the config to a valid hostname (or remove es output entirely) it will throw the error that 0 are in flight (like you have in your post).

jsvd commented 5 years ago

This seems to be because we're only raising a SystemExit exception instead of actually ensuring the process terminates. We had the same issue before with ruby's exit(int) vs exit!(int) in https://github.com/elastic/logstash/pull/8729

The fix here is to not even try to raise an exception but just terminate the process immediately with a java's System.exit(int)

danielkasen commented 5 years ago

Is there any work around for this currently? running 6.5.4 and we have to kill -9 every node if someone accidentally sets the wrong tcp output.