elastic / logstash

Logstash - transport and process your logs, events, or other data
https://www.elastic.co/products/logstash

Fault tolerance not working when multiple hosts #10724

Open TIgorrr opened 5 years ago

TIgorrr commented 5 years ago

A second host was added to the elasticsearch output in Logstash to provide fault tolerance and load balancing.

Output:

elasticsearch {
    hosts => ["192.168.1.13:9200", "192.168.1.19:9200"]
    sniffing => true
    index => "stat-%{+YYYY.MM}"
    user => "elastic"
    password => "*************"
}

The sniffing option was added after the errors started occurring, but it did not help.

I want to connect to multiple hosts for failover and load balancing. Load balancing works, but when one host goes down, no data is sent to the remaining host either.
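For reference, the same output without sniffing. Since only these two fixed nodes exist, disabling sniffing is one configuration worth trying (a sketch, reusing the hosts and credentials from above):

```
elasticsearch {
    hosts => ["192.168.1.13:9200", "192.168.1.19:9200"]
    # with a fixed two-node list there are no extra nodes to discover,
    # so sniffing adds failure modes without adding hosts
    sniffing => false
    index => "stat-%{+YYYY.MM}"
    user => "elastic"
    password => "*************"
}
```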

Log on failure of one host:

[2019-04-25T16:55:43,435][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://elastic:xxxxxx@192.168.1.19:9200/][Manticore::ClientProtocolException] 192.168.1.19:9200 failed to respond {:url=>http://elastic:xxxxxx@192.168.1.19:9200/, :error_message=>"Elasticsearch Unreachable: [http://elastic:xxxxxx@192.168.1.19:9200/][Manticore::ClientProtocolException] 192.168.1.19:9200 failed to respond", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}

[2019-04-25T16:55:43,436][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://elastic:xxxxxx@192.168.1.19:9200/][Manticore::ClientProtocolException] 192.168.1.19:9200 failed to respond", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>2}

[2019-04-25T16:55:43,687][WARN ][logstash.outputs.elasticsearch] Error while performing sniffing {:error_message=>"Elasticsearch Unreachable: [http://elastic:xxxxxx@192.168.1.19:9200/][Manticore::SocketException] Connection refused (Connection refused)", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :backtrace=>["/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-9.4.0-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:293:in `perform_request_to_url'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-9.4.0-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:278:in `block in perform_request'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-9.4.0-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:373:in `with_connection'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-9.4.0-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:277:in `perform_request'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-9.4.0-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:164:in `check_sniff'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-9.4.0-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:157:in `sniff!'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-9.4.0-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:146:in `block in start_sniffer'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-9.4.0-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:128:in `until_stopped'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-9.4.0-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:144:in `block in start_sniffer'"]}

[2019-04-25T16:55:43,698][INFO ][logstash.outputs.elasticsearch] Elasticsearch pool URLs updated {:changes=>{:removed=>[http://elastic:xxxxxx@192.168.1.19:9200/], :added=>[]}}

[2019-04-25T16:56:44,017][ERROR][logstash.outputs.elasticsearch] Encountered a retryable error. Will Retry with exponential backoff  {:code=>503, :url=>"http://192.168.1.13:9200/_bulk"}

[2019-04-25T16:57:49,869][WARN ][logstash.outputs.elasticsearch] Elasticsearch output attempted to sniff for new connections but cannot. No living connections are detected. Pool contains the following current URLs {:url_info=>{http://elastic:xxxxxx@192.168.1.13:9200/=>{:in_use=>0, :state=>:dead, :version=>"6.7.1", :last_error=>#<LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError: Could not reach host Manticore::SocketTimeout: Read timed out>, :last_errored_at=>2019-04-25 16:57:46 +0300}}}

[2019-04-25T16:57:50,059][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch, but no there are no living connections in the connection pool. Perhaps Elasticsearch is unreachable or down? {:error_message=>"No Available connections", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError", :will_retry_in_seconds=>8}
jsvd commented 5 years ago

From the messages it seems that .13 is responding with 503 and the plugin can't connect to .19. Are there only two hosts? Did you enable sniffing expecting more nodes to show up?

TIgorrr commented 5 years ago

From the messages it seems that .13 is responding with 503 and the plugin can't connect to .19. Are there only two hosts? Did you enable sniffing expecting more nodes to show up?

The 503 error on .13 appears only when both hosts are configured and one of them is down. With only .13, only .19, or both hosts up, everything works fine.

jsvd commented 5 years ago

Sorry, I don't understand. .19 shows connection refused and .13 returns 503, so there is no working host at that moment. Every 5 seconds the plugin goes through dead connections and tries to reconnect, which should also show up in the logs.
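The 5-second reconnect interval mentioned above corresponds to the plugin's `resurrect_delay` option. A sketch of the failover-related settings with their documented defaults (the values shown are the defaults, listed for illustration, not a proposed fix):

```
elasticsearch {
    hosts => ["192.168.1.13:9200", "192.168.1.19:9200"]
    resurrect_delay => 5      # seconds between attempts to revive dead connections (default 5)
    retry_max_interval => 64  # cap in seconds on the exponential backoff between bulk retries (default 64)
    timeout => 60             # per-request timeout in seconds (default 60)
}
```

If both hosts are marked dead, as in the logs above, bulk requests are retried with exponential backoff until a connection is resurrected, so no events should be lost while the pipeline is blocked.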