elastic / logstash

Logstash - transport and process your logs, events, or other data
https://www.elastic.co/products/logstash

Logstash Exceptionhandling/Failover for output plugins #3891

Open pueffl opened 9 years ago

pueffl commented 9 years ago

We are currently evaluating ELK and so far it looks very promising, but we found one big issue for which we couldn't find satisfying information so far:

We want to log all our events in Elasticsearch, so when something fails there, we want to implement some kind of failover mechanism and save the corresponding data somewhere else, so the basic question is:

How can we catch exceptions from an output plugin and handle them? I have already discussed this and it seems there is no general solution for such cases so far. What we would need is something like:

output { elasticsearch { ... } failover { file { ... } } }

or:

output { failover { elasticsearch { ... } file { ... } } }

As I mentioned above, for us this is a really important issue and may prevent us from using the ELK stack if we don't find a solution for that problem...

txxg commented 7 years ago

Any progress regarding this issue?

jordansissel commented 7 years ago

@txxg this issue isn't well described, so I'm not sure how to respond without asking more questions -

We want to log all our events in Elasticsearch, so when something fails there

Can you describe what you mean by 'fails'?

we want to implement some kind of failover mechanism

If Elasticsearch is down, what failover response would you choose, and why?


For permanent delivery failures (where some output can never be successful under any circumstances) we are developing a new feature called a dead letter queue (DLQ) that will be used when an event is undeliverable. This DLQ is targeted for Logstash 5.5.
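For reference, a minimal sketch of what that could look like once the feature ships (settings and paths here are illustrative, based on how the DLQ is configured in later Logstash versions):

# logstash.yml -- turn on the dead letter queue for outputs that support it
dead_letter_queue.enable: true
# directory where undeliverable events are written (illustrative path)
path.dead_letter_queue: /var/lib/logstash/dead_letter_queue

# A separate pipeline can later read the dead-lettered events back,
# for example to inspect them or write them to a file:
input {
  dead_letter_queue {
    path => "/var/lib/logstash/dead_letter_queue"
    commit_offsets => true
  }
}
output {
  file { path => "/tmp/dlq-events.log" }
}

Note that the DLQ is aimed at per-event, permanent failures (for example documents Elasticsearch rejects with a mapping error), not at the "Elasticsearch is unreachable" case.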

txxg commented 7 years ago

I think what @pueffl and I want is similar to: http://flume.apache.org/FlumeUserGuide.html#failover-sink-processor.

In short, when Elasticsearch is down, we need to keep the incoming logs somewhere like local disk until Elasticsearch recovers from the failure.

jordansissel commented 7 years ago

we need to keep the incoming logs somewhere like local disk until Elasticsearch recovers from the failure

I think I am misunderstanding something.

If an output is down, like Elasticsearch, Logstash will wait for Elasticsearch to recover before continuing. There's no need for the kind of 'failover sink' that Flume has because of the way Logstash is designed -- we choose backpressure as the behavior when an output plugin is failing.

Let's say, hypothetically, that such a 'local disk store' existed, and Logstash could write to it if Elasticsearch were offline. Can you tell me what benefits you would expect from such a feature?

txxg commented 7 years ago

@jordansissel Thanks for your reply. In my case, our service is designed to receive logs sent from customers' mobile apps or from servers' syslog; if Elasticsearch is down, we will lose those logs.

jordansissel commented 7 years ago

Our service is designed to receive logs sent from customers' mobile apps or from servers' syslog

This helps me understand a bit more, thank you!

I would recommend using the Logstash persistent queue feature for your case. Enabling the persistent queue provides two functions, and for your case the important one is being able to absorb data as fast as possible, even if something downstream (Elasticsearch) is down or unreachable.
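As a rough sketch, enabling it is a logstash.yml change along these lines (the path and size below are illustrative, not recommendations):

# logstash.yml -- buffer events on disk instead of in memory
queue.type: persisted
# directory where the queue data files live (illustrative path)
path.queue: /var/lib/logstash/queue
# cap on the disk space the queue may use; inputs see backpressure once it is full
queue.max_bytes: 4gb

With this in place, inputs keep accepting events while Elasticsearch is unreachable, up to queue.max_bytes; once Elasticsearch recovers, the queued events are drained to it.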

txxg commented 7 years ago

@jordansissel I think the persistent queue is what I want. I'm using Logstash 5.0 now, and this feature was introduced in 5.1.x, so I didn't notice it until you told me! Thank you!

Ricaz commented 4 years ago

As far as I've read, persistent queues would not help, since inputs are disabled when an output fails. Please correct me if I'm wrong.