elastic / logstash

Logstash - transport and process your logs, events, or other data
https://www.elastic.co/products/logstash

Persistent Queue documentation could be clearer about its limitations. #8906

Open yaauie opened 6 years ago

yaauie commented 6 years ago

The current documentation on the Limitations of Persistent Queues, while technically accurate, doesn't provide readers with sufficient context to fully understand the limitations of PQs, especially as they apply to the myriad input plugins we support.

The following are problems not solved by the persistent queue feature:

  1. Input plugins that do not use a request-response protocol cannot be protected from data loss. For example: tcp, udp, zeromq push+pull, and many other inputs do not have a mechanism to acknowledge receipt to the sender. Plugins such as beats and http, which do have an acknowledgement capability, are well protected by this queue.
  2. It does not handle permanent machine failures such as disk corruption, disk failure, and machine loss. The data persisted to disk is not replicated.

-- Limitations of Persistent Queues

While a comprehensive list of plugins that use a "request-response protocol" would be helpful, such a list would likely fall out of date quickly as our ecosystem continues to evolve.

I believe it will be possible to improve the documentation in such a way that we could provide more clarity without maintaining such a list.

jordansissel commented 6 years ago

I agree with you that "request response" is possibly a bad term, given that it excludes inputs like file, s3, jdbc, etc., which are also generally protected by the PQ because they only commit a record of completed work after that work is delivered to the queue.
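
To make the "commit only after delivery to the queue" idea concrete, here is a minimal Python sketch. It is not Logstash code: `FlakyQueue`, `CheckpointingInput`, and `FireAndForgetInput` are hypothetical names invented for illustration, and the simulated push failure stands in for any interruption (such as a crash) between receiving an event and persisting it.

```python
class PushFailed(Exception):
    """Raised when an event could not be handed to the queue."""


class FlakyQueue:
    """Hypothetical stand-in for a durable queue whose next push can be made to fail."""

    def __init__(self):
        self.events = []
        self.fail_next = False

    def push(self, event):
        if self.fail_next:
            self.fail_next = False
            raise PushFailed(event)
        self.events.append(event)


class CheckpointingInput:
    """Reads from a replayable source (a list here) and advances its
    checkpoint only after the queue has accepted the event."""

    def __init__(self, source, queue):
        self.source = source
        self.queue = queue
        self.offset = 0  # the "record of completed work"

    def run(self):
        while self.offset < len(self.source):
            try:
                self.queue.push(self.source[self.offset])
            except PushFailed:
                # Offset is unchanged, so the same event is retried on the next run.
                return False
            self.offset += 1  # commit only after a successful push
        return True


class FireAndForgetInput:
    """Models a sender that gets no acknowledgement and never resends."""

    def __init__(self, queue):
        self.queue = queue
        self.lost = []  # kept only so the loss is visible in this demo

    def receive(self, event):
        try:
            self.queue.push(event)
        except PushFailed:
            # The sender has already moved on; nothing can ask for a replay.
            self.lost.append(event)
```

In this sketch, calling `run()` again after a failed push resumes from the unchanged offset, so nothing is skipped. The fire-and-forget input has no offset to fall back on, which is the gap a queue on the receiving side cannot close by itself.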

I spent a few minutes trying to think of a better phrase to describe these kinds of plugins (kafka, http, file, jdbc, etc.) and haven't had any solid ideas for a phrase that would clearly identify plugins that can be protected by the PQ.

Thoughts?

mahadevans87 commented 4 years ago

I currently use the s3 input plugin in my Logstash configuration. Does it work with persistent queues? I'd like to understand whether it handles back pressure without overwhelming my Elasticsearch cluster.

LuckyWindsck commented 4 years ago

I am new to Logstash and not sure how Logstash implements the PQ, so a comprehensive list would be very helpful for beginners like me. I understand that such a list would be difficult to keep up to date, so is there a general way to check whether a plugin supports the PQ (e.g., which part of the plugin source code should I check)?

anuraagvaidya commented 3 years ago

So I was searching for the right, high-performance input plugin for an enterprise-grade application.

I know that TCP would deliver much better performance than HTTP, but the persistent queue would not work with it.

I definitely don't want to use Filebeat, because I'd have to write in-memory information to a file for it to read, and that would increase disk usage.

The next candidate is WebSocket. This would be much better than HTTP as well. But does WebSocket input plugin support persistent queue?