elastic / logstash

Logstash - transport and process your logs, events, or other data
https://www.elastic.co/products/logstash

logstash persistent queue inhibits the ability to see near real time data #10075

Open swtrux opened 5 years ago

swtrux commented 5 years ago

When using persistent queueing in Logstash, access to near real time data is not available until the queued data is worked off. This prevents access to the live data currently being sent to Logstash and ultimately delays visibility into metrics, logs, and security information until the queued data is sent.

Example scenario: there are numerous servers at a site sending metrics, Windows events, and application logs to a Logstash instance at the site. All is working fine on Friday when everyone goes home. The connection from the Logstash instance to the Elastic cluster goes down late Friday night, so Logstash writes data to its persistent queue all weekend. On Monday the workers return, find the connection issue, and resolve it. Logstash detects the fix and starts working off the queue, but there are hours of data to drain. The workers bring up the dashboards they normally use to check the status of the site, but cannot see any data because it is so far behind. They have no way of knowing the current status of the servers at the site even though the servers are sending current data to Logstash. They have to wait until all of the data in the persistent queue is worked off before they can see the near real time status of the servers.

I would like an option added to Logstash to allow prioritizing live data over queued data. The queued data should still be sent, but possibly over a second pipeline or mixed in with the current data. The effect should be that as soon as connectivity from Logstash to Elastic is restored, Logstash starts sending newly ingested data directly to Elastic instead of continuing to add it to the queue, while at the same time working the data off the queue. The result would be that users immediately have access to current data and eventually have access to all persisted data.
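The prioritization being requested can be sketched abstractly. Below is a minimal Python illustration only, not Logstash code: the function name and the in-memory queues are hypothetical (a real persistent queue lives on disk), and it simply shows "always send live events first, drain the backlog when the live queue is empty":

```python
from collections import deque

def drain(live, backlog):
    """Yield events, always preferring the live queue over the backlog."""
    while live or backlog:
        if live:
            # A live event is waiting: send it immediately.
            yield live.popleft()
        else:
            # No live traffic right now: make progress on the backlog.
            yield backlog.popleft()

live = deque(["live-1", "live-2"])
backlog = deque(["old-1", "old-2", "old-3"])
print(list(drain(live, backlog)))
```

In a real implementation the live queue would keep receiving events while the backlog drains, but the scheduling decision at each step would be the same.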

I love the idea of persistent queues, but the current implementation, which blocks access to current data even after a connectivity issue is resolved, prevents me from using them.

untergeek commented 5 years ago

First, a workaround if you encounter something like this in the future. There is a procedure to allow this, but it is quite manual.

  1. Shut down Logstash.
  2. Move the persistent queue file(s) out of their current path.
  3. Restart Logstash.
  4. Spin up a Logstash pipeline to read from the saved PQ files.
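As a rough sketch only, the steps above might look something like this on a typical Linux install. Every path, service name, and flag here is an assumption about the environment (default data paths vary by package), so verify each one against your installation before running anything like this:

```shell
# Step 1: stop the running Logstash service.
systemctl stop logstash

# Step 2: set the existing queue aside so a fresh, empty queue is created
# on restart. /var/lib/logstash is an assumed path.data.
mv /var/lib/logstash/queue /var/lib/logstash/queue-saved

# Step 3: restart; live data now flows through a new, empty PQ.
systemctl start logstash

# Step 4: run a second Logstash instance whose data path contains the
# saved queue files (with a matching pipeline id and queue.type: persisted)
# and let it drain the backlog. Paths below are illustrative.
/usr/share/logstash/bin/logstash \
  --path.data /var/lib/logstash-drain \
  --path.settings /etc/logstash-drain
```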

Second, a way to address this automatically is somewhat problematic, because what is entailed in the workaround would still have to take place behind the scenes.

It might be possible to make an API call that will "rotate" the PQ file to a clean, new file which is where the live data will go, and spin up a pipeline that mirrors the filter and output block of the original behind the scenes to read from the "rotated" file and send it along its way. Setting this pipeline to only have one pipeline worker thread will prioritize the live feed.
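For the single-worker detail, a hypothetical `pipelines.yml` fragment might look like the following. The pipeline ids and config paths are illustrative, not from the original comment; only `pipeline.workers` is the setting being discussed:

```yaml
# Illustrative pipelines.yml: a live pipeline plus a low-priority
# drain pipeline that mirrors the original filter/output blocks.
- pipeline.id: main
  path.config: "/etc/logstash/conf.d/main.conf"
- pipeline.id: pq-drain
  path.config: "/etc/logstash/conf.d/pq-drain.conf"
  pipeline.workers: 1   # one worker thread so the live feed keeps priority
```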

I'm no longer a developer on the Logstash team (I moved to the Professional Services team), but thought the use case was interesting. I wanted to add the workaround first, before hypothesizing a potential feature.

geekpete commented 1 year ago

An option to process events from the back of the queue, where the most recent events are being added, rather than from the front, where the oldest events are, would be another approach, and probably a simpler code change. Or process from both ends, giving higher priority to the back of the queue.
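The both-ends idea above can be illustrated with a small Python sketch. This is hypothetical (a real PQ is paged on disk, and the exact interleaving ratio would be a design choice); it takes a small batch of the newest events from the back of the queue, then one of the oldest from the front, so recent data is prioritized while old data still makes progress:

```python
from collections import deque

def drain_newest_first(queue, batch=2):
    """Drain mostly from the back (newest events), interleaving one
    event from the front (oldest) per batch so the backlog still drains."""
    out = []
    while queue:
        # Take up to `batch` recent events from the back of the queue.
        for _ in range(min(batch, len(queue))):
            out.append(queue.pop())
        # Then take one old event from the front, if any remain.
        if queue:
            out.append(queue.popleft())
    return out

q = deque(["e1", "e2", "e3", "e4", "e5"])  # e1 oldest, e5 newest
print(drain_newest_first(q))
```

Note that draining strictly from the back reverses the order of recent events relative to ingestion; whether that matters depends on how downstream consumers handle out-of-order timestamps.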