elastic / logstash

Logstash - transport and process your logs, events, or other data
https://www.elastic.co/products/logstash

Proposal: Eager start of inputs #11493

Open colinsurprenant opened 4 years ago

colinsurprenant commented 4 years ago

Relates to #11175 #11170

Context

Logstash launches worker pipeline initialization and execution in threads and then immediately starts the input threads. This strategy has produced different behaviour between the Ruby and Java execution:

In #11492 we will make sure that pipeline initialization is completed before the inputs are started. This behaviour is easier to understand and will become the default.

Proposal

In some use cases it might be desirable to be able to start inputs eagerly, especially with the Persistent Queue (PQ) enabled: having inputs start ASAP and write data to the PQ while pipeline initialization is still in progress would minimize data loss.
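To make the two orderings concrete, here is a minimal Python sketch. It is not Logstash code: `run_pipeline`, the in-memory `queue.Queue` standing in for the PQ, and the `time.sleep` standing in for pipeline initialization are all assumptions made for illustration.

```python
import queue
import threading
import time


def run_pipeline(eager_inputs, init_seconds=0.2, event_interval=0.01):
    """Return how many events were buffered by the time initialization finished."""
    buffer = queue.Queue()  # hypothetical stand-in for the Persistent Queue
    stop = threading.Event()

    def input_worker():
        # Simulated input: emits one event every event_interval seconds.
        while not stop.is_set():
            buffer.put("event")
            time.sleep(event_interval)

    input_thread = threading.Thread(target=input_worker)

    if eager_inputs:
        input_thread.start()  # eager: inputs run while initialization is in progress
    time.sleep(init_seconds)  # stand-in for pipeline initialization
    buffered_during_init = buffer.qsize()
    if not eager_inputs:
        input_thread.start()  # deferred: inputs start only after initialization

    stop.set()
    input_thread.join()
    return buffered_during_init
```

With `eager_inputs=True` the buffer already holds events when initialization ends; with `eager_inputs=False` it is empty at that point, and whatever a source sent in the meantime was never accepted.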

peacand commented 4 years ago

To my mind, it would be nice to have this option because the "best" behavior actually depends on the sources and on the target we want to achieve. For real-time sources that cannot handle back pressure, such as Syslog over UDP, it could be nice to start the listeners ASAP with PQ to prevent any data loss.

But in my case, for example, the input volume is so high that if I start the inputs early and compiling the filters takes ~6 min, then with PQ the pipeline will never be able to catch up and the outputs will be ~5 min late forever, which I absolutely don't want. But that is specific to my usage.

So I think having an option to start the input listeners as early as possible or only after the filters are ready makes sense.
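The catch-up concern above can be sketched as simple arithmetic. This is a rough model under stated assumptions, not a measurement: `catch_up_seconds` is a hypothetical helper, the rates are constant, and the example figures are made up.

```python
def catch_up_seconds(input_rate, capacity, init_seconds):
    """Seconds needed to drain the backlog queued during initialization.

    input_rate and capacity are in events per second (assumed constant).
    Returns None when there is no spare capacity, i.e. the pipeline never
    catches up.
    """
    backlog = input_rate * init_seconds  # events queued while filters compile
    spare = capacity - input_rate        # throughput left over to drain the backlog
    if spare <= 0:
        return None
    return backlog / spare


# Made-up numbers: 10,000 events/s arriving during a 6-minute (360 s) initialization.
print(catch_up_seconds(10_000, 10_000, 360))  # None: the backlog is never drained
print(catch_up_seconds(10_000, 12_000, 360))  # 1800.0 seconds, i.e. 30 minutes
```

When the pipeline runs at or near its maximum throughput, the backlog accumulated during initialization stays in the queue, which is why deferring the inputs can be the better choice for a high-volume source.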

colinsurprenant commented 4 years ago

Thanks for your feedback @peacand. Two observations: