Open colinsurprenant opened 4 years ago
To my mind, it would be nice to have this option because the "best" behavior actually depends on the sources and on the target we want to achieve. For real-time sources that cannot handle back pressure, such as Syslog over UDP, it would be good to start the listeners as soon as possible with PQ to prevent any data loss.
But in my case, for example, the input volume is so high that if I start the inputs early and compiling the filters takes ~6min, then with PQ the pipeline will never be able to catch up and the outputs will be ~5min late forever. I absolutely don't want that, but it is specific to my usage.
So I think having an option to start the input listeners as early as possible or only after the filters are ready makes sense.
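The catch-up concern above can be made concrete with a little arithmetic. The following is only a rough sketch under assumed numbers: the function name `catch_up_seconds` and the rates are hypothetical and are not taken from Logstash or from any measurement in this thread. It models the backlog written to the queue during initialization and how long the workers need to drain it.

```python
import math


def catch_up_seconds(input_rate, drain_rate, init_seconds):
    """Estimate how long the pipeline needs to clear the startup backlog.

    input_rate   -- events/s arriving at the inputs (assumed constant)
    drain_rate   -- events/s the workers can process once initialized
    init_seconds -- time the inputs ran before the workers were ready
    """
    backlog = input_rate * init_seconds   # events queued during initialization
    surplus = drain_rate - input_rate     # spare capacity available for the backlog
    if surplus <= 0:
        # No spare capacity: the backlog never shrinks, so the lag is permanent.
        return math.inf
    return backlog / surplus


# Workers only just keep up with the inputs: the delay never goes away.
print(catch_up_seconds(1000, 1000, 360))
# Workers have 50% headroom: the backlog is cleared after a while.
print(catch_up_seconds(1000, 1500, 360))
```

When the workers have no headroom over the input rate, anything queued during initialization stays queued, which is the "late forever" situation described above.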
Thanks for your feedback @peacand. Two observations:
Relates to #11175 #11170
Context
Logstash launches the worker pipelines' initialization and execution in threads and then immediately starts the input threads. This strategy has produced different behaviour between the Ruby and Java execution:
With the Ruby execution, pipeline initialization and execution were almost immediate, so there was no noticeable delay between the inputs starting and the workers processing data.
With the Java execution the pipeline initialization is slower because of the involved compilation. The pipeline initialization time has been improved in #11482 but nonetheless it will always take longer than the legacy Ruby execution.
In #11492 we will make sure that pipeline initialization is completed before the inputs are started. This behaviour is easier to understand and will become the default.
Proposal
In some use cases it might be desirable to be able to start inputs eagerly, especially with the Persistent Queue enabled: having inputs start ASAP and write data to the PQ while pipeline initialization is in progress minimizes data loss.
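To illustrate the two startup orderings, here is a toy model in Python. It is a sketch only and is not Logstash code: `run_pipeline`, `eager_inputs`, and `init_seconds` are hypothetical names, an in-memory queue stands in for the PQ, and a `sleep` stands in for pipeline compilation.

```python
import queue
import threading
import time


def run_pipeline(eager_inputs, init_seconds=0.2, events=5):
    """Return (processed events, whether the input started after init)."""
    buffer = queue.Queue()        # stands in for the persistent queue
    ready = threading.Event()     # set once the simulated compilation is done
    processed = []
    started_after_init = []

    def worker():
        time.sleep(init_seconds)  # simulated pipeline initialization
        ready.set()
        for _ in range(events):
            processed.append(buffer.get())

    def input_listener():
        if not eager_inputs:
            ready.wait()          # proposed default: wait for initialization
        started_after_init.append(ready.is_set())
        for i in range(events):
            buffer.put(i)         # eager mode buffers these during init

    threads = [threading.Thread(target=worker),
               threading.Thread(target=input_listener)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed, started_after_init[0]


print(run_pipeline(eager_inputs=False))
print(run_pipeline(eager_inputs=True))
```

In both modes every event reaches the worker. The difference is only when the input begins accepting data: in eager mode events accumulate in the queue while initialization is still running, which avoids dropping data from sources that cannot be paused but creates a backlog the workers must later drain.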