Open andrewvc opened 8 years ago
by "asynchronously" you mean to have each input running on its own thread?
@talevy they currently do all run in separate threads, this means that 'register' will run in a separate thread as well. Current each input's register method is run in the main thread. If one plugin's register method is stuck, subsequent ones won't start and workers won't start either.
as in:
change this existing snippet[1] into something like:
...
@inputs.each do |plugin|
input_thread = Thread.new do
plugin.register
inputworker(plugin)
end
@input_threads << input_thread
end
@talevy exactly. I do think we'd want that state machine stuff added in for debugability however.
one thing with FSMs that I find incredibly useful is storing the state history so for example (I know that the states mentioned in the original post are speculative) to know that one reached transient error
from running
or starting
would be important knowledge.
@andrewvc, @talevy, I've been troubleshooting this issue and came across this and the source discussion on this problem. Is @talevy's solution usable with minor tweaking, or was this deprioritized due to high level of effort?
I'm looking to have up to 30 rabbit nodes spread out across different locations, and being unable to restart logstash without losing ingest if a single node is down is a bit scary!
edit: I'm not familiar with how logstash plugins are designed, but I know for example that redis input does not block in this type of situation. Is this potentially something that could/should be changed within the logstash-input-rabbitmq plugin?
This line of inquiry was inspired by the question here: https://github.com/logstash-plugins/logstash-input-rabbitmq/issues/58
This is how it works today, which is great, but has a problem I will describe below.
The above is great, however, if in step 2. Logstash is restarted, ALL inputs are blocked until the one RabbitMQ input connected to the down server is connected since its
register
method blocks waiting for an RMQ server.I propose the following:
:starting, :running, :stopped, :permanent_error, :transient_error
. These states could be accessible via API, and state transitions would be logged.These state machines could be really useful from an operator standpoint if exposed from an API or used in a centralized config mgmt system.