We're encountering a problem when running our event processors via the ESPRunner. When an event processor running in a child process fails and terminates prematurely, the ESPRunner ignores the failure and keeps running. The failure only surfaces when our event processor lag monitor eventually alerts the on-call developer, who then has to restart the ESPRunner manually.
The process status list looks something like this:
In #215 we explored adding an extra option that would make the ESPRunner shut down in such a scenario.
Change
This PR proposes a more extensible solution: a hook that is called when an event processor subprocess terminates. This allows teams to choose and implement whatever response suits them. Here are a few examples:
Report to Rollbar
EventSourcery::EventProcessing::ESPRunner.new(
  event_processors: processors,
  event_source: source,
  after_subprocess_termination: proc do |processor:, runner:, exit_status:|
    if exit_status != 0
      Rollbar.error("Processor #{processor.processor_name} "\
                    "terminated with exit status #{exit_status}")
    end
  end
).start!
Shut down the ESPRunner
EventSourcery::EventProcessing::ESPRunner.new(
  event_processors: processors,
  event_source: source,
  after_subprocess_termination: proc do |processor:, runner:, exit_status:|
    runner.shutdown
  end
).start!
Restart the event processor
EventSourcery::EventProcessing::ESPRunner.new(
  event_processors: processors,
  event_source: source,
  after_subprocess_termination: proc do |processor:, runner:, exit_status:|
    runner.start_processor(processor) unless runner.shutdown_requested?
  end
).start!
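To illustrate the mechanism behind such a hook, here is a minimal, self-contained sketch of a parent process that forks children, waits for them, and invokes a callback with the same keyword arguments as above. `ToyRunner`, its `jobs:` argument, and the restart-once policy are hypothetical and exist only for this illustration; this is not the ESPRunner implementation, and it assumes a platform where `Process.fork` is available.

```ruby
# Hypothetical stand-in for a subprocess runner; NOT EventSourcery's ESPRunner.
class ToyRunner
  def initialize(jobs:, after_subprocess_termination: nil)
    @jobs = jobs # name => callable returning the child's exit code (assumption)
    @after_subprocess_termination = after_subprocess_termination
    @pids = {}   # pid => job name, for children that are still running
    @shutdown_requested = false
  end

  def shutdown_requested?
    @shutdown_requested
  end

  def shutdown
    @shutdown_requested = true
  end

  # Fork a child that runs the job and exits with the job's return value.
  def start_processor(name)
    pid = Process.fork do
      code = @jobs.fetch(name).call
      exit!(code.to_i) # exit! skips at_exit handlers inherited from the parent
    end
    @pids[pid] = name
  end

  # Start every job, then reap children and call the hook for each one.
  def start!
    @jobs.each_key { |name| start_processor(name) }
    until @pids.empty?
      pid, status = Process.wait2
      name = @pids.delete(pid)
      next unless name # ignore children this runner did not start
      @after_subprocess_termination&.call(
        processor: name, runner: self, exit_status: status.exitstatus
      )
    end
  end
end

events = []
restarts = Hash.new(0)

ToyRunner.new(
  jobs: { "ok" => -> { 0 }, "flaky" => -> { 3 } },
  after_subprocess_termination: proc do |processor:, runner:, exit_status:|
    events << [processor, exit_status]
    # Restart a failed job at most once, unless a shutdown was requested.
    if exit_status != 0 && restarts[processor] < 1 && !runner.shutdown_requested?
      restarts[processor] += 1
      runner.start_processor(processor)
    end
  end
).start!
```

Capping the number of restarts, as this sketch does, is one way to keep a processor that fails immediately on startup from being restarted in a tight loop.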