elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Watcher Threadpool should not Reject Tasks? #83369

Open · original-brownbear opened this issue 2 years ago

original-brownbear commented 2 years ago

I believe it's a bug that the watcher threadpool rejects operations at present.

Currently, we run a scheduler that works out which watches to execute and then submits a task for each of them to the watcher pool. Once the watcher pool's queue reaches a certain length, it rejects work. This does not scale to large numbers of watches: once too many watches trigger at the same point in time, those at the back of the queue appear to simply never get executed.

Shouldn't we instead enqueue all triggered watches without bound at the watcher-pool level, and bound the amount of work in flight by never scheduling execution for the same watch more than once (i.e., if an execution is already queued for a watch, don't submit another one)? That way we wouldn't needlessly break down at high watch counts just because of unfortunate timing across a large number of watches.
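The per-watch deduplication proposed above can be sketched as follows. This is an illustrative model in Python, not Elasticsearch's actual implementation; the class and method names are invented for the example. The idea is to track in-flight watch IDs and skip a submission when one is already pending, which bounds work per watch at one task instead of rejecting from a bounded queue:

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class DedupScheduler:
    """Toy model of the proposal: bound in-flight work by deduplicating
    on watch ID instead of rejecting tasks when a queue fills up."""

    def __init__(self, max_workers=4):
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._inflight = set()
        self._lock = threading.Lock()

    def submit(self, watch_id, fn):
        # Skip submission if an execution for this watch is already
        # queued or running.
        with self._lock:
            if watch_id in self._inflight:
                return False
            self._inflight.add(watch_id)
        future = self._executor.submit(fn)
        future.add_done_callback(lambda _f: self._done(watch_id))
        return True

    def _done(self, watch_id):
        with self._lock:
            self._inflight.discard(watch_id)
```

Under this scheme, a watch that triggers again while a prior execution is still pending is simply coalesced into it, so the queue length can never exceed the number of distinct watches.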

elasticmachine commented 2 years ago

Pinging @elastic/es-data-management (Team:Data Management)

jakelandis commented 2 years ago

Watches are generally time-sensitive with respect to their relative execution time. For example, you may schedule a Watch every 60s that looks back at the past 5 minutes for any conditions that should trigger it. If the Watch sits in the queue for longer than 5 minutes (or whatever the timeliness expectations are), it can miss the condition it was searching for. Ideally the Watch would be written to avoid that by using {{ctx.trigger.scheduled_time}}, but many Watches don't do that. Additionally, even if a Watch correctly uses the scheduled time, sitting in the queue for a long period means it could be very late to alert on the thing it is watching (like a node going down), minimizing its effectiveness.
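For context, the `{{ctx.trigger.scheduled_time}}` pattern mentioned above anchors the search window to when the watch was scheduled rather than when it actually ran. A rough sketch of such a Watch input (the index name and `@timestamp` field are assumptions for the example):

```json
{
  "trigger": { "schedule": { "interval": "60s" } },
  "input": {
    "search": {
      "request": {
        "indices": ["logs"],
        "body": {
          "query": {
            "range": {
              "@timestamp": {
                "gte": "{{ctx.trigger.scheduled_time}}||-5m",
                "lte": "{{ctx.trigger.scheduled_time}}"
              }
            }
          }
        }
      }
    }
  }
}
```

Because the window is computed from the scheduled time, a delayed execution still searches the interval it was meant to cover, though as noted above it can still alert late.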

> Shouldn't we instead just enqueue all watches that are triggered without bound on the watcher pool level and bound the number of things in flight by never scheduling execution for the same watch more than once (i.e. if there's already an execution queued for a watch just don't submit another one)?

It kinda does that already, but admittedly not terribly well for this use case. It will queue up all the watches in the .triggered_watches index (mostly) unbounded (for the node-crashing use case). Then, after writing to .triggered_watches, it will submit each watch for execution in the watch thread pool. If rejections happen, the Watch will remain in .triggered_watches, but it takes a full restart of Watcher to pick those rejected watches back up for execution. Ideally, for the rejection use case, we would not need to wait for a full stop/start or for a new instance to come up in order to reprocess the rejected watches. I like the durable queue approach of .triggered_watches and think we should evolve it to better handle rejections. I would worry about an unbounded in-memory queue, since there are very few limits on the frequency, count, and duration of watches. It seems it could be easy to consume the entire heap if Watcher gets in trouble... which happens a lot due to factors outside of Watcher's control (overloaded searches, slow email servers, etc.).
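The flow described here, and the redrive behavior the comment wishes existed, can be modeled in a few lines. This is a toy sketch, not Elasticsearch code: the dict stands in for the durable .triggered_watches index, the bounded queue for the watcher thread pool's queue, and `redrive` for the hypothetical reprocessing step that today only happens on a full Watcher restart:

```python
import queue


class WatcherModel:
    """Toy model of the described flow: a durable write precedes a
    bounded-queue submission, so a rejected watch survives durably."""

    def __init__(self, queue_size):
        self.durable = {}                          # watch_id -> status
        self.pool_queue = queue.Queue(maxsize=queue_size)

    def trigger(self, watch_id):
        self.durable[watch_id] = "pending"         # durable write first
        self._offer(watch_id)
        return self.durable[watch_id]

    def _offer(self, watch_id):
        try:
            self.pool_queue.put_nowait(watch_id)   # bounded: may reject
            self.durable[watch_id] = "queued"
        except queue.Full:
            # Rejected, but the durable record survives; today nothing
            # resubmits it until Watcher restarts.
            self.durable[watch_id] = "rejected"

    def redrive(self):
        # The missing piece the comment asks for: resubmit rejected but
        # durable watches without a full Watcher restart.
        for wid, status in list(self.durable.items()):
            if status == "rejected":
                self._offer(wid)
```

The point of the sketch is that the durable record already contains everything needed for recovery; the gap is purely that nothing re-offers rejected entries once the queue drains.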