elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

[ResponseOps][Task Manager][mget Claimer] have claimer retry search if no eligible tasks result from mget/update #184940

Open pmuellr opened 3 weeks ago

pmuellr commented 3 weeks ago

In the PR "implement task claiming strategy mget" (#180485) we implemented an alternative task claiming strategy, but it has the following problem:

The original task search returns candidate tasks, which may be skipped if the mget determines the task doc was updated, or the bulk update indicates a conflict. In the worst case, this can result in the task claimer returning no tasks, even if there are tasks available to run.

Suggest we retry the entire claim phase, starting with the search, when we determine that:

I think there's likely a question of whether we want to change the test from "no tasks found" to "not many tasks found". For instance, if there are outstanding tasks to run, but we filtered out all but 1 because of conflicts, we probably want to try again for the remaining N-1 (where N is the number of tasks requested). Kinda thing. Not sure what a good number would be though.
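To make the "no tasks found" vs "not many tasks found" question concrete, here is a rough sketch of the retry decision as a pure function. Everything in it is hypothetical and not taken from the Task Manager code: the names `ClaimAttempt`, `RetryPolicy`, `shouldRetryClaim`, and `nextRequestSize`, the `minClaimedRatio` knob, and the cap on attempts are all assumptions for illustration. The conditions it checks are guesses, not the actual criteria.

```typescript
// Hypothetical summary of one claim cycle (search -> mget -> bulk update).
interface ClaimAttempt {
  requested: number;  // N, the number of tasks we asked for
  candidates: number; // tasks returned by the search
  claimed: number;    // tasks that survived mget and the bulk update
  attempt: number;    // 1-based attempt counter
}

// Hypothetical policy knobs; good values are an open question.
interface RetryPolicy {
  maxAttempts: number;
  // Retry when claimed / requested falls below this ratio.
  // 0 means "retry only when nothing at all was claimed".
  minClaimedRatio: number;
}

function shouldRetryClaim(a: ClaimAttempt, p: RetryPolicy): boolean {
  if (a.attempt >= p.maxAttempts) return false; // give up eventually
  if (a.candidates === 0) return false;         // search found nothing to run
  if (a.claimed >= a.requested) return false;   // already got everything
  if (a.claimed === 0) return true;             // the "no tasks found" case
  return a.claimed / a.requested < p.minClaimedRatio; // "not many tasks found"
}

// On a retry, only ask for what is still missing (the "N-1" above).
function nextRequestSize(a: ClaimAttempt): number {
  return Math.max(0, a.requested - a.claimed);
}
```

With `minClaimedRatio: 0` this behaves like the "no tasks found" test; raising it to, say, `0.5` would also retry when only 1 of 10 requested tasks was claimed.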

elasticmachine commented 3 weeks ago

Pinging @elastic/response-ops (Team:ResponseOps)

mikecote commented 3 weeks ago

I wonder if we want to redo the search or if we want to continue going through the previous candidate tasks 🤔 If we do the search shortly after the previous one, there's a chance the index won't have been refreshed yet (the regular refresh interval is 1s) and it would return the same documents. Maybe we just need to do mget + bulkUpdate a few times, or just continue with bulkUpdate.
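A rough sketch of that alternative, working through the already-fetched candidates in rounds instead of searching again, might look like the following. It is an in-memory illustration under several assumptions: the search returned more candidates than the requested capacity, `mget` and `bulkUpdate` are hypothetical injected stand-ins (synchronous here for brevity, where real calls would be async), and a plain `seqNo` comparison stands in for whatever staleness check the claimer really uses.

```typescript
// Hypothetical shape of a candidate as seen by the original search.
interface TaskDoc {
  id: string;
  seqNo: number; // stand-in for the doc version observed at search time
}

// Hypothetical injected dependencies, not the real Task Manager store API.
interface ClaimDeps {
  mget: (ids: string[]) => TaskDoc[];        // fresh read of the docs
  bulkUpdate: (docs: TaskDoc[]) => string[]; // ids updated without conflict
}

function claimFromCandidates(
  candidates: TaskDoc[],
  capacity: number,
  deps: ClaimDeps,
  maxRounds = 3
): string[] {
  const claimed: string[] = [];
  let remaining = [...candidates];

  for (let round = 0; round < maxRounds; round++) {
    if (claimed.length >= capacity || remaining.length === 0) break;

    // Take only as many candidates as we still have room for.
    const batch = remaining.slice(0, capacity - claimed.length);
    remaining = remaining.slice(batch.length);

    // mget step: drop candidates whose doc changed since the search.
    const current = new Map<string, number>();
    for (const doc of deps.mget(batch.map((c) => c.id))) {
      current.set(doc.id, doc.seqNo);
    }
    const fresh = batch.filter((c) => current.get(c.id) === c.seqNo);
    if (fresh.length === 0) continue;

    // bulkUpdate step: conflicts are simply left out of the returned ids.
    claimed.push(...deps.bulkUpdate(fresh));
  }
  return claimed;
}
```

This avoids depending on an index refresh between attempts, at the cost of only ever seeing tasks that the first search returned.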