Open pmuellr opened 3 weeks ago
Pinging @elastic/response-ops (Team:ResponseOps)
I wonder if we want to redo the search or continue going through the previous candidate tasks 🤔 If we do the search shortly after the previous one, there's a chance the index won't have been refreshed yet (the regular refresh interval is 1s), and the search would return the same documents. Maybe we just need to do mget + bulkUpdate a few times, or just continue with bulkUpdate.
In PR #180485 ("implement task claiming strategy mget") we implemented an alternative task claiming strategy, but it has the following problem:
The original task search returns candidate tasks, which may be skipped if the mget determines the task doc was updated, or the bulk update indicates a conflict. In the worst case, this can result in the task claimer returning no tasks, even if there are tasks available to run.
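To make the skipping step concrete, here is a minimal sketch of how search candidates might be split into still-claimable and stale ones after the mget. It assumes staleness is detected by comparing Elasticsearch's `_seq_no` / `_primary_term` between the search hit and the mget result; the PR may compare differently, and `DocVersion` and `partitionCandidates` are hypothetical names, not task manager code.

```typescript
// Hypothetical shape: the version info of a task doc as seen by search or mget.
interface DocVersion {
  id: string;
  seqNo: number;       // Elasticsearch _seq_no
  primaryTerm: number; // Elasticsearch _primary_term
}

// Split search candidates into those whose version still matches the mget
// result (fresh) and those that changed or disappeared in between (stale).
function partitionCandidates(
  searched: DocVersion[],
  current: DocVersion[]
): { fresh: DocVersion[]; stale: DocVersion[] } {
  const currentById = new Map<string, DocVersion>();
  for (const doc of current) {
    currentById.set(doc.id, doc);
  }

  const fresh: DocVersion[] = [];
  const stale: DocVersion[] = [];
  for (const candidate of searched) {
    const latest = currentById.get(candidate.id);
    const unchanged =
      latest !== undefined &&
      latest.seqNo === candidate.seqNo &&
      latest.primaryTerm === candidate.primaryTerm;
    if (unchanged) {
      fresh.push(candidate);
    } else {
      stale.push(candidate);
    }
  }
  return { fresh, stale };
}
```

If every candidate lands in `stale`, or every `fresh` candidate then hits a conflict in the bulk update, the claimer ends up with nothing, which is the worst case described above.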
Suggest we retry the entire claim phase, starting with the search, when we determine that:
- no tasks were claimed
- more tasks are available (`hits.total` indicates more tasks available)

I think there's likely a question of whether we want to change the test from "no tasks found" to "not many tasks found". For instance, if there are outstanding tasks to run, but we filtered out all but 1 because of conflicts, we probably want to try for N-1 (where N is the number of tasks requested). Kinda thing. Not sure what a good number would be though.
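The retry test discussed above could be sketched as a small pure function. This is only an illustration under assumptions: `ClaimResult`, `shouldRetryClaim`, and the `minClaimed` threshold are hypothetical names, and the default of 1 corresponds to the "no tasks found" test, while a larger value gives the "not many tasks found" variant.

```typescript
// Hypothetical summary of one claim attempt (search + mget + bulk update).
interface ClaimResult {
  requested: number; // N, the number of tasks we asked for
  claimed: number;   // tasks actually claimed after conflicts were filtered out
  hitsTotal: number; // hits.total reported by the search
}

// Retry the whole claim phase only when we claimed fewer than the threshold
// AND the search says there were more matching tasks than we ended up with.
function shouldRetryClaim(result: ClaimResult, minClaimed: number = 1): boolean {
  // Never demand more than was requested in the first place.
  const threshold = Math.min(minClaimed, result.requested);
  const tooFew = result.claimed < threshold;
  const moreAvailable = result.hitsTotal > result.claimed;
  return tooFew && moreAvailable;
}
```

With the default threshold, claiming 1 of 10 requested tasks does not trigger a retry; passing a higher `minClaimed` would. A caller would probably also want to cap the number of retries, since a search issued before the next index refresh may return the same documents.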