elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

[ResponseOps][Task Manager][mget Claimer] fix method of changing REMOVED task types to unrecognized status #184938

Open pmuellr opened 3 weeks ago

pmuellr commented 3 weeks ago

In PR #180485 ("implement task claiming strategy mget") we implemented an alternative task claiming strategy, but it has the following problem:

As described in #184937 ("fix problem with limited concurrency starvation of unlimited concurrency tasks"), a large number of REMOVED tasks that need their status set to "unrecognized" can starve actual tasks that need to run. As with the limited concurrency task types, this happens because the search currently includes all the types.
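To make the starvation concrete, here is a minimal sketch of a claim cycle whose search spans all task types. It is a simplified in-memory model, not Kibana's actual claimer: `claimCycle`, `pageSize`, the `Task` shape, and the type names are all hypothetical, and the assumption that removed-type tasks sort ahead of runnable ones is only there to show the worst case.

```typescript
// Hypothetical, simplified model of a claim cycle; not Kibana's actual code.
type TaskStatus = 'idle' | 'unrecognized';

interface Task {
  id: string;
  taskType: string;
  status: TaskStatus;
}

// Simulates one claim cycle: the search returns at most `pageSize` idle tasks
// across ALL types, including types that have since been removed.
function claimCycle(
  store: Task[],
  registeredTypes: Set<string>,
  pageSize: number
): { toRun: Task[]; markedUnrecognized: Task[] } {
  const page = store.filter((t) => t.status === 'idle').slice(0, pageSize);
  const toRun: Task[] = [];
  const markedUnrecognized: Task[] = [];
  for (const task of page) {
    if (registeredTypes.has(task.taskType)) {
      toRun.push(task);
    } else {
      // Removed type: the slot is spent on a status update instead of real work.
      task.status = 'unrecognized';
      markedUnrecognized.push(task);
    }
  }
  return { toRun, markedUnrecognized };
}

// Assumed worst case: 25 tasks of a removed type sort ahead of 5 runnable tasks.
const store: Task[] = [
  ...Array.from({ length: 25 }, (_, i): Task => ({
    id: `removed-${i}`,
    taskType: 'removedType',
    status: 'idle',
  })),
  ...Array.from({ length: 5 }, (_, i): Task => ({
    id: `real-${i}`,
    taskType: 'realType',
    status: 'idle',
  })),
];
const registered = new Set(['realType']);

const first = claimCycle(store, registered, 10);
const second = claimCycle(store, registered, 10);
const third = claimCycle(store, registered, 10);
console.log(first.toRun.length, second.toRun.length, third.toRun.length);
```

In this model the first two cycles run nothing, because every slot in the page goes to marking removed-type tasks; the runnable tasks only get picked up in the third cycle.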

I believe we may want a different solution than #184937, though. I believe the actual task search does not depend on the "unrecognized" status; setting that status is really a cleanliness task. As such, I think we should move it to a new background task, perhaps called "task manager clean up" or such. I've wanted a background task that could run some consistency checks or whatever, so it's time to start!
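A rough sketch of that split, under stated assumptions: the claim search only looks at registered types, and a separate clean-up function marks removed-type tasks in small batches. Everything here (`searchClaimable`, `runCleanup`, `batchSize`, the `StoredTask` shape) is hypothetical and in-memory; it is not Task Manager's API, and a real version would go through the task store asynchronously.

```typescript
// Hypothetical, in-memory model; names are illustrative, not Task Manager's API.
interface StoredTask {
  id: string;
  taskType: string;
  status: 'idle' | 'unrecognized';
}

// Claim search restricted to registered types (an assumption of this sketch),
// so tasks of removed types never occupy claim slots.
function searchClaimable(
  tasks: StoredTask[],
  registeredTypes: Set<string>,
  pageSize: number
): StoredTask[] {
  return tasks
    .filter((t) => t.status === 'idle' && registeredTypes.has(t.taskType))
    .slice(0, pageSize);
}

// Body of the hypothetical low-priority "task manager clean up" task:
// marks at most `batchSize` tasks of removed types per run and returns the count.
function runCleanup(
  tasks: StoredTask[],
  registeredTypes: Set<string>,
  batchSize: number
): number {
  let marked = 0;
  for (const task of tasks) {
    if (marked >= batchSize) break;
    if (task.status === 'idle' && !registeredTypes.has(task.taskType)) {
      task.status = 'unrecognized';
      marked++;
    }
  }
  return marked;
}

// Same shape of data as the starvation case: removed-type tasks come first.
const allTasks: StoredTask[] = [
  ...Array.from({ length: 25 }, (_, i): StoredTask => ({
    id: `removed-${i}`,
    taskType: 'removedType',
    status: 'idle',
  })),
  ...Array.from({ length: 5 }, (_, i): StoredTask => ({
    id: `real-${i}`,
    taskType: 'realType',
    status: 'idle',
  })),
];
const knownTypes = new Set(['realType']);

const claimed = searchClaimable(allTasks, knownTypes, 10);
const cleaned = runCleanup(allTasks, knownTypes, 10);
console.log(claimed.length, cleaned);
```

With the marking moved out of the claimer, the runnable tasks are claimed in the first cycle regardless of how many removed-type tasks exist, and the clean-up can catch up at its own pace.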

elasticmachine commented 3 weeks ago

Pinging @elastic/response-ops (Team:ResponseOps)

mikecote commented 3 weeks ago

@pmuellr do you know if this problem also exists for the update by query flow? From what I gather, if the update by query does a doc update to set unrecognized, it counts as, say, one of the tasks claimed. It's probably subject to starvation as well.

pmuellr commented 3 weeks ago

Honestly do not know - I was kinda thinking of bypassing all that stuff at first, so didn't see how they were doing it - but was actually wondering the same as you - do they count? Prolly :-)

That's why I split that one off as a separate issue. Seems like a good thing to run every so often on a low priority (!!) task, just remove that processing from the task claimer completely - it's not really needed there.

mikecote commented 3 weeks ago

Agreed, sounds good! I put this issue in the backlog for now, a few of the other issues are in 8.15/8.16 plans.