Closed mikebell90 closed 3 years ago
It depends on whether you are interested in the state of all of the tasks or just a single one. If just one, you could always use the task state endpoint. If you are getting all of them for a request at once, you can use the task ids by status call, which is the most efficient option there.
So let me be clearer as to the goal:
I want to check that a specific task belonging to a specific request has the following attributes:
It exists, meaning the task is Running or Cleaning. We don't want Lost, and Pending is probably ok to avoid too.
The deploy timestamp. (This is actually wrong for our purposes, but we've made do. It doesn't take into account a long time to schedule or a decommission.)
We only care about this task.
In plain English: "Is this task in a running or cleaning state, does it belong to a specific request id, and has it started in the last N minutes?"
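The plain-English check above could be sketched roughly as follows. This is only an illustration: the class and method names, the state strings "RUNNING" and "CLEANING", and the idea of passing the start time as an Instant are all assumptions made for the example, not part of the Singularity client.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Set;

// Hypothetical helper, not a Singularity class.
final class TaskFreshnessCheck {
    // Assumed state names; use whatever values the API actually returns.
    private static final Set<String> ACCEPTED_STATES = Set.of("RUNNING", "CLEANING");

    // True if the task belongs to the expected request, is in an accepted
    // state, and started no more than maxAge before now.
    static boolean isLiveAndRecent(String taskRequestId, String taskState, Instant startedAt,
                                   String expectedRequestId, Duration maxAge, Instant now) {
        if (!expectedRequestId.equals(taskRequestId)) {
            return false;
        }
        if (!ACCEPTED_STATES.contains(taskState)) {
            return false;
        }
        Duration age = Duration.between(startedAt, now);
        // A start time in the future is treated as a failed check.
        return !age.isNegative() && age.compareTo(maxAge) <= 0;
    }
}
```

Passing now in explicitly keeps the check easy to test; in real use it would typically be Instant.now().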
try (Stream<SingularityTaskId> singularityTaskIdStream = Stream.of(
        singularityTaskIdsByStatus.getHealthy().stream(),
        singularityTaskIdsByStatus.getCleaning().stream(),
        singularityTaskIdsByStatus.getNotYetHealthy().stream())
    .flatMap(t -> t)) {
  // ... filter the stream down to the task of interest ...
}
We take that stream and filter it. Currently we use "/api/tasks/ids/request/%s", but we are finding that it shows increasingly bad performance as we've scaled the cluster up.
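For illustration, the flatten-then-filter step might look something like the sketch below. It uses plain String ids as a stand-in for SingularityTaskId, and the class and method names are hypothetical. Streams built from in-memory collections hold no resources, so this version skips the try-with-resources wrapper.

```java
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper; String ids stand in for SingularityTaskId.
final class ActiveTaskLookup {
    // True if wantedTaskId appears in any of the three status buckets.
    static boolean isActive(List<String> healthy, List<String> cleaning,
                            List<String> notYetHealthy, String wantedTaskId) {
        return Stream.of(healthy, cleaning, notYetHealthy)
                .flatMap(List::stream)          // merge the buckets into one stream
                .anyMatch(wantedTaskId::equals); // stops at the first match
    }
}
```

Because anyMatch short-circuits, the client-side cost is small; the expensive part is still the server-side call that produces the lists for the whole request.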
Ok, that first task state call (/api/track/task/{taskid}) is likely what you want then. It will check the single task's data, not the whole request, falling back to task history if the task isn't in the active data (if you are using mysql for history). We created the /track api for use cases like that, since the regular task api is split into active vs history, which can make it hard to work with.
Hmm. Well, maybe not. I'm consistently getting WORSE performance that way.
Let me explain the scenario. This is running a test with approximately 4-5 requests per minute (5 instances, each with a fixedRate of 1 minute, single threaded executor).
The query in the end takes a taskId and a requestId as input. The old call queried e.g. /request/{requestId} and then looked for the matching taskId.
The new one queries /api/track/task/{taskid} and then checks whether the result has a matching requestId.
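As a rough sketch of the two lookups being compared, the requests could be built like this with the JDK's java.net.http client. The paths are the ones quoted in this thread; the base URL, the class and method names, and the choice to URL-encode the ids are assumptions, and nothing here says anything about the shape of the responses.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

// Hypothetical helper for building the two requests discussed above.
final class SingularityLookups {
    // Per-task lookup: /api/track/task/{taskid}
    static URI byTask(String baseUrl, String taskId) {
        return URI.create(baseUrl + "/api/track/task/" + encode(taskId));
    }

    // Per-request lookup: /api/tasks/ids/request/%s
    static URI idsByRequest(String baseUrl, String requestId) {
        return URI.create(baseUrl + String.format("/api/tasks/ids/request/%s", encode(requestId)));
    }

    // A GET with an explicit timeout, so a slow outlier fails fast
    // instead of blocking a single-threaded executor.
    static HttpRequest get(URI uri, Duration timeout) {
        return HttpRequest.newBuilder(uri).timeout(timeout).GET().build();
    }

    // Form-style encoding; assumes ids contain no spaces.
    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

The request would then be sent with HttpClient.send(request, HttpResponse.BodyHandlers.ofString()) and the body parsed with whatever JSON library is already in use.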
What I'm finding in two different environments (one with 130 agents, one with about 400) is:
a) The taskId api doesn't have as many outliers, but the outliers are much worse (12-15 seconds!)
b) The requestId api has many more outliers, but is consistently 1-2.5 s on the outliers
c) Neither is performing well compared to how it "used" to, but the factors that have changed (number of agents, etc. HAVE increased) are hard to eliminate.
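One way to make the "fewer but worse outliers" comparison concrete is to record each call's latency and summarize it per endpoint. The helper below is a minimal sketch using the nearest-rank percentile; the class name, method names, and any thresholds are hypothetical, and the sample numbers in the usage note are made up.

```java
import java.util.Arrays;

// Hypothetical helper for summarizing recorded call latencies.
final class LatencySummary {
    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static long percentile(long[] samplesMillis, double p) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samplesMillis.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    // Number of calls slower than the given threshold.
    static long countAbove(long[] samplesMillis, long thresholdMillis) {
        return Arrays.stream(samplesMillis).filter(ms -> ms > thresholdMillis).count();
    }
}
```

For made-up samples {200, 250, 300, 220, 14000} this gives a median of 250 ms, a maximum (p = 100) of 14000 ms, and one call above a 1000 ms threshold. Comparing the median, a high percentile, and the outlier count side by side for both endpoints separates "how often" from "how bad".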
If you are using sql, have you checked the metrics for your database and/or zk cluster? We have ~700 agents, ~22k tasks, and ~13k requests, and that endpoint is consistently sub-second for me.
In the end my goal is to simply check "Is a specific Task in the running state, and for how long"
I have traditionally queried by RequestId and then filtered down to the specific Task (e.g. /request/{requestId}).
Is there a gain, particularly with say 50-100+ instances, in doing a direct query by Task (/task/{taskId})? That endpoint returns a lot more data, so it was not obvious which would be more performant.