elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

[ML] Enable performance tuning for `_ml/anomaly_detectors/<job_id>` #59211

Open sorenlouv opened 4 years ago

sorenlouv commented 4 years ago

APM is using the endpoint GET _ml/anomaly_detectors/<job_id> to determine whether any job of a specific kind exists. We do that by matching on the job ID:

GET _ml/anomaly_detectors/*high_mean_response_time

Since we don't need the actual jobs (we only need to know whether any exist), it would be great to omit the results from the response. Elasticsearch normally allows the user to tune the performance of a query with flags like terminate_after: 1, size: 0, timeout and track_total_hits.
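For comparison, those tuning flags already exist on ordinary searches. A minimal sketch, assuming an illustrative apm-* index pattern (only the index name is made up; size, terminate_after, timeout and track_total_hits are real search options):

GET apm-*/_search
{
  "size": 0,
  "terminate_after": 1,
  "timeout": "5s",
  "track_total_hits": true,
  "query": { "match_all": {} }
}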

Having something like this would be very useful:

GET _ml/anomaly_detectors/*high_mean_response_time?size=0&terminateAfter=1

WDYT?

elasticmachine commented 4 years ago

Pinging @elastic/ml-core (:ml)

droberts195 commented 4 years ago

Other Elasticsearch endpoints that return metadata objects don't have such detailed performance options. For example, get users doesn't. It's an implementation detail whether users are stored in an index or elsewhere, and similarly it's an implementation detail whether ML jobs are stored in an index or elsewhere. (In fact, up until 6.6 they were always stored in cluster state, and between 6.6 and 7.last the ML code needs to check both cluster state and the config index for jobs in case someone has just upgraded.)

There is a precedent for "exists" on metadata in the form of HEAD requests, such as for index exists. So we could implement something like that for ML jobs and tune the internal implementation such that it returned success as soon as the first matching job was found.
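For reference, the index exists check is a plain HEAD request that only returns a status code; the second request below is a hypothetical sketch of an equivalent check for ML jobs, not an existing API:

HEAD my-index

HEAD _ml/anomaly_detectors/*high_mean_response_time

The first request is the existing index exists API; the second, hypothetical one would return 200 as soon as one matching job is found and 404 otherwise.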

However, for Kibana solutions that use ML there's an extra complexity: they need to go via the ML Kibana APIs so that the ML Kibana code can check the space visibility of the jobs. Since that check involves joining the job IDs returned from the ML backend to Kibana saved objects, a simple exists yes/no response won't be any good.

For Kibana solutions needing to know whether certain jobs exist and are visible in the current space, we'd have to add an option to the ES API so that it just returned IDs, and then add a new ML Kibana API that returned exists yes/no after joining those IDs to the corresponding Kibana saved objects.
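Sketched end to end, with entirely hypothetical parameter names, routes and job IDs, just to illustrate the two response shapes involved:

GET _ml/anomaly_detectors/*high_mean_response_time?ids_only=true   (hypothetical option: response contains only job IDs)
{ "count": 1, "jobs": [ { "job_id": "service-a-high_mean_response_time" } ] }

GET /api/ml/jobs_exist?jobIds=*high_mean_response_time   (hypothetical Kibana route, after joining to saved objects in the current space)
{ "exists": true }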

But I think we need to be mindful of the opportunity cost of doing all this work. It's hard to remove APIs once they're added, so they have to be maintained for a long time, and while we're doing that we cannot add ML features that would be more widely useful. Is the performance of the current get jobs endpoint so bad that this is a critical requirement? If it is, then we should investigate what the performance bottleneck is and see if there is a quick win before implementing completely new APIs.

sorenlouv commented 4 years ago

> Other Elasticsearch endpoints that return metadata objects don't have such detailed performance options. For example, get users doesn't.

As far as I can see, there is no endpoint for searching users by particular properties the way you can with the ML jobs API. For users there's an endpoint that returns all users, but without filtering.
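For reference, the users API only supports fetching everything or fetching a user by name (the username below is just an example):

GET /_security/user

GET /_security/user/jacknich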

I think part of the problem is that, while each of the user endpoints only does one thing, the ML endpoint does four things:

Slightly tangential: I think it would be very useful to have separate endpoints for each of these. That would make it a lot easier for me as a consumer, since right now I have to expect different responses depending on which scenario I'm hitting. Having a dedicated _search endpoint would align this with other Elasticsearch APIs:

GET _ml/anomaly_detectors/_search
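For example (an entirely hypothetical request shape), the usual query DSL and tuning flags could then apply to job metadata:

GET _ml/anomaly_detectors/_search
{
  "size": 0,
  "terminate_after": 1,
  "query": { "wildcard": { "job_id": "*high_mean_response_time" } }
}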

> There is a precedent for "exists" on metadata in the form of HEAD requests, such as for index exists. So we could implement something like that for ML jobs and tune the internal implementation such that it returned success as soon as the first matching job was found.

That would work for my use case.

> However, for Kibana solutions that use ML there's an extra complexity: they need to go via the ML Kibana APIs

True, I can see how that makes it more complex.

> But I think we need to be mindful of the opportunity cost of doing all this work.

True, and no, this is not critically needed at the moment. I do think it would be very useful to have an endpoint for searching ML jobs based on metadata (like group/tags and other properties) and to be able to tune the request like any other Elasticsearch query. But with our current scaling requirements in mind, it is not something that's critical to us right now.

davidkyle commented 4 years ago

We've often discussed adding from and size parameters to the GET jobs API, which would be consistent with many other ML APIs. It is a historical accident that the API does not support from and size, and the team has been reluctant to implement paging because we do not want to break the existing behaviour.

By setting a hypothetical size parameter to 0, the count field in the response would still correctly give the number of matched jobs (like total hits in search) without sending back the full config for all of those jobs. This is a clear use case and probably not the only one; for example, getting the count of jobs that match a wildcard pattern or group.
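A sketch of how that could look (the size parameter is hypothetical; the count field already exists in the get jobs response):

GET _ml/anomaly_detectors/*high_mean_response_time?size=0

{
  "count": 3,
  "jobs": []
}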