elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Support from and size in all ml Get X APIs #59405

Open davidkyle opened 4 years ago

davidkyle commented 4 years ago

The default, and indeed only, option for some of the ml GET APIs is to return all matching documents, up to a limit of 10,000 hits. This is the case for Jobs and Datafeeds. If you have a large number of jobs, this can generate a lot of traffic when you are only interested in the first N, or when you just want a count of the matching jobs as described in #59211.

APIs that support paging

GET _transform
GET _ml/data_frame/analytics

Those that don't

GET _ml/anomaly_detectors
GET _ml/datafeeds

Including all the stats APIs:

GET _transform/_stats
GET _ml/data_frame/analytics/_stats
GET _ml/anomaly_detectors/_stats
GET _ml/datafeeds/_stats
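For the endpoints that already accept paging, a request path can be sketched as below. This is only an illustration: `paged_path` is a hypothetical helper, not part of any Elastic client, and it does nothing more than append `from` and `size` as query parameters.

```python
from urllib.parse import urlencode


def paged_path(endpoint: str, from_: int, size: int) -> str:
    """Build a request path with from/size query parameters.

    Hypothetical helper for illustration; it performs no HTTP call.
    """
    if from_ < 0 or size < 0:
        raise ValueError("from and size must be non-negative")
    return f"{endpoint}?{urlencode({'from': from_, 'size': size})}"


# Third page of 100 transforms.
print(paged_path("_transform", 200, 100))  # _transform?from=200&size=100
```

The same helper would apply unchanged to the job and datafeed endpoints if they gained the two parameters.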

GET transform and DFA are both based on the very useful and well-designed abstraction AbstractTransportGetResourcesAction, which supports paging and wildcard expansion by implementing a small number of functions in the concrete class. If AbstractTransportGetResourcesAction could be used everywhere, it would be very simple to add paging to the other APIs. However, there are complications:

1. In 7.x Jobs and Datafeeds may be in the cluster state. Due to this, paging becomes non-trivial and cannot be implemented via AbstractTransportGetResourcesAction.

2. Stats requests are routed to the host node for live tasks. If a job or transform is running, the stats request reads from memory on the host node; otherwise stats are retrieved from the index. This is not an insurmountable problem but does complicate matters.

Paging with from and size is limited to the first 10,000 documents and is sub-optimal compared to search_after, but those parameters are consistent with other ml APIs and easy to use, because a user does not have to track the last hit to send in the next search_after request.
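The trade-off can be illustrated with a small in-memory simulation. This is a sketch under stated assumptions, not Elasticsearch code: the sorted list of job IDs stands in for an index, `page_from_size` and `page_search_after` are hypothetical names, and the 10,000 cap is modelled as a limit on `from + size`.

```python
import bisect
from typing import List, Optional

MAX_RESULT_WINDOW = 10_000  # modelled cap: from + size may not exceed this


def page_from_size(sorted_ids: List[str], from_: int, size: int) -> List[str]:
    """Offset paging: stateless for the caller, but capped by the window."""
    if from_ + size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds the result window")
    return sorted_ids[from_:from_ + size]


def page_search_after(sorted_ids: List[str], after: Optional[str], size: int) -> List[str]:
    """Cursor paging: the caller must pass the last ID of the previous page."""
    start = 0 if after is None else bisect.bisect_right(sorted_ids, after)
    return sorted_ids[start:start + size]


ids = [f"job-{i:05d}" for i in range(12_000)]

# Offset paging cannot reach past the window.
try:
    page_from_size(ids, 9_995, 10)
except ValueError as err:
    print("from/size:", err)

# Cursor paging walks the whole set, at the cost of tracking the last hit.
seen, cursor = 0, None
while True:
    page = page_search_after(ids, cursor, 1_000)
    if not page:
        break
    seen += len(page)
    cursor = page[-1]
print("search_after saw", seen, "ids")
```

With from and size the caller only increments an offset; with the cursor variant the caller has to carry `cursor` between requests, which is the extra bookkeeping mentioned above.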

If this change is made, the existing behaviour should be preserved, meaning that, rather oddly, the defaults will be from: 0 and size: 10000.
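A minimal sketch of how those defaults might be resolved, assuming omitted parameters fall back to the current behaviour. `resolve_paging` and the two constants are hypothetical names used only for illustration.

```python
from typing import Optional, Tuple

DEFAULT_FROM = 0        # assumption: matches the defaults proposed above
DEFAULT_SIZE = 10_000   # assumption: preserves "return everything up to 10,000"


def resolve_paging(from_: Optional[int] = None, size: Optional[int] = None) -> Tuple[int, int]:
    """Fill in missing paging parameters so old callers see no change."""
    resolved_from = DEFAULT_FROM if from_ is None else from_
    resolved_size = DEFAULT_SIZE if size is None else size
    if resolved_from < 0 or resolved_size < 0:
        raise ValueError("from and size must be non-negative")
    return resolved_from, resolved_size


print(resolve_paging())          # (0, 10000): existing behaviour
print(resolve_paging(size=25))   # (0, 25): first 25 jobs only
```

A caller that passes nothing gets the same result set as today, while a caller that passes a small size gets the reduced traffic the issue is asking for.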

elasticmachine commented 4 years ago

Pinging @elastic/ml-core (:ml)

droberts195 commented 4 years ago

We discussed this in a team meeting and decided we should not make any changes in 7.x due to the extra complexity created by jobs and datafeeds potentially being in cluster state. In 8.0 it would be nice to make the anomaly detection job/datafeed APIs consistent with data frame analytics in terms of how they handle from and size.

However, we should also consider whether it's possible to make our GET endpoints more useful to the ML UI.

Currently the ML UI gets 10000 jobs/datafeeds and manages paging itself. Introducing paging in the same way as data frame analytics wouldn't particularly help, because for data frame analytics the UI gets 1000 jobs and manages paging itself.

Reasons why the UI currently manages paging itself are:

  1. It needs to join to the corresponding stats request, and that leads to the possibility of a race condition: if you ask for 100 jobs and then 100 job stats, and a job was deleted or created in between, you might only have 99 results that can be joined together.
  2. It does keyword search across many, but not all, fields - you can type something in the UI search box and see the jobs that have that word in one of a sensible set of fields.
  3. It needs to join to the space-aware saved object. Since, soon, not all jobs will be visible in all spaces, the UI would need to get more from the backend than it was being asked to display, then discard those where the saved object said the job should not be visible.

All three of these reasons apply equally to anomaly detection jobs and data frame analytics jobs.
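The join race described in reason 1 can be sketched as follows. This is an illustration, not the UI's actual code: `join_jobs_with_stats` is a hypothetical function, and the dictionaries are simplified stand-ins for the two responses, keyed on `job_id`.

```python
from typing import Dict, List, Tuple


def join_jobs_with_stats(
    jobs: List[Dict], stats: List[Dict]
) -> Tuple[List[Dict], List[str], List[str]]:
    """Join job configs to stats by job_id, reporting anything unmatched.

    Returns (joined, jobs_without_stats, stats_without_jobs).
    """
    stats_by_id = {s["job_id"]: s for s in stats}
    job_ids = {j["job_id"] for j in jobs}

    joined, jobs_without_stats = [], []
    for job in jobs:
        job_stats = stats_by_id.get(job["job_id"])
        if job_stats is None:
            # Job was deleted between the two requests (or stats not yet visible).
            jobs_without_stats.append(job["job_id"])
        else:
            joined.append({**job, "stats": job_stats})

    # Stats for jobs created after the config request was answered.
    stats_without_jobs = [s["job_id"] for s in stats if s["job_id"] not in job_ids]
    return joined, jobs_without_stats, stats_without_jobs


jobs = [{"job_id": "a"}, {"job_id": "b"}, {"job_id": "c"}]
stats = [{"job_id": "a", "state": "opened"}, {"job_id": "c", "state": "closed"},
         {"job_id": "d", "state": "opened"}]  # "b" deleted, "d" created in between

joined, missing, extra = join_jobs_with_stats(jobs, stats)
print(len(joined), missing, extra)  # 2 ['b'] ['d']
```

With server-side paging the two pages can also cover different ID ranges once a job is added or removed, which is why fetching everything and joining client-side sidesteps the problem.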