elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

[ML] Solutions: Request for mock ML job data for testing #76660

Open · alvarezmelissa87 opened this issue 3 years ago

alvarezmelissa87 commented 3 years ago

Request for mock ML job data for use in plugins integrating with ML.

As we work toward being more Solutions-oriented, more plugins are integrating with ML. It would be great to provide mock data for plugins to use when they create their own functional tests; there is no need for them to go through the whole ML job flow, as that is already covered by the ML plugin tests.

cc @pheyos

elasticmachine commented 3 years ago

Pinging @elastic/ml-ui (:ml)

jasonrhodes commented 3 years ago

Hello! When we want to test anomalies and other ML integrations in Observability, we often just want to make sure our graphs display correctly given expected data. To that end, it would be great to have access to fixtures of well-tested, maintained API response data for the various endpoints that we as solutions interact with, so we can use those instead of setting up real jobs, waiting for them to produce results, etc.
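
As a rough illustration of the first use case, here is a minimal sketch of such a response fixture being used in a jest test. The fixture shape, field names, and endpoint path are all hypothetical, not the actual ML plugin contract:

```ts
// Hypothetical fixture for an ML anomaly-results API response.
// All field names and values are illustrative.
export const anomalyResultsFixture = {
  records: [
    {
      job_id: 'logs-anomaly-test',
      timestamp: 1598918400000,
      record_score: 86.4,
      partition_field_value: 'host-01',
    },
  ],
};

it('renders anomalies from fixture data', async () => {
  // Stub the HTTP layer so the test never talks to a live cluster.
  const http = { post: jest.fn().mockResolvedValue(anomalyResultsFixture) };
  const response = await http.post('/internal/ml/anomaly_results'); // hypothetical path
  expect(response.records).toHaveLength(1);
});
```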

In addition, it would be nice to have document templates of some kind, or some other way to write data into the right indices so that these API endpoints produce the desired results, for times when we want to set up a cluster that is already in a given state rather than hoping and/or waiting for ML jobs to produce that state reliably.
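
And a hedged sketch of the second use case: writing documents straight into an index with the Elasticsearch JS client. The index name and document shape mirror ML anomaly record results, but both are owned by the ML plugin and should be verified against the target stack version:

```ts
import { Client } from '@elastic/elasticsearch';

async function injectAnomalyRecord() {
  const client = new Client({ node: 'http://localhost:9200' });

  // v7-style client call; v8 clients take `document` instead of `body`.
  await client.index({
    index: '.ml-anomalies-shared', // assumed results index; verify per version
    body: {
      job_id: 'logs-anomaly-test',
      result_type: 'record',
      timestamp: Date.now(),
      bucket_span: 900,
      detector_index: 0,
      is_interim: false,
      probability: 0.000013,
      record_score: 86.4,
      initial_record_score: 86.4,
    },
  });
}
```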

I think these are two different use cases and I'm happy to provide more information on both! Thanks!!

jasonrhodes commented 3 years ago

Update: since there doesn't appear to be an API for retrieving anomalies, the mock data is probably not going to be helpful for our anomaly cases. We appear to be using the mlAnomalySearch method, which looks like it is mostly a thin wrapper around the ES client, so we are probably going to need document templates instead, so that we can fill ES with anomaly data.

weltenwort commented 3 years ago

For added context, we are already able to inject anomaly data by manually loading documents into the results index. It would be nice, though, to somehow generate them such that they are guaranteed to be consistent.
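
One possible shape for such a generator, sketched here with hypothetical names: consistency comes from snapping timestamps to bucket boundaries and deriving the score fields from a single input, so hand-written documents can't drift out of sync:

```ts
// Sketch of a generator for internally consistent anomaly records.
// Field names mirror ML record documents but are not an official contract.
interface AnomalyRecordInput {
  jobId: string;
  time: number;         // epoch ms; snapped to a bucket boundary below
  bucketSpanSec: number;
  recordScore: number;  // clamped to 0..100
}

function makeAnomalyRecord({ jobId, time, bucketSpanSec, recordScore }: AnomalyRecordInput) {
  const bucketSpanMs = bucketSpanSec * 1000;
  const bucketStart = Math.floor(time / bucketSpanMs) * bucketSpanMs;
  const score = Math.min(100, Math.max(0, recordScore));
  return {
    job_id: jobId,
    result_type: 'record',
    timestamp: bucketStart,
    bucket_span: bucketSpanSec,
    detector_index: 0,
    is_interim: false,
    record_score: score,
    initial_record_score: score,
  };
}
```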

An even bigger problem is getting the jobs themselves into the desired state for testing. This includes, but is not limited to, …

jgowdyelastic commented 3 years ago

cc @pheyos

sophiec20 commented 3 years ago

> It would be nice, though, to somehow generate them such that they are guaranteed to be consistent.

For any ML mock data exercise to have a chance of being effective, we would need mock data coming out of the agent(s) so we can close the circle. Do we know if this is available?

weltenwort commented 3 years ago

In the case of logs we can generate documents or load them from a fixture.
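
For example, a small hypothetical generator for synthetic log documents (field names follow ECS conventions, but everything here is illustrative):

```ts
// Generate evenly spaced synthetic log documents for a job to consume.
function makeLogDocs(count: number, startMs: number, intervalMs: number) {
  return Array.from({ length: count }, (_, i) => ({
    '@timestamp': new Date(startMs + i * intervalMs).toISOString(),
    message: `synthetic log line ${i}`,
    'event.dataset': 'test.logs',
    'log.level': i % 50 === 0 ? 'error' : 'info', // inject occasional errors
  }));
}
```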

pheyos commented 3 years ago

Thanks for the feedback! We had a few discussions around this topic; here's a quick summary:

jasonrhodes commented 3 years ago
> We don't want to mock API responses in end-to-end tests for a similar reason: the real API response could change, and the tests wouldn't catch that because they would still be running against the old mock data.

I understand this concern, but I wonder if it may be time to treat our Kibana APIs as more than hidden implementation details, and subject them to some amount of backwards-compatible versioning. I know that in Logs and Metrics we will need to run tests that don't rely on running ML jobs. The idea behind this ticket, from our end, was to avoid this very problem of stale data, because the mocks would be controlled by the ML team directly and updated as part of the overall process.