[ML] AIOps: Add API for log rate analysis for stack / solutions usage #178613

Open · walterra opened this issue 4 months ago

walterra commented 4 months ago

The Kibana API for Log Rate Analysis is internal and has so far been treated as an implementation detail of the UI. There are several use cases where a more generic API would be useful for other parts of the platform and for solutions.

The existing internal API is also somewhat complex: it streams custom NDJSON and includes data, such as detailed histogram data used to populate the UI, that might not be necessary for the above use cases.
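To illustrate that complexity, here is a minimal sketch of what consuming such an NDJSON stream implies for a client; the action payload and the dispatching are hypothetical, not the actual internal protocol:

```ts
// Minimal sketch of an NDJSON stream consumer. Each line of the
// response body is a separate JSON action that has to be parsed and
// dispatched individually (action shape is made up for illustration).
async function consumeNdjsonStream(response: Response) {
  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    const lines = buffer.split('\n');
    // Keep a possibly incomplete trailing line for the next chunk.
    buffer = lines.pop()!;

    for (const line of lines) {
      if (line.trim() !== '') {
        const action = JSON.parse(line);
        // e.g. dispatch(action) into UI state
      }
    }
  }
}
```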

The public API could look like this:

```ts
{
  index: string;
  start: number; // histogram start timestamp
  end: number; // histogram end timestamp
  interval?: number; // bucket interval for the histogram
  fields?: string[]; // override to provide index fields up front and avoid auto detection
  query?: object; // optional DSL query
  groupResults?: boolean; // whether to run additional analysis that groups co-occurring field/values (default: false)
  randomSamplingProbability?: number; // optional random sampler probability (default: 1)
  // ...
}
```

The API would run the analysis internally and, instead of streaming, return the results as a single JSON object.
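As a rough sketch of what calling such an endpoint could look like from a consumer's perspective (the route path and the response shape below are assumptions for illustration, not the final API):

```ts
// Hypothetical consumer of the proposed API. The route path and the
// response shape are assumptions, not the final design.
interface LogRateAnalysisResponse {
  significantItems: Array<{
    fieldName: string;
    fieldValue: string | number;
    pValue: number;
  }>;
}

async function fetchLogRateAnalysis(): Promise<LogRateAnalysisResponse> {
  const response = await fetch('/api/aiops/log_rate_analysis', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      // Kibana requires this header on state-changing requests.
      'kbn-xsrf': 'true',
    },
    body: JSON.stringify({
      index: 'logs-*',
      start: Date.now() - 15 * 60 * 1000, // last 15 minutes
      end: Date.now(),
      groupResults: false,
      randomSamplingProbability: 1,
    }),
  });
  // A single JSON object rather than an NDJSON stream.
  return response.json();
}
```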

The existing UI and internal API are GA, so they are not good candidates for PoCs and experimentation. This additional public API could be tagged experimental until the first use cases pick it up and are more fleshed out.

### Tasks
- [x] https://github.com/elastic/kibana/pull/178338
- [x] https://github.com/elastic/kibana/pull/179178 
- [x] https://github.com/elastic/kibana/pull/178756
- [x] https://github.com/elastic/kibana/pull/187669
- [ ] Create plugin for solution integration `plugins/aiops-api` https://github.com/elastic/kibana/pull/179695
elasticmachine commented 4 months ago

Pinging @elastic/ml-ui (:ml)

walterra commented 3 months ago

Thoughts on a future alerts integration: it is likely too expensive to run field caps on every alert execution. Instead, we could do something similar to how the anomaly alert caches field formats.

Identifying fields for analysis is not necessary on every alert run as long as the underlying index structure doesn't change, so caching those fields would greatly improve the runtime of the analysis. This could even be part of the alert's setup process: we'd run field identification once and let the user decide which fields they'd like to use for the alert. An alert check would then only have to run significant terms, without grouping, as sketched below.
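A minimal sketch of that setup/run split, assuming the Elasticsearch JS client; the function names and the split itself are illustration only, not existing Kibana code:

```ts
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// Setup time: identify field candidates once via field caps and cache
// them with the alert's parameters. The user could then narrow this
// list down to the fields they care about.
async function identifyFieldCandidates(index: string): Promise<string[]> {
  const fieldCaps = await client.fieldCaps({ index, fields: '*' });
  return Object.entries(fieldCaps.fields)
    .filter(([, caps]) => 'keyword' in caps)
    .map(([fieldName]) => fieldName);
}

// Per alert run: only significant_terms on the cached fields, with no
// field caps call and no grouping.
async function runAlertCheck(
  index: string,
  cachedFields: string[],
  start: number,
  end: number
) {
  return client.search({
    index,
    size: 0,
    // The deviation time range acts as the foreground set.
    query: { range: { '@timestamp': { gte: start, lte: end } } },
    aggs: Object.fromEntries(
      cachedFields.map((field) => [field, { significant_terms: { field } }])
    ),
  });
}
```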

sorenlouv commented 2 months ago

> So caching those fields would greatly improve runtime of the analysis.

This sounds like a great idea. There's no need to do this for every alert execution.

walterra commented 6 days ago

I'll remove the version label from this one. Things we plan to implement for a specific version should be mentioned in the version-specific meta issue: #187684.

This issue will still serve its purpose of tracking overall progress on enabling log rate analysis in other solution features.