[ML] AIOps: Add API for log rate analysis for stack / solutions usage #178613

Open · walterra opened this issue 4 months ago

walterra commented 4 months ago

The Kibana API for Log Rate Analysis is internal and has so far been treated as an implementation detail of the UI. There are several use cases where a more generic API would be useful for other parts of the platform and for solutions.

The existing internal API is also somewhat complex: it streams custom NDJSON and includes data, such as detailed histogram data used to populate the UI, that might not be necessary for the above use cases.
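To illustrate that complexity, here is a minimal sketch of what consuming such an NDJSON stream implies for a client; the action payload and the dispatching are hypothetical, not the actual internal protocol:

```ts
// Minimal sketch of an NDJSON stream consumer. Each line of the
// response body is a separate JSON action that has to be parsed and
// dispatched individually (action shape is made up for illustration).
async function consumeNdjsonStream(response: Response) {
  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    const lines = buffer.split('\n');
    // Keep a possibly incomplete trailing line for the next chunk.
    buffer = lines.pop()!;

    for (const line of lines) {
      if (line.trim() !== '') {
        const action = JSON.parse(line);
        // e.g. dispatch(action) into UI state
      }
    }
  }
}
```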

The public API could look like this:

```ts
{
  index: string;
  start: number; // histogram start timestamp
  end: number; // histogram end timestamp
  interval?: number; // bucket interval for the histogram
  fields?: string[]; // override to provide index fields up front and avoid auto detection
  query?: object; // optional DSL query
  groupResults?: boolean; // whether to run additional analysis that groups co-occurring field/values (default: false)
  randomSamplingProbability?: number; // optional random sampler probability (default: 1)
  // ...
}
```

The API would run the analysis internally and, instead of streaming, return the results as a single JSON object.
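As a rough sketch of what calling such an endpoint could look like from a consumer's perspective (the route path and the response shape below are assumptions for illustration, not the final API):

```ts
// Hypothetical consumer of the proposed API. The route path and the
// response shape are assumptions, not the final design.
interface LogRateAnalysisResponse {
  significantItems: Array<{
    fieldName: string;
    fieldValue: string | number;
    pValue: number;
  }>;
}

async function fetchLogRateAnalysis(): Promise<LogRateAnalysisResponse> {
  const response = await fetch('/api/aiops/log_rate_analysis', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      // Kibana requires this header on state-changing requests.
      'kbn-xsrf': 'true',
    },
    body: JSON.stringify({
      index: 'logs-*',
      start: Date.now() - 15 * 60 * 1000, // last 15 minutes
      end: Date.now(),
      groupResults: false,
      randomSamplingProbability: 1,
    }),
  });
  // A single JSON object rather than an NDJSON stream.
  return response.json();
}
```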

The existing UI and internal API are GA, so they are not good candidates for PoCs and experimentation. This additional public API could be tagged experimental until the first use cases pick it up and are more fleshed out.

### Tasks
- [x] https://github.com/elastic/kibana/pull/178338
- [x] https://github.com/elastic/kibana/pull/179178 
- [x] https://github.com/elastic/kibana/pull/178756
- [x] https://github.com/elastic/kibana/pull/187669
- [ ] Create plugin for solution integration `plugins/aiops-api` https://github.com/elastic/kibana/pull/179695
elasticmachine commented 4 months ago

Pinging @elastic/ml-ui (:ml)

walterra commented 3 months ago

Thoughts on a future alerts integration: it is likely too expensive to run field caps on every alert execution. Instead, we could do something similar to how the anomaly alert caches field formats.

Identifying fields for analysis is not necessary on every alert run as long as the underlying index structure doesn't change, so caching those fields would greatly improve the runtime of the analysis. This could even be part of the alert's setup process: we'd run field identification once and let the user decide which fields they'd like to use for the alert. An alert check would then only have to run significant terms, without grouping, as sketched below.
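A minimal sketch of that setup/run split, assuming the Elasticsearch JS client; the function names and the split itself are illustration only, not existing Kibana code:

```ts
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// Setup time: identify field candidates once via field caps and cache
// them with the alert's parameters. The user could then narrow this
// list down to the fields they care about.
async function identifyFieldCandidates(index: string): Promise<string[]> {
  const fieldCaps = await client.fieldCaps({ index, fields: '*' });
  return Object.entries(fieldCaps.fields)
    .filter(([, caps]) => 'keyword' in caps)
    .map(([fieldName]) => fieldName);
}

// Per alert run: only significant_terms on the cached fields, with no
// field caps call and no grouping.
async function runAlertCheck(
  index: string,
  cachedFields: string[],
  start: number,
  end: number
) {
  return client.search({
    index,
    size: 0,
    // The deviation time range acts as the foreground set.
    query: { range: { '@timestamp': { gte: start, lte: end } } },
    aggs: Object.fromEntries(
      cachedFields.map((field) => [field, { significant_terms: { field } }])
    ),
  });
}
```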

sorenlouv commented 2 months ago

> So caching those fields would greatly improve runtime of the analysis.

This sounds like a great idea. There's no need to do this for every alert execution.

walterra commented 6 days ago

I'll remove the version label from this one. Things we plan to implement for a specific version should be mentioned in the version-specific meta issue: #187684.

This issue will still serve its purpose of tracking overall progress on enabling log rate analysis in other solution features.