elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

[ML] AIOps Log Rate Analysis: improve explanation of log rate spike/dip #182714

Closed alvarezmelissa87 closed 3 months ago

alvarezmelissa87 commented 4 months ago

Related meta issue: https://github.com/elastic/kibana/issues/181111 Item: Show both a baseline and a deviation doc count in the results table. It might then be worth showing the median doc count per histogram bucket instead of the raw doc count of the selection, so that the numbers are comparable. Investigate whether we can offer options to show the raw total of each selection, the median doc count per bucket, or both.
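To illustrate why a per-bucket median is easier to compare than raw totals, here is a minimal sketch. The helper name `medianDocCount` and the sample bucket counts are made up for illustration; they are not taken from the Kibana code base.

```typescript
// Hypothetical helper: median of the doc counts of the histogram buckets
// that fall inside a selection (baseline or deviation).
function medianDocCount(bucketCounts: number[]): number {
  if (bucketCounts.length === 0) return 0;
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...bucketCounts].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Made-up example: a wide baseline selection and a narrow deviation selection.
const baselineBuckets = [10, 12, 11, 9, 10, 13]; // raw total: 65
const deviationBuckets = [48, 55]; // raw total: 103

console.log(medianDocCount(baselineBuckets)); // 10.5
console.log(medianDocCount(deviationBuckets)); // 51.5
```

The raw totals (65 vs. 103) suggest less than a 2x change because the baseline covers three times as many buckets, while the per-bucket medians (10.5 vs. 51.5) show roughly a 5x change.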

Describe the feature:

One of the pain points of this view is that the p-value isn't intuitive for most users. It would be useful to add a couple of columns to the table that display a more readable explanation of the change, e.g. 5x increase/decrease, both per bucket and for the overall data.

To avoid having too many columns and making the table feel cluttered, we can reuse the Filter fields control to let the user show or hide particular columns. This way we don't lose the p-value column, and users can view the columns that are most useful to them. We also retain the ability to sort by any of these values.

image


elasticmachine commented 4 months ago

Pinging @elastic/ml-ui (:ml)

walterra commented 4 months ago

Screenshot of some experiments a while back for reference:

image

The tricky bit we need to consider:

At the moment the data in the results table doesn't know about the bucket size of the date histogram above it, so the overall baseline and deviation counts can refer to time ranges of different widths. This is a bit problematic for the explainability of statements like 50x higher. To make the counts comparable we'd need to normalize them to be per date histogram bucket. In anomaly detection you're always comparing against the same bucket lengths.
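A minimal sketch of the normalization described above, assuming the time range of each selection and the histogram bucket interval are available in milliseconds. The names `TimeRange` and `docsPerBucket` are hypothetical, not existing Kibana APIs.

```typescript
// Hypothetical shape for a selected time range (epoch milliseconds).
interface TimeRange {
  from: number;
  to: number;
}

// Convert an overall doc count for a time range into an average
// doc count per date histogram bucket.
function docsPerBucket(docCount: number, range: TimeRange, bucketIntervalMs: number): number {
  const durationMs = range.to - range.from;
  if (durationMs <= 0 || bucketIntervalMs <= 0) {
    throw new RangeError('Time range and bucket interval must be positive');
  }
  return docCount / (durationMs / bucketIntervalMs);
}

// Made-up example: 60-minute baseline, 10-minute deviation, 1-minute buckets.
const minute = 60_000;
const baselineRate = docsPerBucket(6000, { from: 0, to: 60 * minute }, minute);
const deviationRate = docsPerBucket(5000, { from: 60 * minute, to: 70 * minute }, minute);

console.log(baselineRate, deviationRate); // 100 500
```

In this made-up case the raw counts (6000 vs. 5000) would look like a slight decrease, while the per-bucket rates (100 vs. 500) show a 5x increase.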

Suggestion for a tooltip to explain how increase is calculated:

image

walterra commented 4 months ago

Here's a PoC that creates strings like 114x increase based on background and foreground counts: https://github.com/elastic/kibana/pull/179695/files#diff-a278bf75b860fa4b13bc67ce102e8045e52000d912e1386177590f35c4f0651cR273-R281

    logRateChange:
        bgCount > 0
          ? logRateType === 'spike'
            ? `${Math.round((docCount / bgCount) * 100) / 100}x increase`
            : `${Math.round((bgCount / docCount) * 100) / 100}x decrease`
          : logRateType === 'spike'
          ? `${docCount} docs up from 0 in baseline`
          : `0 docs down from ${docCount} in baseline`,

Note that this is really just a starting point: the code is missing the normalization needed to properly compare background and foreground counts.
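As a sketch of where this could go, the snippet can be turned into a standalone function that takes counts already normalized per bucket. The function name, parameter names, and the exact wording of the strings are assumptions for illustration and are not part of the linked PR. It also guards the dip case against a zero deviation count, which in the snippet above would divide by zero and produce `Infinityx decrease`.

```typescript
type LogRateType = 'spike' | 'dip';

// Round to two decimals, mirroring Math.round(x * 100) / 100 in the PoC.
function roundTo2(value: number): number {
  return Math.round(value * 100) / 100;
}

// Hypothetical helper: both rates are assumed to be doc counts per
// date histogram bucket, so they are directly comparable.
function describeLogRateChange(
  logRateType: LogRateType,
  baselineRate: number,
  deviationRate: number
): string {
  if (logRateType === 'spike') {
    return baselineRate > 0
      ? `${roundTo2(deviationRate / baselineRate)}x increase`
      : `${roundTo2(deviationRate)} docs per bucket, up from 0 in baseline`;
  }
  // Dip: guard against dividing by a zero deviation rate.
  return deviationRate > 0
    ? `${roundTo2(baselineRate / deviationRate)}x decrease`
    : `0 docs per bucket, down from ${roundTo2(baselineRate)} in baseline`;
}

console.log(describeLogRateChange('spike', 100, 500)); // 5x increase
console.log(describeLogRateChange('dip', 40, 0)); // 0 docs per bucket, down from 40 in baseline
```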