Closed alvarezmelissa87 closed 3 months ago
Pinging @elastic/ml-ui (:ml)
Screenshot of some experiments a while back for reference:
The tricky bit we need to consider:
At the moment, the data in the results table doesn't know about the bucket size of the date histogram above it, so the overall baseline and deviation counts can refer to time ranges of very different widths. This is a bit problematic for the explainability of statements like "50x higher". To make the counts comparable, we'd need to normalize them to be per date histogram bucket. In anomaly detection you're always comparing against the same bucket length.
Suggestion for a tooltip to explain how increase is calculated:
Here's a PoC that creates strings like "114x increase" based on background and foreground counts: https://github.com/elastic/kibana/pull/179695/files#diff-a278bf75b860fa4b13bc67ce102e8045e52000d912e1386177590f35c4f0651cR273-R281
logRateChange:
  bgCount > 0
    ? logRateType === 'spike'
      ? `${Math.round((docCount / bgCount) * 100) / 100}x increase`
      : `${Math.round((bgCount / docCount) * 100) / 100}x decrease`
    : logRateType === 'spike'
      ? `${docCount} docs up from 0 in baseline`
      : `0 docs down from ${docCount} in baseline`,
Note this is really just to get started: the code lacks the normalization needed to properly compare background and foreground counts.
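To illustrate what that normalization could look like, here is a minimal sketch that divides each window's doc count by the number of date histogram buckets it spans before computing the ratio. The names `WindowCount`, `perBucketRate`, and `describeLogRateChange` are hypothetical and not part of the PoC or of Kibana; the sketch also assumes the bucket count for each window is available to the table, which is not the case today.

```typescript
type LogRateType = 'spike' | 'dip';

// Hypothetical shape: total docs for a term in a window, plus how many
// date histogram buckets that window covers.
interface WindowCount {
  docCount: number;
  bucketCount: number;
}

// Average docs per histogram bucket; 0 for an empty window.
function perBucketRate({ docCount, bucketCount }: WindowCount): number {
  return bucketCount > 0 ? docCount / bucketCount : 0;
}

const round2 = (n: number): number => Math.round(n * 100) / 100;

// Builds a readable change string from per-bucket rates rather than raw
// totals, so windows of different widths become comparable.
function describeLogRateChange(
  baseline: WindowCount,
  deviation: WindowCount,
  logRateType: LogRateType
): string {
  const baselineRate = perBucketRate(baseline);
  const deviationRate = perBucketRate(deviation);

  if (logRateType === 'spike') {
    return baselineRate > 0
      ? `${round2(deviationRate / baselineRate)}x increase`
      : `${round2(deviationRate)} docs per bucket up from 0 in baseline`;
  }
  // Dip: guard against a zero deviation rate instead of dividing by it.
  return deviationRate > 0
    ? `${round2(baselineRate / deviationRate)}x decrease`
    : `0 docs per bucket down from ${round2(baselineRate)} in baseline`;
}
```

For example, a baseline of 100 docs over 50 buckets and a deviation of 100 docs over 5 buckets would read as a "1x" change on raw counts, but as "10x increase" once both are expressed per bucket.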
Related meta issue: https://github.com/elastic/kibana/issues/181111 Item:
Show both a baseline and deviation with doc count in the results table. It might be worth then not showing the raw doc count of the selection, but instead median doc count per histogram bucket to make the numbers comparable. Investigate if we can offer options to show both the raw full number of the selections and/or the median doc count per bucket.
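As a rough sketch of the item above, the following computes both the raw total and the median doc count per histogram bucket for a selection. It assumes the per-bucket doc counts for the selected time range are available as a plain array; `summarizeSelection` and `SelectionSummary` are hypothetical names used for illustration only.

```typescript
// Median of a list of numbers; returns 0 for an empty list.
function median(values: number[]): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
}

// Hypothetical summary offering both values the item mentions.
interface SelectionSummary {
  rawDocCount: number; // full doc count of the selection
  medianPerBucket: number; // median doc count per histogram bucket
}

function summarizeSelection(bucketDocCounts: number[]): SelectionSummary {
  return {
    rawDocCount: bucketDocCounts.reduce((sum, count) => sum + count, 0),
    medianPerBucket: median(bucketDocCounts),
  };
}
```

With bucket counts of `[4, 6, 5, 5, 100]`, the raw total is 120 while the median per bucket is 5, which shows how a single outlier bucket can dominate the raw number but not the median.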
Describe the feature:
One of the pain points of this view is that the pvalue isn't an intuitive value for most users. It would be useful to add a couple of columns to the table that display a more easily readable explanation of the change, e.g. "5x increase/decrease", both per bucket and for the overall data.

To avoid having too many columns and making the table feel cluttered, we can reuse the "Filter fields" control to allow the user to hide/show particular columns. With this, we don't lose the pvalue column, and we allow users to view the columns that are most useful to them. We also retain the ability to sort by any of these values.

TASKS