elastic / ml-cpp

Machine learning C++ code

[6.4.0] anomalies don't display in explorer #224

Open blookot opened 6 years ago

blookot commented 6 years ago

I'm running ML in 6.4.0. My jobs find tens of pages of anomalies, but they don't display in the Anomaly Explorer... why is that?

[Screenshot]

Thanks in advance!

walterra commented 6 years ago

Some notes on how to reproduce this with the standard farequote data:

Looking at both jobs, the swimlane for the over-field job is empty, but relevant anomalies from that job show up in the influencer list and the anomaly table:

[Screenshot]

Looking at the over-field job only, the individual swimlanes show the expected anomalies:

[Screenshot]

Switching the "View By" option back to Job ID, the swimlane is empty again:

[Screenshot]

Tested with a recent Kibana master (2018-09-28) but an older Elasticsearch master (2018-07-25) without multi-bucket support.

As a consequence of the empty swimlane, it's not possible to select any anomalies to view them in the Anomaly Explorer Charts.

tveasey commented 6 years ago

In the individual job view (for the farequote example) the overall lane is empty. This is what is displayed in the individual swim lanes when Job ID is selected.

We shouldn't be creating no results at all, but it is conceivable that the results have a zero score. We "derate" the probabilities of the overall (bucket-level) results to account for the fact that multiple population members interact in the same bucket. Specifically, we ask: what is the chance of seeing the observed minimum probability, given that we've observed a collection of n independent interactions, where n is the number of population members generating values in that bucket? If the individual probabilities are all quite high, this process could in theory push the bucket's probability past the threshold for a non-zero score, so the bucket scores zero. The key ingredient for this to happen is therefore that no individual probability is low. Note that nothing in this process has changed in the 6.4 time frame.

I'll see if I can reproduce this and check that aggregation is working as intended. At that point we can decide whether we want to ensure we get some non-zero scores at the bucket level whenever we have non-zero influencer or record-level scores.

blookot commented 5 years ago

Hi @tveasey, what did you eventually decide? Thanks in advance for the update ;)

tveasey commented 5 years ago

Sorry @blookot, this dropped off my TODO list.

The behaviour with farequote is expected: although the individual results are mildly anomalous, we don't think they are unusual enough to display an anomaly at the job level. Job-level anomalies are intended to be a summary that boils many results down to just a handful (or potentially none, as in this case, if we don't think they are that unusual).

The issue we face is that if we were to colour by, say, the max score of any individual result, the job-level swim lane would just saturate at red for a large job and we'd say all times are highly unusual. What we do instead is account for the chance of seeing each particular result given that we are observing lots of events (think of the lottery: the chance of a particular person winning is small, but the chance of someone winning is high if there are millions of players). I don't think we will change this behaviour.

I think the best solution might be on the UI side: we need something in the UI to say, for the job, that this time bucket contains individual results even though we don't think the time bucket itself is unusual. I'll discuss options for this with the UI team.

blookot commented 5 years ago

Hi @tveasey, sorry, I'm reading my issues with quite some delay... IMHO, the most natural approach is a max display: the max score of all anomalies in a swimlane span determines the color, and the max over all swimlanes determines the color of the "overall" swimlane.

tveasey commented 5 years ago

> IMHO, I think the most natural is to use max display: have the max score of all anomalies in a swimlane span determine the color, as the max of all swimlanes should be the color of the "overall" swimlane.

The problem with this is that it fails badly in many cases: in the extreme, if you have a lot of series, the swim lane just saturates at red. It also means that results are not comparable between jobs whose partition fields have different cardinalities. I think this sort of bias does need some sort of correction.

Going back to the lottery analogy, would you want a system which said "hey wow, someone won the lottery this week", "hey hey, someone won the lottery again this week!", etc.? We avoid this by accounting for the chance of a win (or individual anomaly) given the number of players (time series).

I agree the whole colouring scheme is complex and people find it confusing. Unfortunately, I've also come to the conclusion that there isn't really a silver bullet when trying to summarise lots of information (anomalies from many individual series) into one bite-size piece. You either run the risk of throwing out useful information or of reporting far too much. We strive to make our assessment of how unusual individual events are as accurate and unbiased as possible, so that at least we order things correctly.

Another practical problem is that this is really subjective. A user can't directly tell us how important each series is to them, but we often run into cases where they've partitioned by something (say, customer) and some customers are super important while others aren't. You can address this (with a certain amount of effort) through job configuration: use one job for each super important time series (KPI) and one job for the rest. In such cases, this catch-all job can be used purely for root cause analysis, i.e. you only look at it when one of the KPI jobs is anomalous. However, this is also an area where I think some user feedback would be desirable.

One thought I had was to go the other way around: force down the importance of individual results if collectively they are not that unusual. This has the advantage that the colours are then more consistent. However, it also has its dangers: one of those results might be really important to someone, and in my experience people care more about perceived false negatives than false positives; they don't like it if something that is super important to them is coloured blue. This has discouraged me from making this change in the past.

It seems clear that we need some mechanism in the UI so that individual time series results aren't completely hidden by the summarisation process. We will try to make progress on this, subject to other priorities.