Open elasticmachine opened 6 years ago
Original comment by @peteharverson:
I can reproduce this on a 7.0.0 snapshot with clones of cloudwatch jobs that use the same job configuration as above. The result_type influencer docs look identical between the two cloned jobs (comparing times and influencer_score values), and yet running the aggregation used by the 'view by' swimlane in the Kibana console returns different results for some instances between the two jobs.
Aggregation run by the 'view by' swimlane is of the form:
"aggs":{
  "influencerFieldValues":{
    "terms":{
      "field":"influencer_field_value",
      "size":10,
      "order":{
        "maxAnomalyScore":"desc"
      }
    },
    "aggs":{
      "maxAnomalyScore":{
        "max":{
          "field":"influencer_score"
        }
      },
      "byTime":{
        "date_histogram":{
          "field":"timestamp",
          "interval":"28800s",
          "min_doc_count":1
        },
        "aggs":{
          "maxAnomalyScore":{
            "max":{
              "field":"influencer_score"
            }
          }
        }
      }
    }
  }
}
Original comment by @dimitris-athanasiou:
This is a consequence of the way sorted terms aggregations work. Their results are inaccurate because only the top-x buckets of each shard are considered, and any nested aggs then operate only on the subset of docs that each shard returned from the terms/order agg. This is by design on the Elasticsearch side.
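To make the effect concrete, here is a minimal Python sketch of a simplified model of the per-shard behaviour described above. It is not Elasticsearch code: the function names, the host names, and the tiny shard_size are assumptions chosen for illustration. Each "shard" returns only its own top terms (ordered by max score) together with their per-bucket max, and the coordinating step merges whatever it receives.

```python
from collections import defaultdict


def shard_response(docs, shard_size):
    """Simplified model of one shard: docs are (term, time_bucket, score) tuples.

    Returns the shard's top `shard_size` terms ordered by max score,
    each with its per-time-bucket max score.
    """
    per_term = defaultdict(dict)
    for term, bucket, score in docs:
        per_term[term][bucket] = max(score, per_term[term].get(bucket, 0.0))
    ranked = sorted(per_term, key=lambda t: max(per_term[t].values()), reverse=True)
    return {t: per_term[t] for t in ranked[:shard_size]}


def merge(responses, size):
    """Simplified coordinating step: merge shard buckets and keep the top `size` terms."""
    merged = defaultdict(dict)
    for resp in responses:
        for term, buckets in resp.items():
            for bucket, score in buckets.items():
                merged[term][bucket] = max(score, merged[term].get(bucket, 0.0))
    ranked = sorted(merged, key=lambda t: max(merged[t].values()), reverse=True)
    return {t: merged[t] for t in ranked[:size]}


# Hypothetical data: host-1 scores highly on shard A but only modestly on shard B.
shard_a = [("host-1", 0, 90.0), ("host-2", 0, 10.0)]
shard_b = [("host-2", 0, 50.0), ("host-3", 0, 60.0), ("host-1", 1, 40.0)]

# Sharded: shard B does not return host-1, so its bucket 1 never reaches the merge.
approx = merge([shard_response(shard_a, 1), shard_response(shard_b, 1)], size=2)

# Same docs on a single shard with a large shard_size: nothing is dropped.
exact = merge([shard_response(shard_a + shard_b, 10)], size=2)
```

In this model both runs pick the same top terms (host-1 and host-3), but the sharded run loses the host-1 score in time bucket 1 because shard B ranked host-1 below its cut-off. That matches the symptom in the swimlane: identical docs, different cells, depending on how the docs are spread across shards.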
A way to improve the stability of the results is to split this into two separate requests. The first request will simply find the top-10 terms over all time. The second request will filter on the top-10 terms and then simply find the max score per time bucket. This way, the second query will correctly operate on all docs for the top-10 terms. However, comparison between different jobs might still vary as the first request could return different top-10 terms between jobs.
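The two-request approach could be sketched as follows. This only builds the request bodies as Python dicts and does not call any client. The function names are hypothetical, the field names and the "interval" parameter are copied from the aggregation quoted earlier in this issue, and any job or result_type filters that the real swimlane query applies are omitted here.

```python
def top_terms_request(size=10):
    """Request 1: find the top terms over all time, ordered by max influencer score."""
    return {
        "size": 0,
        "aggs": {
            "influencerFieldValues": {
                "terms": {
                    "field": "influencer_field_value",
                    "size": size,
                    "order": {"maxAnomalyScore": "desc"},
                },
                "aggs": {
                    "maxAnomalyScore": {"max": {"field": "influencer_score"}}
                },
            }
        },
    }


def extract_terms(response):
    """Pull the bucket keys out of the response to request 1."""
    buckets = response["aggregations"]["influencerFieldValues"]["buckets"]
    return [bucket["key"] for bucket in buckets]


def scores_by_time_request(top_terms, interval="28800s"):
    """Request 2: filter on the chosen terms, then take the max score per time bucket."""
    return {
        "size": 0,
        "query": {
            "bool": {
                "filter": [{"terms": {"influencer_field_value": top_terms}}]
            }
        },
        "aggs": {
            "influencerFieldValues": {
                # No ordering by sub-aggregation here; the filter already limits
                # the docs to at most len(top_terms) distinct terms.
                "terms": {
                    "field": "influencer_field_value",
                    "size": len(top_terms),
                },
                "aggs": {
                    "byTime": {
                        "date_histogram": {
                            "field": "timestamp",
                            "interval": interval,
                            "min_doc_count": 1,
                        },
                        "aggs": {
                            "maxAnomalyScore": {
                                "max": {"field": "influencer_score"}
                            }
                        },
                    }
                },
            }
        },
    }
```

Because the second request filters before aggregating, every shard can return all of the requested terms, so the per-bucket max is computed over all matching docs. As noted above, the first request can still return a different top-10 between jobs.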
This does not only happen on job cloning but also when changing the limit in the Anomaly Explorer for one job (screenshots taken on 6.4.0-BC4, with .ml-anomalies-shared having 5 primary / 0 replica shards):
Original comment by @pheyos:
Versions:
Browser: Firefox 58.0
Steps to reproduce:
Additional information: