elastic / kibana


[ML] Cloned multi metric jobs have different swimlane colours #18129

Open elasticmachine opened 6 years ago

elasticmachine commented 6 years ago

Original comment by @pheyos:

Versions:

Browser: Firefox 58.0

Steps to reproduce:

Additional information:

{
  "job_id": "cw_multi_1",
  "job_type": "anomaly_detector",
  "job_version": "6.1.3",
  "groups": [
    "manual_ui_tests"
  ],
  "description": "cw multi 1",
  "create_time": 1516803906376,
  "finished_time": 1516805201848,
  "established_model_memory": 3307064,
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "detector_description": "mean(CPUUtilization)",
        "function": "mean",
        "field_name": "CPUUtilization",
        "partition_field_name": "instance",
        "detector_rules": [

        ],
        "detector_index": 0
      }
    ],
    "influencers": [
      "instance"
    ]
  },
  "analysis_limits": {
    "model_memory_limit": "14mb"
  },
  "data_description": {
    "time_field": EMAIL REDACTED
    "time_format": "epoch_ms"
  },
  "model_snapshot_retention_days": 1,
  "model_snapshot_id": "1516803949",
  "results_index_name": "shared",
  "data_counts": {
    "job_id": "cw_multi_1",
    "processed_record_count": 1793481,
    "processed_field_count": 2050056,
    "input_bytes": 100928963,
    "input_field_count": 2050056,
    "invalid_date_count": 0,
    "missing_field_count": 1536906,
    "out_of_order_timestamp_count": 0,
    "empty_bucket_count": 0,
    "sparse_bucket_count": 0,
    "bucket_count": 1398,
    "earliest_record_timestamp": 1477612800000,
    "latest_record_timestamp": 1478871060000,
    "last_data_time": 1516803948950,
    "input_record_count": 1793481
  },
  "model_size_stats": {
    "job_id": "cw_multi_1",
    "result_type": "model_size_stats",
    "model_bytes": 3307064,
    "total_by_field_count": 79,
    "total_over_field_count": 0,
    "total_partition_field_count": 78,
    "bucket_allocation_failures_count": 0,
    "memory_status": "ok",
    "log_time": 1516805201000,
    "timestamp": 1478870100000
  },
  "datafeed_config": {
    "datafeed_id": "datafeed-cw_multi_1",
    "job_id": "cw_multi_1",
    "query_delay": "65630ms",
    "indices": [
      "cloudwatch*"
    ],
    "types": [

    ],
    "query": {
      "match_all": {
        "boost": 1
      }
    },
    "scroll_size": 1000,
    "chunking_config": {
      "mode": "auto"
    },
    "state": "stopped"
  },
  "state": "closed"
}
elasticmachine commented 6 years ago

Original comment by @peteharverson:

I can reproduce this on a 7.0.0 snapshot with clones of cloudwatch jobs using the same job configuration as above. The influencer result docs (result_type: influencer) look identical between the two cloned jobs (same timestamps and influencer_score values), and yet running the aggregation used by the 'view by' swimlane in the Kibana console returns different results for some instances between the two jobs.

The aggregation run by the 'view by' swimlane is of the form:

   "aggs":{
      "influencerFieldValues":{
         "terms":{
            "field":"influencer_field_value",
            "size":10,
            "order":{
               "maxAnomalyScore":"desc"
            }
         },
         "aggs":{
            "maxAnomalyScore":{
               "max":{
                  "field":"influencer_score"
               }
            },
            "byTime":{
               "date_histogram":{
                  "field":"timestamp",
                  "interval":"28800s",
                  "min_doc_count":1
               },
               "aggs":{
                  "maxAnomalyScore":{
                     "max":{
                        "field":"influencer_score"
                     }
                  }
               }
            }
         }
      }
   }
elasticmachine commented 6 years ago

Original comment by @dimitris-athanasiou:

This is a consequence of how sorted terms aggregations work. They are approximate, because each shard only returns its own top-x terms. Any nested aggs then operate only on the subset of docs returned by the terms/order agg. This is by design on the Elasticsearch side.
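
As a side note (not raised in the thread), the terms aggregation also exposes a shard_size parameter, which raises the number of candidate terms each shard returns, trading memory and latency for accuracy. A minimal sketch using the same fields as the swimlane aggregation above (the value 100 is an illustrative choice):

"aggs": {
  "influencerFieldValues": {
    "terms": {
      "field": "influencer_field_value",
      "size": 10,
      "shard_size": 100,
      "order": { "maxAnomalyScore": "desc" }
    },
    "aggs": {
      "maxAnomalyScore": { "max": { "field": "influencer_score" } }
    }
  }
}

This reduces the instability but does not remove it, since ordering terms by a sub-aggregation's max is still approximate across shards.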

A way to improve the stability of the results is to split this into two separate requests. The first request simply finds the top-10 terms over all time. The second request filters on those top-10 terms and then finds the max score per time bucket. This way, the second query correctly operates on all docs for the top-10 terms. However, comparisons between different jobs might still vary, as the first request could return different top-10 terms for each job. A sketch of the two requests follows.
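
A rough sketch of the two-step approach as Kibana console requests (the .ml-anomalies-shared index and cw_multi_1 job_id come from the config above; the instance IDs in the second request are placeholders for whatever the first response returns). First, find the top-10 influencer values over all time:

POST .ml-anomalies-shared/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "result_type": "influencer" } },
        { "term": { "job_id": "cw_multi_1" } }
      ]
    }
  },
  "aggs": {
    "topInfluencerValues": {
      "terms": {
        "field": "influencer_field_value",
        "size": 10,
        "order": { "maxAnomalyScore": "desc" }
      },
      "aggs": {
        "maxAnomalyScore": { "max": { "field": "influencer_score" } }
      }
    }
  }
}

Then filter on those values and bucket by time. Because at most 10 distinct values pass the filter, every shard returns all of them, so the nested max is computed over all matching docs:

POST .ml-anomalies-shared/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "result_type": "influencer" } },
        { "term": { "job_id": "cw_multi_1" } },
        { "terms": { "influencer_field_value": [ "i-0001", "i-0002" ] } }
      ]
    }
  },
  "aggs": {
    "influencerFieldValues": {
      "terms": { "field": "influencer_field_value", "size": 10 },
      "aggs": {
        "byTime": {
          "date_histogram": {
            "field": "timestamp",
            "interval": "28800s",
            "min_doc_count": 1
          },
          "aggs": {
            "maxAnomalyScore": { "max": { "field": "influencer_score" } }
          }
        }
      }
    }
  }
}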

pheyos commented 6 years ago

This happens not only when cloning a job, but also when changing the limit in the Anomaly Explorer for a single job (screenshots taken on 6.4.0-BC4, with .ml-anomalies-shared having 5 primary / 0 replica shards):

[screenshot: anomaly_explorer_different_colors_based_on_limit]