unexpected results from aggregations

mruediger commented 9 years ago

Hi. I am trying to use elasticsearch to store metrics of our cluster. The collection is done using sensu, and the values in elasticsearch still have the fields "key", "value" and "timestamp" that you would see in graphite. When I create an average/max/min histogram over our metrics I get weird results. The average CPU utilisation of our cluster is 1.22907852E-315 according to elasticsearch. At first glance I suspected an overflow issue, but there are simply not enough results to cause that.

The json I am using is the following:

{
  "size": 5,
  "query": {
    "bool": {
      "must": [ { "match": { "host": "poolnode-03" } },
                { "match": { "metric": "cpu_metrics" } },
                { "range": {
                    "@timestamp": {
                       "gte" : "2014-11-12T17:50:00",
                       "lte" : "2014-11-12T17:51:00"
                    }
                  }
                }]
    }     
  },        
  "aggs": {
    "cpu_histogram": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "5000ms",
        "min_doc_count": 1
      },
      "aggs": {
        "avg_cpu": { "avg": { "field": "value" } },
        "max_cpu": { "max": { "field": "value" } },
        "min_cpu": { "min": { "field": "value" } }
      }
    }
  }
}

The result looks like this:

{
  "took" : 13,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 408,
    "max_score" : 3.4773107,
    "hits" : [ {
      "_index" : "sensu",
      "_type" : "cpu_metrics",
      "_id" : "AUmlIK4cRrcdtiDjFlbQ",
      "_score" : 3.4773107,
      "_source":{"host":"poolnode-03","metric":"cpu_metrics","name":"poolnode-03.cpu.total.iowait","value":11637,"@timestamp":"2014-11-12T17:50:22+00:00"}
    }, {
      "_index" : "sensu",
      "_type" : "cpu_metrics",
      "_id" : "AUmlIK4cRrcdtiDjFlbV",
      "_score" : 3.4773107,
      "_source":{"host":"poolnode-03","metric":"cpu_metrics","name":"poolnode-03.cpu.cpu0.user","value":15686755,"@timestamp":"2014-11-12T17:50:22+00:00"}
    }, {
      "_index" : "sensu",
      "_type" : "cpu_metrics",
      "_id" : "AUmlIK4cRrcdtiDjFlbe",
      "_score" : 3.4773107,
      "_source":{"host":"poolnode-03","metric":"cpu_metrics","name":"poolnode-03.cpu.cpu1.user","value":12541923,"@timestamp":"2014-11-12T17:50:22+00:00"}
    }, {
      "_index" : "sensu",
      "_type" : "cpu_metrics",
      "_id" : "AUmlIK4cRrcdtiDjFlbj",
      "_score" : 3.4773107,
      "_source":{"host":"poolnode-03","metric":"cpu_metrics","name":"poolnode-03.cpu.cpu1.irq","value":3,"@timestamp":"2014-11-12T17:50:22+00:00"}
    }, {
      "_index" : "sensu",
      "_type" : "cpu_metrics",
      "_id" : "AUmlIK4cRrcdtiDjFlbo",
      "_score" : 3.4773107,
      "_source":{"host":"poolnode-03","metric":"cpu_metrics","name":"poolnode-03.cpu.ctxt","value":4585398711,"@timestamp":"2014-11-12T17:50:22+00:00"}
    } ]
  },
  "aggregations" : {
    "cpu_histogram" : {
      "buckets" : [ {
        "key_as_string" : "2014-11-12T17:50:00.000Z",
        "key" : 1415814600000,
        "doc_count" : 68,
        "min_cpu" : {
          "value" : 0.0
        },
        "avg_cpu" : {
          "value" : 1.22906508E-315
        },
        "max_cpu" : {
          "value" : 2.2995982737E-314
        }
      }, {
        "key_as_string" : "2014-11-12T17:50:10.000Z",
        "key" : 1415814610000,
        "doc_count" : 68,
        "min_cpu" : {
          "value" : 0.0
        },
        "avg_cpu" : {
          "value" : 1.229071915E-315
        },
        "max_cpu" : {
          "value" : 2.2996008E-314
        }
      }, {
        "key_as_string" : "2014-11-12T17:50:20.000Z",
        "key" : 1415814620000,
        "doc_count" : 68,
        "min_cpu" : {
          "value" : 0.0
        },
        "avg_cpu" : {
          "value" : 1.229073614E-315
        },
        "max_cpu" : {
          "value" : 2.29960331E-314
        }
      }, {
        "key_as_string" : "2014-11-12T17:50:30.000Z",
        "key" : 1415814630000,
        "doc_count" : 68,
        "min_cpu" : {
          "value" : 0.0
        },
        "avg_cpu" : {
          "value" : 1.22907528E-315
        },
        "max_cpu" : {
          "value" : 2.2996057756E-314
        }
      }, {
        "key_as_string" : "2014-11-12T17:50:40.000Z",
        "key" : 1415814640000,
        "doc_count" : 68,
        "min_cpu" : {
          "value" : 0.0
        },
        "avg_cpu" : {
          "value" : 1.229076895E-315
        },
        "max_cpu" : {
          "value" : 2.2996081945E-314
        }
      }, {
        "key_as_string" : "2014-11-12T17:50:50.000Z",
        "key" : 1415814650000,
        "doc_count" : 68,
        "min_cpu" : {
          "value" : 0.0
        },
        "avg_cpu" : {
          "value" : 1.22907852E-315
        },
        "max_cpu" : {
          "value" : 2.2996107306E-314
        }
      } ]
    }
  }
}

I checked the mapping of the type and everything looks good:

{
  "sensu" : {
    "mappings" : {
      "cpu_metrics" : {
        "properties" : {
          "@timestamp" : {
            "type" : "date",
            "format" : "dateOptionalTime"
          },
          "host" : {
            "type" : "string"
          },
          "metric" : {
            "type" : "string"
          },
          "name" : {
            "type" : "string"
          },
          "value" : {
            "type" : "long"
          }
        }
      }
    }
  }
}

I suspect I misunderstood some of the syntax and the error is on my side, but after studying the documentation for quite a while I still cannot figure out what is wrong with my search.

Thanks, Mathias

clintongormley commented 9 years ago

Hi @mruediger

Did you map the value field explicitly, or just rely on dynamic mapping? Currently there is an issue where a new field can be added to two shards at the same time, with different mappings (e.g. double vs long). One mapping will win, but the other mapping continues to exist on the other shard.

I think this is what you're running into. When you run the aggregations, some shards are returning longs while other shards are returning doubles, which is causing this mixup.

We plan on fixing this mapping issue, but in the meantime you can handle this by explicitly specifying the field type.
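As a rough illustration of how a long/double mixup can produce values like the ones in the result above, the sketch below reinterprets the 64 bits of a long as an IEEE 754 double. It uses only Python's standard `struct` module; the helper names are made up for this example, and it is an assumption that this mirrors what happens inside elasticsearch. The first input is the `value` of the `poolnode-03.cpu.ctxt` document from the result above.

```python
import struct


def long_bits_as_double(n: int) -> float:
    """Reinterpret the 64 bits of a signed long as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<q", n))[0]


def double_bits_as_long(x: float) -> int:
    """Reinterpret the 64 bits of an IEEE 754 double as a signed long."""
    return struct.unpack("<q", struct.pack("<d", x))[0]


if __name__ == "__main__":
    # 4585398711 is the "value" of the cpu.ctxt document in the result above.
    # Read as a double, such a small bit pattern is a subnormal number,
    # on the order of 1E-314.
    print(long_bits_as_double(4585398711))

    # The opposite direction gives huge integers instead of tiny floats.
    print(double_bits_as_long(1.0))
```

Small positive longs always land in the subnormal range when read this way, which would be consistent with aggregation results around 1E-315 to 1E-314.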

mruediger commented 9 years ago

Hey @clintongormley , thanks for the quick reply.

I suspected something like this and ran curl -XPUT http://localhost:9200/sensu/_mapping/cpu_metrics/ -d '{ "cpu_metrics" : { "properties" : { "value" : { "type" : "long" }}}}'

I don't know if that is enough to configure the mapping. Is it even possible to set the mapping for data that is already stored? Sorry, I am pretty new to elasticsearch.

-Mathias

clintongormley commented 9 years ago

@mruediger no worries :)

It isn't sufficient to fix existing mappings. You will need to reindex.

btw, a colleague has just clarified my description of the problem, as it is not quite as clear cut as I described. In your case the long mapping won. The problem shows up when the primary or replica shard with the internal double mapping is moved to another node, which then interprets the existing data as long, when really it has been indexed as a double.

Reindexing is the only way forward here I'm afraid. Perhaps try a subset of the data first, to confirm that it is the issue.
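A minimal sketch of the data-shaping part of such a reindex, assuming you read the old documents back with a search (for example a scan/scroll) and write them to a new index through the bulk API. The index name `sensu_v2` and the function name are hypothetical; the explicit mapping is the one from the `curl` command above. Creating the index and sending the request are left out, so this only shows how the request bodies could be built.

```python
import json

# Body for creating the new index with "value" mapped explicitly as long,
# so dynamic mapping never gets to guess the type.
NEW_INDEX_BODY = {
    "mappings": {
        "cpu_metrics": {
            "properties": {
                "value": {"type": "long"}
            }
        }
    }
}


def to_bulk_body(hits, target_index):
    """Turn search hits into a newline-delimited bulk request body.

    Each hit becomes two lines: an "index" action line that keeps the
    original _type and _id, followed by the original _source document.
    """
    lines = []
    for hit in hits:
        action = {
            "index": {
                "_index": target_index,
                "_type": hit["_type"],
                "_id": hit["_id"],
            }
        }
        lines.append(json.dumps(action))
        lines.append(json.dumps(hit["_source"]))
    # The bulk API expects a trailing newline after the last line.
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    # One hit copied from the search result earlier in this thread.
    sample_hits = [{
        "_index": "sensu",
        "_type": "cpu_metrics",
        "_id": "AUmlIK4cRrcdtiDjFlbj",
        "_source": {
            "host": "poolnode-03",
            "metric": "cpu_metrics",
            "name": "poolnode-03.cpu.cpu1.irq",
            "value": 3,
            "@timestamp": "2014-11-12T17:50:22+00:00",
        },
    }]
    print(json.dumps(NEW_INDEX_BODY))
    print(to_bulk_body(sample_hits, "sensu_v2"), end="")
```

Running this against a small subset first, as suggested above, keeps the test cheap before copying the whole index.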

mruediger commented 9 years ago

I will try that. Thanks!

clintongormley commented 9 years ago

please let us know if that works - if it doesn't then there is another issue

mruediger commented 9 years ago

It seems to be working now. Thanks :-)

mruediger commented 9 years ago

Is it normal that the min/max and avg results are returned as floating point numbers (e.g. 4.654463936E9)?

Mathias

clintongormley commented 9 years ago

Hi @mruediger

Yes, these aggs always return doubles.
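If integers are needed on the client side, the conversion can be done after parsing the response. The sketch below is one way to do that in Python, assuming the aggregated field holds whole numbers; the function name is made up for this example. It only makes sense for `min` and `max`, since an `avg` is usually fractional.

```python
import json


def agg_value_to_int(value: float) -> int:
    """Convert a min/max aggregation result back to an integer.

    Raises ValueError if the double does not hold a whole number,
    which is expected for most avg results.
    """
    if not float(value).is_integer():
        raise ValueError("not a whole number: %r" % (value,))
    return int(value)


if __name__ == "__main__":
    # A fragment shaped like one metric in the aggregation response.
    fragment = json.loads('{"max_cpu": {"value": 4.654463936E9}}')
    print(agg_value_to_int(fragment["max_cpu"]["value"]))
```

A double represents every integer up to 2^53 exactly, so values of this size survive the round trip without loss.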

elastic / elasticsearch

unexpected results from aggregations #8485