elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Rollup aggregation of _size field #96979

Open salvatore-campagna opened 1 year ago

salvatore-campagna commented 1 year ago

Elasticsearch Version

8.8.0

Installed Plugins

No response

Java Version

bundled

OS Version

All

Problem Description

Aggregating on the _size field in a Rollup job fails with the following error:

[es/i-1/es.log] [2023-06-20T14:50:00.062Z][WARN][org.elasticsearch.xpack.core.indexing.AsyncTwoPhaseIndexer] [instance-0000000001] Error while attempting to bulk index documents: failure in bulk execution:
[0]: index [rolluptest_rollup], id [rolluptest$XKr_GEPutrdX778J5QWByg], message [org.elasticsearch.index.mapper.MapperParsingException: failed to parse field [_size] of type [integer] in document with id 'rolluptest$XKr_GEPutrdX778J5QWByg'. Preview of field's value: '{sum={value=109.0}}']
[1]: index [rolluptest_rollup], id [rolluptest$AmH9kDfoBhyB01E2E7JGSw], message [org.elasticsearch.index.mapper.MapperParsingException: failed to parse field [_size] of type [integer] in document with id 'rolluptest$AmH9kDfoBhyB01E2E7JGSw'. Preview of field's value: '{sum={value=115.0}}']

Indexing fails because, after a bucket is aggregated, the Rollup job tries to write a field named _size whose value is the sum of the _size values of all documents in that time bucket. This is not possible because _size is a special metadata field that records the size of a document's _source. It is provided by the mapper-size plugin.

A workaround exists that uses runtime fields: read the _size field through a runtime field script, then have the Rollup job aggregate on the runtime field, so that the rolled-up value is written to a field whose name is not _size.

Example

PUT rolluptest
{
  "mappings": {
    "runtime": {
      "size": {
        "type": "long",
        "script": {
          "source": "emit(doc['_size'].value)"
        }
      }
    },
    "_size": {
      "enabled": true
    }
  }
}

PUT _rollup/job/rolluptest
{
  "index_pattern": "rolluptest",
  "rollup_index": "rolluptest_rollup",
  "cron": "*/30 * * * * ?",
  "page_size": 1000,
  "groups": { 
    "date_histogram": {
      "field": "@timestamp",
      "fixed_interval": "1h",
      "delay": "7d"
    },
    "terms": {
      "fields": [ "name.keyword" ]
    }
  },
  "metrics": [
    {
      "field": "size",
      "metrics": [ "sum" ]
    }
  ]
}
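
Before relying on the rollup job, it may help to check that the runtime field actually exposes the _size values. The request below is a minimal sketch, assuming the mapper-size plugin is installed, the rolluptest index was created with the mapping above, and some documents have been indexed. The aggregation name total_size is made up for illustration.

```console
# Sum the runtime field "size", which reads _size through the script
GET rolluptest/_search
{
  "size": 0,
  "aggs": {
    "total_size": {
      "sum": { "field": "size" }
    }
  }
}
```

If this returns a non-zero sum, the same field should be usable in the "metrics" section of the rollup job configuration.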

Steps to Reproduce

Step 1

PUT rolluptest/_doc/1
{
  "text": "This is a document",
  "@timestamp": "2023-05-12T00:00:00Z",
  "number": 10,
  "name": "app1"
}

PUT rolluptest/_doc/2
{
  "text": "This is another document",
  "@timestamp": "2023-05-13T00:00:00Z",
  "number": 20,
  "name": "app2"
}

Step 2

PUT _rollup/job/rolluptest
{
  "index_pattern": "rolluptest",
  "rollup_index": "rolluptest_rollup",
  "cron": "*/30 * * * * ?",
  "page_size": 1000,
  "groups": { 
    "date_histogram": {
      "field": "@timestamp",
      "fixed_interval": "1h",
      "delay": "7d"
    },
    "terms": {
      "fields": [ "name.keyword" ]
    }
  },
  "metrics": [
    {
      "field": "_size",
      "metrics": [ "sum" ]
    }
  ]
}
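
The steps above stop at creating the job. Rollup jobs are created in a stopped state, so reproducing the error presumably also requires starting the job. A minimal sketch, assuming the job ID used above:

```console
# Start the rollup job; failures should show up in the logs after the next cron trigger
POST _rollup/job/rolluptest/_start
```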

Logs (if relevant)

[es/i-1/es.log] [2023-06-20T14:50:00.062Z][WARN][org.elasticsearch.xpack.core.indexing.AsyncTwoPhaseIndexer] [instance-0000000001] Error while attempting to bulk index documents: failure in bulk execution:
[0]: index [rolluptest_rollup], id [rolluptest$XKr_GEPutrdX778J5QWByg], message [org.elasticsearch.index.mapper.MapperParsingException: failed to parse field [_size] of type [integer] in document with id 'rolluptest$XKr_GEPutrdX778J5QWByg'. Preview of field's value: '{sum={value=109.0}}']
[1]: index [rolluptest_rollup], id [rolluptest$AmH9kDfoBhyB01E2E7JGSw], message [org.elasticsearch.index.mapper.MapperParsingException: failed to parse field [_size] of type [integer] in document with id 'rolluptest$AmH9kDfoBhyB01E2E7JGSw'. Preview of field's value: '{sum={value=115.0}}']

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-analytics-geo (Team:Analytics)

salvatore-campagna commented 1 year ago

The idea here is to update the documentation to explain that rollup on the _size field is not supported, add a YAML test documenting the behaviour, and describe how to work around it using runtime fields.

elasticsearchmachine commented 4 months ago

Pinging @elastic/es-storage-engine (Team:StorageEngine)