elastic / xk6-output-elasticsearch

Apache License 2.0

Use datastreams to store metrics #32

Open danielmitterdorfer opened 5 months ago

danielmitterdorfer commented 5 months ago

So far we have used a single index to store k6 metrics. However, datastreams are preferable for several reasons, one of them being the ability to define a retention period via an ILM policy. We should therefore move away from the single index and create a datastream instead. Storing data in a plain index will no longer be supported.
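Retention is expressed as a delete phase in an ILM policy. The following Python sketch builds such a policy body. The helper name `ilm_policy_with_retention` and the example retention of 30 days are hypothetical and not part of this proposal, whose policy only defines a hot phase.

```python
import json


def ilm_policy_with_retention(retention: str, max_primary_shard_size: str = "50gb") -> dict:
    """Hypothetical helper: body for PUT /_ilm/policy/<name> that rolls over
    in the hot phase and deletes backing indices after `retention`."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {
                            "max_primary_shard_size": max_primary_shard_size,
                            "min_docs": 1,
                        }
                    }
                },
                # For a rolled-over backing index, ILM measures min_age
                # from the time of rollover rather than index creation.
                "delete": {
                    "min_age": retention,
                    "actions": {"delete": {}},
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(ilm_policy_with_retention("30d"), indent=2))
```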

Datastream Details

These calls will be issued internally:

PUT /_ilm/policy/metrics-k6
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "min_docs": 1
          },
          "set_priority": {
            "priority": 100
          },
          "readonly": {}
        }
      }
    },
    "_meta": {
      "description": "default policy for k6 metrics",
      "managed": true,
      "version": 1
    }
  }
}

PUT /_component_template/metrics-k6
{
  "template": {
    "settings": {
      "index": {
        "number_of_shards": 1,
        "number_of_replicas": 0,
        "auto_expand_replicas": "0-1"
      },
      "codec": "best_compression"
    },
    "mappings": {
      "_meta": {
        "index-template-version": 1,
        "managed": true
      },
      "date_detection": false,
      "dynamic_templates": [
        {
          "strings": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword"
            }
          }
        }
      ],
      "_source": {
        "enabled": true
      },
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "Value": {
          "type": "double"
        }
      }
    }
  },
  "version": 1
}

Note: the timestamp field was previously called Time. We can rename it to @timestamp in a reindex script.

PUT /_component_template/metrics-k6-ilm
{
  "template": {
    "settings": {
      "index": {
        "lifecycle": {
          "name": "metrics-k6"
        }
      }
    }
  },
  "_meta": {
    "index-template-version": 1,
    "managed": true
  },
  "version": 1
}
PUT /_index_template/metrics-k6
{
  "index_patterns": [
    "metrics-k6-*"
  ],
  "data_stream": {},
  "composed_of": [
    "metrics-k6",
    "metrics-k6-ilm",
    "metrics-k6-ilm@custom"
  ],
  "ignore_missing_component_templates": [
    "metrics-k6-ilm@custom"
  ],
  "priority": 100,
  "_meta": {
    "description": "index template for k6 metrics",
    "managed": true
  },
  "version": 1
}
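These calls depend on each other: Elasticsearch rejects an index template that composes a component template which does not exist yet, unless that template is listed in ignore_missing_component_templates (as metrics-k6-ilm@custom is). A minimal Python sketch of issuing them in dependency order follows. The names `bootstrap`, `SETUP_ORDER` and the `send` callable are hypothetical, and the actual extension is written in Go with its own HTTP client.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical ordering: the ILM policy and the component templates are
# created before the index template that references them.
SETUP_ORDER: List[Tuple[str, str]] = [
    ("PUT", "/_ilm/policy/metrics-k6"),
    ("PUT", "/_component_template/metrics-k6"),
    ("PUT", "/_component_template/metrics-k6-ilm"),
    ("PUT", "/_index_template/metrics-k6"),
]


def bootstrap(send: Callable[[str, str, dict], int], bodies: Dict[str, dict]) -> None:
    """Issue the setup calls in order. `send(method, path, body)` is assumed
    to perform the HTTP request and return its status code. Stops at the
    first non-2xx response so the index template is never created against
    missing component templates."""
    for method, path in SETUP_ORDER:
        status = send(method, path, bodies[path])
        if not 200 <= status < 300:
            raise RuntimeError(f"{method} {path} failed with HTTP {status}")
```

Since PUT replaces the resource, running the bootstrap again on every start simply overwrites the managed policy and templates; the "version" fields could later be used to skip unchanged ones.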

Behavior for existing installations

When the k6-metrics index exists, we can issue a warning that the index pattern has changed. This is best effort only, though, and won't catch cases where users have overridden the index name.
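A minimal Python sketch of this best-effort check, assuming a hypothetical `index_exists` callable (for example backed by the index exists API, `HEAD /<index>`, which answers 200 or 404). Only the default legacy name is checked, which is exactly why overridden index names slip through.

```python
from typing import Callable, Optional

# Default legacy index name; a user-overridden name is not known here.
LEGACY_INDEX = "k6-metrics"


def legacy_index_warning(index_exists: Callable[[str], bool]) -> Optional[str]:
    """Return a warning message if the legacy index exists, else None."""
    if index_exists(LEGACY_INDEX):
        return (
            f"index '{LEGACY_INDEX}' exists, but k6 metrics are now written "
            "to datastreams matching 'metrics-k6-*'"
        )
    return None
```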

Migration

We won't automatically migrate data but can provide a reindex and cleanup script that users can execute if required.
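Such a script could be built around a `POST /_reindex` request. The Python sketch below assembles its body; the destination name `metrics-k6-default` used in the usage note is hypothetical, since this issue does not fix the datastream name beyond the `metrics-k6-*` pattern. Two details matter: datastreams only accept `op_type: create`, and the painless script moves the old Time field to @timestamp.

```python
from typing import Tuple


def reindex_request(source_index: str, dest_datastream: str) -> dict:
    """Body for POST /_reindex copying legacy documents into a datastream."""
    return {
        "source": {"index": source_index},
        "dest": {
            "index": dest_datastream,
            # Datastreams are append-only and require op_type "create".
            "op_type": "create",
        },
        "script": {
            "lang": "painless",
            # Rename the legacy timestamp field Time to @timestamp.
            "source": "ctx._source['@timestamp'] = ctx._source.remove('Time')",
        },
    }


def cleanup_request(source_index: str) -> Tuple[str, str]:
    # Only intended to run after the reindexed document count was verified.
    return ("DELETE", f"/{source_index}")
```

For example, `reindex_request("k6-metrics", "metrics-k6-default")` followed by `cleanup_request("k6-metrics")` once the user has checked the result.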

Permissions

We might need to adapt the initial permission check, as the output extension needs to create a datastream and an associated ILM policy. Finally, we should make this process optional, as advanced users might want to create the datastream themselves and tighten the cluster permissions of the k6 user to allow only write access. This behavior will be controlled by the flag K6_ELASTICSEARCH_AUTOCREATE_DATASTREAM, which is true by default. If it is set to false, the output extension assumes that the datastream is already set up properly (without any further checks).
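A minimal Python sketch of how the flag could drive the required privileges. The accepted boolean spellings and the role layout (a body for `PUT /_security/role/<name>`) are assumptions for illustration. `manage_ilm` and `manage_index_templates` are the Elasticsearch cluster privileges for ILM policies and index/component templates; `create_doc` allows appending to a datastream, and `auto_configure` lets the first write create the datastream from its index template.

```python
import os
from typing import Mapping

FLAG = "K6_ELASTICSEARCH_AUTOCREATE_DATASTREAM"


def autocreate_datastream(env: Mapping[str, str] = os.environ) -> bool:
    """Read the flag; unset or empty means true (the proposed default).
    The accepted spellings below are an assumption."""
    raw = env.get(FLAG)
    if raw is None or raw.strip() == "":
        return True
    value = raw.strip().lower()
    if value in ("1", "t", "true"):
        return True
    if value in ("0", "f", "false"):
        return False
    raise ValueError(f"{FLAG} must be a boolean, got {raw!r}")


def k6_role(autocreate: bool) -> dict:
    """Hypothetical role body for the k6 user."""
    role = {
        "cluster": [],
        "indices": [
            {"names": ["metrics-k6-*"], "privileges": ["create_doc", "auto_configure"]}
        ],
    }
    if autocreate:
        # Only needed when the extension manages the ILM policy and templates.
        role["cluster"] = ["manage_ilm", "manage_index_templates"]
    return role
```

With the flag set to false, the k6 user keeps write-only index privileges and no cluster privileges for setup.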

VCCPlindsten commented 5 months ago

Another potential step is to move to Time Series Data Streams (TSDS) to reduce stored size. However, this limits support to 8.7 and later (TSDS was in technical preview from 8.5).

In some ways, the index template already in use is friendly to this: all strings are mapped as keyword, as they would be for a dimension. There are limits to this, though (max 21 dimensions). TSDS also wants to know whether a metric is a gauge or a counter, which could be generalized with value.gauge and value.counter mappings, but that would require some routing logic when emitting the events.
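A minimal Python sketch of the routing idea, assuming a hypothetical mapping from k6 metric types to TSDS metric types and the value.gauge / value.counter field names from the comment above. One caveat: k6 Counter samples are per-sample increments, while TSDS counter fields are meant for cumulative, monotonically increasing values, so counters might need to be accumulated before being routed this way.

```python
from typing import Dict

# Hypothetical k6 -> TSDS metric type mapping (Rate and Trend treated as gauges).
K6_TO_TSDS = {"counter": "counter", "gauge": "gauge", "rate": "gauge", "trend": "gauge"}

# Hypothetical mapping fragment for the two value fields (index.mode: time_series).
VALUE_MAPPING = {
    "value": {
        "properties": {
            "gauge": {"type": "double", "time_series_metric": "gauge"},
            "counter": {"type": "double", "time_series_metric": "counter"},
        }
    }
}


def tsds_document(metric_type: str, value: float, timestamp: str, tags: Dict[str, str]) -> dict:
    """Build a document whose value lands in value.gauge or value.counter."""
    field = K6_TO_TSDS[metric_type.lower()]
    doc: dict = {"@timestamp": timestamp, "value": {field: value}}
    for key, tag in tags.items():
        # String tags become keyword dimensions; refuse to clobber core fields.
        if key in doc:
            raise ValueError(f"tag {key!r} collides with a reserved field")
        doc[key] = tag
    return doc
```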