elastic / opentelemetry-lib

Add option to drop Otel native processed metrics #97

Closed rogercoll closed 1 month ago

rogercoll commented 1 month ago

The current remappers do not override the processed metrics; they insert new ones. As a result, we end up with duplicated metric values under different names, for example k8s.pod.cpu_limit_utilization vs kubernetes.pod.cpu.usage.limit.pct.

This hasn't been an issue so far because our primary focus has been on the ECS format, with metrics sent by the Elasticsearch exporter configured in ecs mode. However, as we begin transitioning to the native OTel mode, we now face the challenge of having to support both metric formats in Kibana.

The problem with not overriding the current metrics is that both metric formats are forwarded to the same elasticsearch exporter, which formats them according to its configured mapping mode: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/elasticsearchexporter#elasticsearch-document-mapping

Let's take this configuration:

exporters:
  elasticsearch:
    mode: ecs
pipelines:
  metrics:
    receivers: [kubectl]
    processors: [elasticinframetrics]
    exporters: [elasticsearch]

Because the elasticsearch exporter is configured in ecs mode, all metrics processed by the EIMP processor (both the native ones and the added ones) are formatted in that mode.

It would be great to have an option in the remappers so that the processor overrides the metrics instead of inserting them. This is the pipeline we have in mind:

processors:
  elasticinframetrics:
    override: true
exporters:
  elasticsearch/ecs:
    mode: ecs
  elasticsearch/otel:
    mode: otel
pipelines:
  metrics/ecs:
    receivers: [kubectl]
    processors: [elasticinframetrics]
    exporters: [elasticsearch/ecs]
  metrics/otel:
    receivers: [kubectl]
    processors: []
    exporters: [elasticsearch/otel]
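
The difference between the current insert behaviour and the proposed override option can be sketched as follows. This is illustrative Python rather than the library's Go code: REMAP and remap are hypothetical names, the mapping contains only the one metric pair mentioned in this issue, and override=True is assumed to drop every input metric and keep only the remapped ones.

```python
# Hypothetical mapping from OTel-native metric names to their ECS equivalents.
# Only the pair named in this issue is listed.
REMAP = {
    "k8s.pod.cpu_limit_utilization": "kubernetes.pod.cpu.usage.limit.pct",
}


def remap(metrics: dict[str, float], override: bool = False) -> dict[str, float]:
    """Return the metrics a remapper would emit.

    override=False mirrors the current behaviour: remapped metrics are
    added next to the originals. override=True mirrors the proposal:
    every input metric is dropped and only the remapped metrics remain.
    """
    remapped = {REMAP[name]: value for name, value in metrics.items() if name in REMAP}
    if override:
        return remapped
    return {**metrics, **remapped}


native = {"k8s.pod.cpu_limit_utilization": 0.25, "k8s.pod.test": 1.0}
print(sorted(remap(native)))                 # originals plus the remapped metric
print(sorted(remap(native, override=True)))  # remapped metric only
```

With override disabled, the same value is exported under both names, which is the duplication described in this issue.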
ishleenk17 commented 1 month ago

My initial thought on this: EIMP is not meant to be used with the exporter's otel mode. The whole point of this processor is to be used together with the exporter's ecs mode to do the required translations.

I will go through the rest of the issue in detail.

tetianakravchenko commented 1 month ago

@rogercoll I've tested using elasticinframetrics with mode: otel:

    exporters:
        elasticsearch:
            mode: otel
    processors:
        elasticinframetrics:
          add_system_metrics: true
          add_k8s_metrics: true
    pipelines:
      metrics:
        receivers: [kubectl, hostmetrics]
        processors: [elasticinframetrics]
        exporters: [elasticsearch]

(I also included the hostmetrics receiver, since system metrics are needed for the Inventory UI as well.) Outcome:

  1. The created datastreams have a .otel suffix. The Inventory UI does not work because of that: it expects datastream names without the .otel suffix.

  2. As an example, I checked the kubernetes.pod.otel datastream: it contains only the transformed metrics, but with a metrics.* prefix.

  3. The collector logs contain many "failed to index document" errors for the OTel k8s (and system) metrics:

    2024-09-19T15:23:39.791Z    error   elasticsearchexporter@v0.108.0/bulkindexer.go:332   failed to index document    {"kind": "exporter", "data_type": "metrics", "name": "elasticsearch", "index": "metrics-generic.otel-default", "error.type": "document_parsing_exception", "error.reason": "[1:164] Can't find dynamic template for dynamic template name [gauge_long] of field [metrics.k8s.container.restarts]"}
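
A small sketch of the naming pattern behind these observations. It assumes the usual `<type>-<dataset>-<namespace>` data stream naming scheme and, based only on the results reported in this thread, that otel mode appends `.otel` to the dataset. data_stream_name is a hypothetical helper, not part of the exporter.

```python
def data_stream_name(dataset: str, mode: str, ds_type: str = "metrics",
                     namespace: str = "default") -> str:
    """Build a `<type>-<dataset>-<namespace>` data stream name.

    Assumption drawn from the observations in this thread: in `otel`
    mode the dataset carries a `.otel` suffix, in `ecs` mode it does not.
    """
    if mode == "otel":
        dataset = f"{dataset}.otel"
    return f"{ds_type}-{dataset}-{namespace}"


print(data_stream_name("generic", "otel"))
print(data_stream_name("kubernetes.pod", "otel"))
print(data_stream_name("kubernetes.pod", "ecs"))
```

The first call reproduces the index name seen in the error log; the last one is the name the Inventory UI expects.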
tetianakravchenko commented 1 month ago

Tested with config:

processors:
  elasticinframetrics:
    add_system_metrics: true
    add_k8s_metrics: true
exporters:
  elasticsearch/otel:
    mapping:
      mode: otel
  elasticsearch/ecs:
    mapping:
      mode: ecs
pipelines:
  metrics/ecs:
    receivers: [kubectl, hostmetrics]
    processors: [elasticinframetrics]
    exporters: [elasticsearch/ecs]
  metrics/otel:
    receivers: [kubectl, hostmetrics]
    processors: []
    exporters: [elasticsearch/otel]

Note: we need to split pipelines only for the daemonset; for the deployment we can use elasticsearch/otel only: https://github.com/rogercoll/opentelemetry/compare/add_onboarding_operator_values...tetianakravchenko:opentelemetry:split-otel-and-ecs-mode?expand=1

Outcome:

  1. The assets for the system and k8s integrations need to be installed; this should be included in the onboarding process.

  2. The Inventory page relies on metrics stored in 9 datastreams: kubernetes.pod, system.process, system.network, system.filesystem, system.diskio, system.cpu, system.load, system.memory, system.process.summary. With these in place, the Inventory page works.

  3. Looking closer at kubernetes.pod: it includes only kubernetes.* metrics and the relevant metadata (like kubernetes.pod.name). Sample document:

{
  "_index": ".ds-metrics-kubernetes.pod-default-2024.09.24-000001",
  "_id": "5S-RrHwEkml4PxvZAAABkiOoBmQ",
  "_version": 1,
  "_score": 0,
  "_source": {
    "@timestamp": "2024-09-24T10:51:07.236Z",
    "data_stream": {
      "dataset": "kubernetes.pod",
      "namespace": "default",
      "type": "metrics"
    },
    "event": {
      "agent_id_status": "missing",
      "dataset": "kubernetes.pod",
      "ingested": "2024-09-24T10:51:15Z"
    },
    "host": {
      "architecture": "amd64",
      "cpu": {
        "cache": {
          "l2": {
            "size": 16384
          }
        },
        "family": "6",
        "model": {
          "id": "158",
          "name": "Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz"
        },
        "stepping": "13",
        "vendor": {
          "id": "GenuineIntel"
        }
      },
      "hostname": "kind-control-plane",
      "ip": [
        "10.244.0.1",
        "172.18.0.4",
        "172.21.0.2",
        "fc00:f853:ccd:e793::2",
        "fe80::42:acff:fe15:2"
      ],
      "mac": [
        "02-42-AC-12-00-04",
        "02-42-AC-15-00-02",
        "02-C2-11-D3-4E-B2",
        "0A-60-45-60-6D-9C",
        "6A-D0-26-B4-D0-DC",
        "D2-10-9B-E6-3C-08",
        "D6-10-C1-90-B9-53",
        "D6-69-21-E4-EF-11"
      ],
      "name": "kind-control-plane",
      "os": {
        "full": "Ubuntu 20.04.6 LTS (Focal Fossa) (Linux kind-control-plane 6.6.12-linuxkit #1 SMP PREEMPT_DYNAMIC Fri Jan 19 12:50:23 UTC 2024 x86_64)",
        "platform": "linux"
      }
    },
    "kubernetes": {
      "namespace": "kube-system",
      "pod": {
        "cpu": {
          "usage": {
            "limit": {
              "pct": 0
            },
            "node": {
              "pct": 0.002
            }
          }
        },
        "memory": {
          "usage": {
            "limit": {
              "pct": 0
            },
            "node": {
              "pct": 0.013
            }
          }
        },
        "name": "etcd-kind-control-plane",
        "network": {
          "rx": {
            "bytes": 991060529
          },
          "tx": {
            "bytes": 23413379
          }
        },
        "uid": "2772f6e21146f2e8a331b1cc7d319cf1"
      }
    },
    "otel_remapped": true,
    "service": {
      "type": "kubernetes"
    }
  },
  "fields": {
    "host.os.full.text": [
      "Ubuntu 20.04.6 LTS (Focal Fossa) (Linux kind-control-plane 6.6.12-linuxkit #1 SMP PREEMPT_DYNAMIC Fri Jan 19 12:50:23 UTC 2024 x86_64)"
    ],
    "host.os.full": [
      "Ubuntu 20.04.6 LTS (Focal Fossa) (Linux kind-control-plane 6.6.12-linuxkit #1 SMP PREEMPT_DYNAMIC Fri Jan 19 12:50:23 UTC 2024 x86_64)"
    ],
    "host.cpu.family": [
      "6"
    ],
    "kubernetes.pod.cpu.usage.limit.pct": [
      0
    ],
    "host.hostname": [
      "kind-control-plane"
    ],
    "kubernetes.pod.uid": [
      "2772f6e21146f2e8a331b1cc7d319cf1"
    ],
    "host.cpu.model.name": [
      "Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz"
    ],
    "host.mac": [
      "02-42-AC-12-00-04",
      "02-42-AC-15-00-02",
      "02-C2-11-D3-4E-B2",
      "0A-60-45-60-6D-9C",
      "6A-D0-26-B4-D0-DC",
      "D2-10-9B-E6-3C-08",
      "D6-10-C1-90-B9-53",
      "D6-69-21-E4-EF-11"
    ],
    "service.type": [
      "kubernetes"
    ],
    "host.ip": [
      "10.244.0.1",
      "172.18.0.4",
      "172.21.0.2",
      "fc00:f853:ccd:e793::2",
      "fe80::42:acff:fe15:2"
    ],
    "host.cpu.model.name.text": [
      "Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz"
    ],
    "kubernetes.namespace": [
      "kube-system"
    ],
    "kubernetes.pod.network.rx.bytes": [
      991060529
    ],
    "kubernetes.pod.network.tx.bytes": [
      23413379
    ],
    "kubernetes.pod.name": [
      "etcd-kind-control-plane"
    ],
    "host.name": [
      "kind-control-plane"
    ],
    "event.agent_id_status": [
      "missing"
    ],
    "host.cpu.model.id": [
      "158"
    ],
    "host.cpu.cache.l2.size": [
      16384
    ],
    "data_stream.namespace": [
      "default"
    ],
    "host.cpu.stepping": [
      "13"
    ],
    "kubernetes.pod.memory.usage.node.pct": [
      0.013
    ],
    "otel_remapped": [
      true
    ],
    "data_stream.type": [
      "metrics"
    ],
    "host.cpu.vendor.id": [
      "GenuineIntel"
    ],
    "host.architecture": [
      "amd64"
    ],
    "kubernetes.pod.cpu.usage.node.pct": [
      0.002
    ],
    "event.ingested": [
      "2024-09-24T10:51:15.000Z"
    ],
    "@timestamp": [
      "2024-09-24T10:51:07.236Z"
    ],
    "host.os.platform": [
      "linux"
    ],
    "data_stream.dataset": [
      "kubernetes.pod"
    ],
    "event.dataset": [
      "kubernetes.pod"
    ],
    "kubernetes.pod.memory.usage.limit.pct": [
      0
    ]
  }
}

  4. Metrics coming from mode: ecs are stored in the generic datastream, which includes only k8s.* metrics (not metrics.k8s.*) and the transformed metadata.

  5. Metrics coming from mode: otel are stored in the generic.otel datastream. The data in generic and generic.otel does not overlap.
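
A quick way to check a deployment against the datastreams listed in this comment could look like the sketch below. INVENTORY_DATASETS simply repeats the nine names given here, and missing_inventory_datasets is a hypothetical helper; how the list of existing datasets is obtained is left out.

```python
# Datasets the Inventory page relies on, as listed in this comment.
INVENTORY_DATASETS = {
    "kubernetes.pod",
    "system.process",
    "system.network",
    "system.filesystem",
    "system.diskio",
    "system.cpu",
    "system.load",
    "system.memory",
    "system.process.summary",
}


def missing_inventory_datasets(present: set[str]) -> list[str]:
    """Return the Inventory datasets that are absent from `present`, sorted."""
    return sorted(INVENTORY_DATASETS - present)


# Datasets with a .otel suffix do not satisfy the Inventory page.
print(missing_inventory_datasets({"kubernetes.pod.otel", "generic.otel"}))
# With the split pipelines, the ECS pipeline provides all nine.
print(missing_inventory_datasets(INVENTORY_DATASETS | {"generic", "generic.otel"}))
```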

cc @AlexanderWert

gizas commented 1 month ago

Just to double down on the previous comment: I was testing in code today and I see that the remapper just creates another document with the transformed kubernetes.* metrics and nothing else (see the test here, where k8s.pod.test won't be available in the final document).

The generic datastream can be removed/dropped! It contains the kubelet-related metrics that come from the mode: ecs pipeline. The same copy of the metrics is present in generic.otel.

Note: I think the only available option to implement dropping the rest of the metrics is to not return the mb object here. The remapping would still have taken place in the lines above.

rogercoll commented 1 month ago

we need to split pipelines only for daemonset

My main concern is metrics duplication. At the moment, if we configure kubeletstats + elasticinframetrics, we end up with the same metrics under different names (no matter the elasticsearch exporter mode): k8s.* and kubernetes.*. These are the metrics that will be ingested with the following configuration:

pipelines:
  metrics/ecs:
    receivers: [kubectl, hostmetrics]
    processors: [elasticinframetrics]  # ---> k8s.*, kubernetes.*, system.* and system in ECS format
    exporters: [elasticsearch/ecs]
  metrics/otel:
    receivers: [kubectl, hostmetrics]
    processors: []  # ---> k8s.*, system.*
    exporters: [elasticsearch/otel]

Note that the k8s.* and the system.* metrics will be duplicated but exported with different modes. @tetianakravchenko @gizas is this the expected behavior? Which metrics do we need for the inventory?

If we only need ECS metrics, I think the elasticinframetrics should drop the otel metrics and just produce the ECS ones:

pipelines:
  metrics/ecs:
    receivers: [kubectl, hostmetrics]
    processors: [elasticinframetrics]  # ---> kubernetes.* and system in ECS format
    exporters: [elasticsearch/ecs]
  metrics/otel:
    receivers: [kubectl, hostmetrics]
    processors: []  # ---> k8s.*, system.*
    exporters: [elasticsearch/otel]
AlexanderWert commented 1 month ago

@rogercoll Just one minor correction: since we don't have any OTel-native system assets yet, we don't need to include the hostmetrics receiver in the metrics/otel pipeline, right?

gizas commented 1 month ago

Note that the k8s.* and the system.* metrics will be duplicated but exported with different modes. @tetianakravchenko @gizas is this the expected behavior? Which metrics do we need for the inventory?

The elasticinframetrics processor does the remapping and creates kubernetes.pod, system.process, system.network, system.filesystem, system.diskio, system.cpu, system.load, system.memory, system.process.summary. These are additional datastreams that the inventory relies on. Tania explains this in her comment above.

If we only need ECS metrics, I think the elasticinframetrics should drop the otel metrics and just produce the ECS ones:

To be more precise: the processor also keeps the k8s.* metrics and adds the new ones, so it needs to stop emitting the k8s.* metrics. I am trying to build my image locally to test the "dropping" (as per the note here: https://github.com/elastic/opentelemetry-lib/issues/97#issuecomment-2371193078)

ishleenk17 commented 1 month ago

Is the main focus of this PR to drop the OTEL native Metrics with override ?

AlexanderWert commented 1 month ago

Is the main focus of this PR to drop the OTEL native Metrics with override ?

yes