elastic / opentelemetry-lib

Apache License 2.0
0 stars 6 forks source link

Using `event.module` as the field to taint translated metrics leads to issues due to `constant_keyword` field mappings #29

Closed felixbarny closed 1 week ago

felixbarny commented 1 week ago

In https://github.com/elastic/opentelemetry-lib/pull/18, we've added a way to differentiate translated metrics by setting event.module to elastic/opentelemetry-lib:

https://github.com/elastic/opentelemetry-lib/blob/c8fcaba71b9d57d21d8c0926ae100e422693429e/remappers/internal/metric.go#L71

@andrzej-stencel found out that this leads to issues when the system integration is installed.

The error message is:

2024-06-20T16:41:59.387+0200    error   elasticsearchexporter@v0.102.0/elasticsearch_bulk.go:309        Drop docs: failed to index: struct { Type string "json:\"type\""; Reason string "json:\"reason\"" }{Type:"document_parsing_exception", Reason:"[1:370] failed to parse field [event.module] of type [constant_keyword] in a time series document at [2024-06-20T14:41:58.265Z]"}   {"kind": "exporter", "data_type": "metrics", "name": "elasticsearch", "status": 400}

One of the metrics he was trying to ingest looks like this:

{"@timestamp":"2024-06-20T14:41:58.265704997Z","data_stream":{"dataset":"system.diskio"},"event":{"module":"elastic/opentelemetry-lib"},"host":{"name":"astencel"},"os":{"type":"linux"},"system":{"diskio":{"io":{"ops":0,"time":140},"name":"nvme0n1p2","read":{"bytes":4600832,"count":146,"time":16},"write":{"bytes":462848,"count":70,"time":95}}}}

The reason this fails is that the system integration adds a constant_keyword mapping for the event.module field with system as a static value:

https://github.com/elastic/integrations/blob/119664bbf8fb93d8279374f34d1faf7e960777e2/packages/system/data_stream/cpu/fields/base-fields.yml#L9-L12

We'll need to find another field to taint the translated metrics, which isn't mapped as constant_keyword. From my perspective, we don't need to use a pre-existing ECS field for that (that could reduce the risk of similar clashes in the future), and a simple boolean may be enough.

cc @ishleenk17 @lahsivjar

ishleenk17 commented 1 week ago

@felixbarny : That sounds about right. I will add a boolean flag otel_remapped taint to the remapped metrics and remove event.module