elastic / apm-server

https://www.elastic.co/guide/en/apm/guide/current/index.html

APM Processing OTEL Collector Attributes to Labels Rather than Attributes #8074

Closed (knechtionscoding closed this issue 2 years ago)

knechtionscoding commented 2 years ago

APM Server version (apm-server version): 7.17.2

Description of the problem including expected versus actual behavior:

We use OpenTelemetry to collect traces from Istio. As part of that, and in lieu of elastic/apm-server#4468, we'd like to be able to upsert the deployment.environment variable in the OpenTelemetry Collector. We've attempted both service.environment and deployment.environment. Per the APM documentation it should be deployment.environment. We've configured an attributes processor as follows:

  # Mutate the Data
  processors:
    # If set to null, will be overridden with values based on k8s resource limits
    memory_limiter: null
    attributes:
      actions:
        - key: service.environment
          action: upsert
          value: "dev"
        - key: deployment.environment
          action: upsert
          value: "dev"

However, when the APM server receives them, it parses them into labels.service_environment and labels.deployment_environment respectively. This is true even when using a non-string value, which according to OTel's maintainers is not allowed.
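
For reference, the relevant part of the indexed document ends up looking roughly like this (a sketch based on the labels.* keys observed above; the expected service.environment field is not populated):

{
  "labels": {
    "service_environment": "dev",
    "deployment_environment": "dev"
  }
}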

I do see it being set properly on the Open Telemetry collector:

Span #63
    Trace ID       : 32a7bd42c119e7883eac24a9c4527561
    Parent ID      : 461aabc5ff251158
    ID             : d8ffa6453eb48b15
    Name           : foo.bar.svc.cluster.local:80/*
    Kind           : SPAN_KIND_UNSPECIFIED
    Start time     : 2022-05-09 22:00:31.286890085 +0000 UTC
    End time       : 2022-05-09 22:00:31.296932743 +0000 UTC
    Status code    : STATUS_CODE_UNSET
    Status message :
Attributes:
     -> http.url: STRING(http://foo.bar.svc.cluster.local/v1)
     -> istio.canonical_service: STRING(sonic-demo-app-publisher)
     -> zone: STRING(us-east4-b)
     -> downstream_cluster: STRING(-)
     -> istio.mesh_id: STRING(cluster.local)
     -> http.protocol: STRING(HTTP/1.1)
     -> OperationName: STRING(Egress)
     -> response_size: STRING(71)
     -> node_id: STRING(sidecar~10.17.134.74~foo.bar.svc.cluster.local)
     -> guid:x-request-id: STRING(c2df7068-2939-44f8-adb8-4182b296486a)
     -> request_size: STRING(1460)
     -> http.status_code: STRING(200)
     -> istio.canonical_revision: STRING(0.0.1)
     -> peer.address: STRING(10.17.134.74)
     -> response_flags: STRING(-)
     -> http.method: STRING(POST)
     -> user_agent: STRING(Apache-HttpClient/4.5.12 (Java/11.0.13))
     -> istio.namespace: STRING(bar)
     -> upstream_cluster.name: STRING(outbound|80||foo.bar.svc.cluster.local)
     -> upstream_cluster: STRING(outbound|80||foo.bar.svc.cluster.local)
     -> component: STRING(proxy)
     -> opencensus.same_process_as_parent_span: BOOL(false)
     -> service.environment: STRING(dev)
     -> deployment.environment: STRING(dev)
ResourceSpans #1
Resource SchemaURL:
Resource labels:
     -> service.name: STRING(foo.bar)
     -> opencensus.starttime: STRING(2022-05-09T14:18:30.616783293Z)
     -> host.name: STRING(foo.bar)
     -> process.pid: INT(32)
     -> telemetry.sdk.version: STRING(0.5.0-dev)
     -> opencensus.exporterversion: STRING(0.5.0-dev)
     -> telemetry.sdk.language: STRING(cpp)

APM should treat an attribute upserted by the collector the same way as if it had been set by the agent.

Screenshot of the JSON document in APM after the upsert:


Steps to reproduce:

Please include a minimal but complete recreation of the problem, including server configuration, agent(s) used, etc. The easier you make it for us to reproduce it, the more likely that somebody will take the time to look at it.

  1. Default APM configuration of 7.17.2
  2. Default Otel Collector
  3. Add an attributes processor that upserts, as configured above.

Provide logs (if relevant):

marclop commented 2 years ago

@KnechtionsCoding I've been unable to reproduce this using the OTel Go client and sendotlp:

# Checked out the 7.17 branch first, ran "docker-compose up -d; make; ./apm-server -E output.elasticsearch.username=admin -E output.elasticsearch.password=changeme -E apm-server.expvar.enabled=true -e"
$ cd systemtest/cmd/sendotlp
$ export OTEL_RESOURCE_ATTRIBUTES=service.name=test,deployment.environment=dev,service.environment=dev
$ export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:8200
$ go run main.go
2022-05-11T13:34:47+08:00   INFO    sendotlp/main.go:123    sending OTLP data to http://localhost:8200
2022-05-11T13:34:47+08:00   INFO    sendotlp/main.go:302    exporting metric    {"metric": {"name": "float64_counter", "kind": "CounterInstrumentKind", "start_time": "2022-05-11T13:34:47+08:00", "end_time": "2022-05-11T13:34:47+08:00", "sum": 1}}
2022-05-11T13:34:47+08:00   INFO    sendotlp/main.go:302    exporting metric    {"metric": {"name": "int64_histogram", "kind": "HistogramInstrumentKind", "start_time": "2022-05-11T13:34:47+08:00", "end_time": "2022-05-11T13:34:47+08:00", "boundaries": [1, 10, 100, 1000, 10000], "counts": [0, 1, 2, 3, 0, 0], "sum": 321, "count": 6}}
2022-05-11T13:34:47+08:00   INFO    sendotlp/main.go:259    exporting span  {"span": {"Name":"child1","SpanContext":{"TraceID":"6b31db508906533a573e66e05066ba7a","SpanID":"5ac163346c57b47a","TraceFlags":"01","TraceState":"","Remote":false},"Parent":{"TraceID":"6b31db508906533a573e66e05066ba7a","SpanID":"d0dd4ad39e7fc9c9","TraceFlags":"01","TraceState":"","Remote":false},"SpanKind":1,"StartTime":"2022-05-11T13:34:47.82593+08:00","EndTime":"2022-05-11T13:34:47.836946625+08:00","Attributes":null,"Events":[{"Name":"an arbitrary event","Attributes":null,"DroppedAttributeCount":0,"Time":"2022-05-11T13:34:47.836944+08:00"}],"Links":null,"Status":{"Code":0,"Description":""},"DroppedAttributes":0,"DroppedEvents":0,"DroppedLinks":0,"ChildSpanCount":0,"Resource":[{"Key":"deployment.environment","Value":{"Type":"STRING","Value":"dev"}},{"Key":"service.environment","Value":{"Type":"STRING","Value":"dev"}},{"Key":"service.name","Value":{"Type":"STRING","Value":"test"}},{"Key":"telemetry.sdk.language","Value":{"Type":"STRING","Value":"go"}},{"Key":"telemetry.sdk.name","Value":{"Type":"STRING","Value":"opentelemetry"}},{"Key":"telemetry.sdk.version","Value":{"Type":"STRING","Value":"1.5.0"}}],"InstrumentationLibrary":{"Name":"sendotlp","Version":"","SchemaURL":""}}}
2022-05-11T13:34:47+08:00   INFO    sendotlp/main.go:259    exporting span  {"span": {"Name":"child2","SpanContext":{"TraceID":"6b31db508906533a573e66e05066ba7a","SpanID":"5580e212d9a17e76","TraceFlags":"01","TraceState":"","Remote":false},"Parent":{"TraceID":"6b31db508906533a573e66e05066ba7a","SpanID":"d0dd4ad39e7fc9c9","TraceFlags":"01","TraceState":"","Remote":false},"SpanKind":1,"StartTime":"2022-05-11T13:34:47.847563+08:00","EndTime":"2022-05-11T13:34:47.858493166+08:00","Attributes":null,"Events":[{"Name":"exception","Attributes":[{"Key":"exception.type","Value":{"Type":"STRING","Value":"*errors.errorString"}},{"Key":"exception.message","Value":{"Type":"STRING","Value":"an exception occurred"}}],"DroppedAttributeCount":0,"Time":"2022-05-11T13:34:47.858492+08:00"}],"Links":null,"Status":{"Code":0,"Description":""},"DroppedAttributes":0,"DroppedEvents":0,"DroppedLinks":0,"ChildSpanCount":0,"Resource":[{"Key":"deployment.environment","Value":{"Type":"STRING","Value":"dev"}},{"Key":"service.environment","Value":{"Type":"STRING","Value":"dev"}},{"Key":"service.name","Value":{"Type":"STRING","Value":"test"}},{"Key":"telemetry.sdk.language","Value":{"Type":"STRING","Value":"go"}},{"Key":"telemetry.sdk.name","Value":{"Type":"STRING","Value":"opentelemetry"}},{"Key":"telemetry.sdk.version","Value":{"Type":"STRING","Value":"1.5.0"}}],"InstrumentationLibrary":{"Name":"sendotlp","Version":"","SchemaURL":""}}}
2022-05-11T13:34:47+08:00   INFO    sendotlp/main.go:259    exporting span  {"span": {"Name":"parent","SpanContext":{"TraceID":"6b31db508906533a573e66e05066ba7a","SpanID":"d0dd4ad39e7fc9c9","TraceFlags":"01","TraceState":"","Remote":false},"Parent":{"TraceID":"00000000000000000000000000000000","SpanID":"0000000000000000","TraceFlags":"00","TraceState":"","Remote":false},"SpanKind":1,"StartTime":"2022-05-11T13:34:47.825919+08:00","EndTime":"2022-05-11T13:34:47.859388667+08:00","Attributes":null,"Events":null,"Links":null,"Status":{"Code":0,"Description":""},"DroppedAttributes":0,"DroppedEvents":0,"DroppedLinks":0,"ChildSpanCount":2,"Resource":[{"Key":"deployment.environment","Value":{"Type":"STRING","Value":"dev"}},{"Key":"service.environment","Value":{"Type":"STRING","Value":"dev"}},{"Key":"service.name","Value":{"Type":"STRING","Value":"test"}},{"Key":"telemetry.sdk.language","Value":{"Type":"STRING","Value":"go"}},{"Key":"telemetry.sdk.name","Value":{"Type":"STRING","Value":"opentelemetry"}},{"Key":"telemetry.sdk.version","Value":{"Type":"STRING","Value":"1.5.0"}}],"InstrumentationLibrary":{"Name":"sendotlp","Version":"","SchemaURL":""}}}

It may be specific to the OTel Collector attributes processor; I haven't tested that yet, but will test it afterwards.

marclop commented 2 years ago

I was able to reproduce it using the OpenTelemetry Collector with the following config:

collector.yml

receivers:
  otlp:
    protocols:
      grpc:

exporters:
  otlp:
    endpoint: http://localhost:8200
    tls:
      insecure: true

  logging:
    loglevel: debug

  file:
    path: ./8074.json

# Mutate the Data
processors:
  attributes:
    actions:
      - key: service.environment
        action: upsert
        value: dev
      - key: deployment.environment
        action: upsert
        value: dev
  batch:

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [attributes, batch]
      exporters: [otlp, logging, file]
    metrics:
      receivers: [otlp]
      processors: [attributes, batch]
      exporters: [otlp, logging, file]
$ curl -sLO https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.50.0/otelcol_0.50.0_darwin_arm64.tar.gz
$ tar -xvzf otelcol_0.50.0_darwin_arm64.tar.gz
$ ./otelcol --config collector.yml

8074.json

{"resourceMetrics":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"unknown_service:main"}},{"key":"telemetry.sdk.language","value":{"stringValue":"go"}},{"key":"telemetry.sdk.name","value":{"stringValue":"opentelemetry"}},{"key":"telemetry.sdk.version","value":{"stringValue":"1.5.0"}}]},"scopeMetrics":[{"scope":{"name":"sendotlp"},"metrics":[{"name":"float64_counter","sum":{"dataPoints":[{"attributes":[{"key":"service.environment","value":{"stringValue":"dev"}},{"key":"deployment.environment","value":{"stringValue":"dev"}}],"startTimeUnixNano":"1652255991901355000","timeUnixNano":"1652255991901379000","asDouble":1}],"aggregationTemporality":"AGGREGATION_TEMPORALITY_CUMULATIVE","isMonotonic":true}},{"name":"int64_histogram","histogram":{"dataPoints":[{"attributes":[{"key":"service.environment","value":{"stringValue":"dev"}},{"key":"deployment.environment","value":{"stringValue":"dev"}}],"startTimeUnixNano":"1652255991901355000","timeUnixNano":"1652255991901379000","count":"6","sum":321,"bucketCounts":["0","1","2","3","0","0"],"explicitBounds":[1,10,100,1000,10000]}],"aggregationTemporality":"AGGREGATION_TEMPORALITY_CUMULATIVE"}}]}],"schemaUrl":"https://opentelemetry.io/schemas/v1.7.0"}]}
{"resourceSpans":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"unknown_service:main"}},{"key":"telemetry.sdk.language","value":{"stringValue":"go"}},{"key":"telemetry.sdk.name","value":{"stringValue":"opentelemetry"}},{"key":"telemetry.sdk.version","value":{"stringValue":"1.5.0"}}]},"scopeSpans":[{"scope":{"name":"sendotlp"},"spans":[{"traceId":"4b1ed5fc81f12713e977692d17efe4e3","spanId":"52ee3a06dc3b4dc5","parentSpanId":"ea781daeafc5aa9b","name":"child1","kind":"SPAN_KIND_INTERNAL","startTimeUnixNano":"1652255991908504000","endTimeUnixNano":"1652255991919529084","attributes":[{"key":"service.environment","value":{"stringValue":"dev"}},{"key":"deployment.environment","value":{"stringValue":"dev"}}],"events":[{"timeUnixNano":"1652255991919527000","name":"an arbitrary event"}],"status":{}}]}],"schemaUrl":"https://opentelemetry.io/schemas/v1.7.0"},{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"unknown_service:main"}},{"key":"telemetry.sdk.language","value":{"stringValue":"go"}},{"key":"telemetry.sdk.name","value":{"stringValue":"opentelemetry"}},{"key":"telemetry.sdk.version","value":{"stringValue":"1.5.0"}}]},"scopeSpans":[{"scope":{"name":"sendotlp"},"spans":[{"traceId":"4b1ed5fc81f12713e977692d17efe4e3","spanId":"919bbbc4390594ae","parentSpanId":"ea781daeafc5aa9b","name":"child2","kind":"SPAN_KIND_INTERNAL","startTimeUnixNano":"1652255991921207000","endTimeUnixNano":"1652255991931282750","attributes":[{"key":"service.environment","value":{"stringValue":"dev"}},{"key":"deployment.environment","value":{"stringValue":"dev"}}],"events":[{"timeUnixNano":"1652255991931281000","name":"exception","attributes":[{"key":"exception.type","value":{"stringValue":"*errors.errorString"}},{"key":"exception.message","value":{"stringValue":"an exception occurred"}}]}],"status":{}}]}],"schemaUrl":"https://opentelemetry.io/schemas/v1.7.0"},{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"unknown_service:main"}},{"key":"telemetry.sdk.language","value":{"stringValue":"go"}},{"key":"telemetry.sdk.name","value":{"stringValue":"opentelemetry"}},{"key":"telemetry.sdk.version","value":{"stringValue":"1.5.0"}}]},"scopeSpans":[{"scope":{"name":"sendotlp"},"spans":[{"traceId":"4b1ed5fc81f12713e977692d17efe4e3","spanId":"ea781daeafc5aa9b","parentSpanId":"","name":"parent","kind":"SPAN_KIND_INTERNAL","startTimeUnixNano":"1652255991908496000","endTimeUnixNano":"1652255991931783708","attributes":[{"key":"service.environment","value":{"stringValue":"dev"}},{"key":"deployment.environment","value":{"stringValue":"dev"}}],"status":{}}]}],"schemaUrl":"https://opentelemetry.io/schemas/v1.7.0"}]}

The problem seems to be that we only decode the known attributes from the resource metadata; if they're present on the span instead, we're not adding them as "known" fields: https://github.com/elastic/apm-server/blob/b782e9ec62c041f4ea27a670057de698bb7280d6/processor/otel/traces.go#L153-L163.
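
To make that concrete, here is a minimal Go sketch of the behavior described above (illustrative only, not the actual apm-server code; the function name and mapping table are made up for the example):

package main

import (
	"fmt"
	"strings"
)

// mapAttribute sketches the observed behavior: a small set of
// resource-level attributes is promoted to well-known APM fields, while
// any other attribute (including deployment.environment set on a span)
// is stored as a label whose key has dots replaced by underscores.
func mapAttribute(isResource bool, key, value string) string {
	knownResourceKeys := map[string]string{
		"deployment.environment": "service.environment",
		"service.name":           "service.name",
	}
	if isResource {
		if field, ok := knownResourceKeys[key]; ok {
			return field + "=" + value // promoted to a well-known field
		}
	}
	// everything else becomes a label, e.g. labels.deployment_environment
	return "labels." + strings.ReplaceAll(key, ".", "_") + "=" + value
}

func main() {
	fmt.Println(mapAttribute(true, "deployment.environment", "dev"))  // service.environment=dev
	fmt.Println(mapAttribute(false, "deployment.environment", "dev")) // labels.deployment_environment=dev
}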

@axw is this the expected behavior, or should we take the well-defined span attributes and map them accordingly as well?

knechtionscoding commented 2 years ago

@marclop does that mean that if we use something like https://istio.io/latest/docs/tasks/observability/distributed-tracing/mesh-and-proxy-config/ to set a custom_tag that maps to deployment.environment, APM would properly map it?
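
(For context, the Istio configuration being referred to would look roughly like the sketch below, based on the linked docs; the tag name and literal value are hypothetical. Whether APM maps it depends on whether the tag ends up as a resource attribute or a span attribute, as discussed in the replies below.)

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    defaultConfig:
      tracing:
        custom_tags:
          # hypothetical custom tag; Envoy adds custom tags at the span
          # level, so APM would likely still treat this as a label
          deployment.environment:
            literal:
              value: "dev"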

arnestorksen commented 2 years ago

@KnechtionsCoding I did the same thing as you, but I discovered that you can set resource-level attributes using the resource processor.

Using this processor configuration, I got my environment property to show up in Elastic APM.

    processors:
      resource:
        attributes:
          - key: deployment.environment
            action: insert
            value: test

https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/resourceprocessor
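
If you are adapting the collector.yml from the repro above, the resource processor also needs to be added to the service pipeline; a sketch (receiver, exporter, and pipeline names assume that earlier config):

processors:
  resource:
    attributes:
      - key: deployment.environment
        action: insert
        value: test
  batch:

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [resource, batch]
      exporters: [otlp]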

knechtionscoding commented 2 years ago

@arnestorksen That does work, and I appreciate the help, thanks! I'd still be curious to know why the resource processor's attributes are handled differently from the attributes processor's.

marclop commented 2 years ago

@KnechtionsCoding I don't know what the resulting OTLP trace format will look like when using the Istio custom_tags. If they're set at the resource level, then they should be parsed; if not, they will be treated as labels.

@arnestorksen Thanks for providing the example processor, I did test it locally and the resource attributes were mapped correctly.

axw commented 2 years ago

@axw is this the expected behavior, or should we take the well-defined span attributes and map them accordingly as well?

It is expected. These are attributes that describe the thing (service) that produced the events (traces, metrics, ...); they're not describing the events themselves. See also https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/overview.md#resources
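
Concretely, for APM to map deployment.environment it has to appear under the resource's attributes in the OTLP payload, not under an individual span's attributes; a trimmed sketch of the JSON shape from 8074.json above:

{
  "resourceSpans": [
    {
      "resource": {
        "attributes": [
          { "key": "deployment.environment", "value": { "stringValue": "dev" } }
        ]
      },
      "scopeSpans": [
        { "scope": { "name": "sendotlp" }, "spans": [ { "name": "parent" } ] }
      ]
    }
  ]
}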

We should probably document this.

marclop commented 2 years ago

Should we repurpose this issue as a docs issue and document the expected behavior?

axw commented 2 years ago

@marclop I'd rather preserve the description in this issue, so I've created https://github.com/elastic/apm-server/issues/8173 to follow up on docs. As things are working as expected here, I'll close this issue.

nazarmubeen commented 1 year ago

Hi @marclop, I need to send sample custom test metric data to the APM server directly. What is the best approach to do it?

After starting the collector I get the following:

(screenshot)

I am sending data from Postman to this host:

(screenshot)

However, I am still not able to visualize the data in Elastic.