aws-observability / aws-otel-collector

AWS Distro for OpenTelemetry Collector (see ADOT Roadmap at https://github.com/orgs/aws-observability/projects/4)
https://aws-otel.github.io/

Missing metric dimensions in EMF when using opentelemetry exporter-metrics-otlp-grpc #1216

Closed · Rikkedi closed this issue 2 years ago

Rikkedi commented 2 years ago

Describe the bug
I'm adding instrumentation to a NestJS service (running in ECS Fargate) using the OTLPMetricExporter from '@opentelemetry/exporter-metrics-otlp-grpc', with the ADOT Collector running as a sidecar and the awsemf exporter piping the metrics to CloudWatch in Embedded Metric Format (EMF). However, I can't get any of my metric attributes to show up as dimensions in EMF, neither in the awsemf debug logs nor in the generated metric stream.

Steps to reproduce
Sample code for the instrumentation:

import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-grpc';
import { Resource } from '@opentelemetry/resources';
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';
import { MeterProvider } from '@opentelemetry/sdk-metrics-base';

const exporter = new OTLPMetricExporter({ url: 'grpc://localhost:4317' });

const meterProvider: MeterProvider = new MeterProvider({
  resource: Resource.default().merge(
    new Resource({
      [SemanticResourceAttributes.SERVICE_NAME]: 'test-service'
    })
  ),
  exporter: exporter,
  interval: 5000
});

const meter = meterProvider.getMeter('some-name');

const counter = meter.createCounter('http_requests_total', {
  description: 'Total count of HTTP requests received',
  component: 'http',
  valueType: 1,
  unit: 'Count'
});

counter.add(1, {
  http_route: 'test',
  http_method: 'get',
  http_controller: 'testController',
  http_controller_method: 'getTest'
});

What did you expect to see?
I expect to see metrics in EMF like the following:

{
    "OTelLib": "some-name",
    "_aws": {
        "CloudWatchMetrics": [
            {
                "Namespace": "MyNamespace",
                "Dimensions": [
                    [
                        "OTelLib",
                        "service.name",
                        "http_route",
                        "http_method",
                        "http_controller",
                        "http_controller_method",
                        "telemetry.sdk.version"
                    ]
                ],
                "Metrics": [
                    {
                        "Name": "http_requests_total",
                        "Unit": "Count"
                    }
                ]
            }
        ],
        "Timestamp": 1652209141514
    },
    "service.name": "test-service",
    "http_request_failures": 1,
    "http_route": "test",
    "http_method": "get",
    "http_controller": "testController",
    "http_controller_method": "getTest",
    "telemetry.sdk.version": "1.0.1"
}

What did you see instead?
Instead, I get the dimensions that come from the Resource object, but none of my custom attributes:

{
    "OTelLib": "some-name",
    "_aws": {
        "CloudWatchMetrics": [
            {
                "Namespace": "MyNamespace",
                "Dimensions": [
                    [
                        "OTelLib",
                        "service.name",
                        "telemetry.sdk.version"
                    ]
                ],
                "Metrics": [
                    {
                        "Name": "http_requests_total",
                        "Unit": "Count"
                    }
                ]
            }
        ],
        "Timestamp": 1652209141514
    },
    "service.name": "test-service",
    "telemetry.sdk.version": "1.0.1"
}

Environment
The collector runs as a sidecar to an ECS Fargate task.

config.yaml:

extensions:
  health_check:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch/metrics:
    timeout: 60s
  memory_limiter:
    check_interval: 5s
    limit_mib: 4000
    spike_limit_mib: 500
  resource:
    attributes:
      - key: telemetry.sdk.language
        action: delete
      - key: telemetry.sdk.name
        action: delete

exporters:
  logging:
    loglevel: debug
  awsemf:
    namespace: 'MyNamespace'
    log_group_name: 'MyLogGroup'
    dimension_rollup_option: NoDimensionRollup
    resource_to_telemetry_conversion:
      enabled: true
    # I've tried with and without `attributes` in this setting.
    # parse_json_encoded_attr_values: [attributes]

service:
  telemetry:
    logs:
      level: "debug"
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, resource, batch/metrics]
      exporters: [logging, awsemf]

  extensions: [health_check]

I'm mounting the config into the container by building my own image from aws-otel-collector:

#Dockerfile
FROM amazon/aws-otel-collector

COPY config.yaml /workspace/config/otel-awsemf-config.yaml

CMD ["--config=/workspace/config/otel-awsemf-config.yaml"]

Node environment:

node: 16.13.0
"@opentelemetry/exporter-metrics-otlp-grpc": "0.27.0",
"@opentelemetry/sdk-metrics-base": "0.27.0",

Additional context
Running the otlp-grpc exporter with debug logging, I can see the attributes being sent:

items to be sent [
  {
    descriptor: {
      name: 'http_requests_total',
      description: 'Total count of HTTP requests received',
      unit: 'Count',
      metricKind: 0,
      valueType: 1
    },
    attributes: {
      http_route: 'test',
      http_method: 'GET',
      http_controller: 'testController',
      http_controller_method: 'getTest'
    },
    aggregator: SumAggregator { kind: 0, _current: 1, _lastUpdateTime: [Array] },
    aggregationTemporality: 2,
    resource: Resource { attributes: [Object] },
    instrumentationLibrary: { name: 'some-name', version: undefined, schemaUrl: undefined }
  }
]
dnutels commented 2 years ago

Actually, it wouldn't work even with this minimal setup:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch/metrics:
    timeout: 1s
    send_batch_size: 50

exporters:
  logging/metrics:
    loglevel: debug

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch/metrics]
      exporters: [logging/metrics]

Similar environment, except:

Node.js v17.8.0

and docker-compose usage:

services:
  adot-collector:
    image: public.ecr.aws/aws-observability/aws-otel-collector:latest
    command: ['--config=config.yml']
    volumes:
      - ${PWD}/tools/collector/dev/config.yml:/config.yml
...

Code (and none of the static or dynamic labels show up):

// STATIC_LABELS and SERVICE_NAME are constants defined elsewhere in the service.
const requestsCount = meter.createCounter('http_requests_total', {
    description: 'Total amount of HTTP requests',
    constantAttributes: STATIC_LABELS
});

// later in a request handler

const labels = {
    method: req.method,
    route: req.route.path,
    statusCode: res.statusCode,
    service: SERVICE_NAME
};

requestsCount.add(1, labels);

Output (without docker-compose prefix):

ResourceMetrics #0
Resource SchemaURL:
Resource labels:
     -> service.name: STRING(some-service)
     -> telemetry.sdk.language: STRING(nodejs)
     -> telemetry.sdk.name: STRING(opentelemetry)
     -> telemetry.sdk.version: STRING(1.2.0)
InstrumentationLibraryMetrics #0
InstrumentationLibraryMetrics SchemaURL:
InstrumentationLibrary some-service
Metric #0
Descriptor:
     -> Name: http_requests_total
     -> Description: Total amount of HTTP requests
     -> Unit: 1
     -> DataType: Sum
     -> IsMonotonic: true
     -> AggregationTemporality: AGGREGATION_TEMPORALITY_CUMULATIVE
NumberDataPoints #0
StartTimestamp: 2022-05-15 01:47:59.073999872 +0000 UTC
Timestamp: 2022-05-15 01:48:11.881179904 +0000 UTC
Value: 2.000000

Switching to otel-collector (exact same config file):

services:
  otel-collector:
    image: otel/opentelemetry-collector:latest
    command: ['--config=config.yml']
    volumes:
      - ${PWD}/tools/collector/dev/config.yml:/config.yml

(Note: the latest tag of otel/opentelemetry-collector hasn't been updated in 8 months.)

produces:

ResourceMetrics #0
Resource labels:
     -> service.name: STRING(some-service)
     -> telemetry.sdk.language: STRING(nodejs)
     -> telemetry.sdk.name: STRING(opentelemetry)
     -> telemetry.sdk.version: STRING(1.2.0)
InstrumentationLibraryMetrics #0
InstrumentationLibrary some-service
Metric #0
Descriptor:
     -> Name: http_requests_total
     -> Description: Total amount of HTTP requests
     -> Unit: 1
     -> DataType: Sum
     -> IsMonotonic: true
     -> AggregationTemporality: AGGREGATION_TEMPORALITY_CUMULATIVE
NumberDataPoints #0
Data point attributes:
     -> method: STRING(POST)
     -> route: STRING(/some-route)
     -> statusCode: STRING(200)
     -> service: STRING(some-service)
StartTimestamp: 2022-05-15 01:53:50.724 +0000 UTC
Timestamp: 2022-05-15 01:55:25.685500416 +0000 UTC
Value: 3.000000
Rikkedi commented 2 years ago

I just tried again with the recently released v0.18.0 otel-collector image and I'm still seeing this issue.

bryan-aguilar commented 2 years ago

Hi @Rikkedi ,

Thank you for providing us with some steps to replicate. We will take a look at this and see if we can identify the root cause of the issue. When we have more information we will reach out on this issue.

Rikkedi commented 2 years ago

Hi @bryan-aguilar, any news on the root cause here? Thanks in advance for taking a look.

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

bcelenza commented 2 years ago

I'm able to reproduce this with the configuration mentioned.

@bryan-aguilar Is there an update you can provide?

bryan-aguilar commented 2 years ago

Hi,

We do recognize this as an issue, but as far as I know no one is actively working on it. We provide GitHub support on a best-effort basis.

I did notice that the latest comment from @dnutels reproduced this with a pretty barebones ADOT Collector config. Can we confirm whether this also fails with that barebones config while using the latest upstream collector? Note that the latest tag is not updated upstream, so you will have to target otel/opentelemetry-collector:0.58.0 (see the compose sketch after the config below). If it still fails with the latest upstream collector, I would advise opening an issue upstream. If it passes with the latest upstream version but fails with the latest ADOT Collector, then we can re-evaluate the priority. Also, please ensure that you are using the latest JS SDK version.

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch/metrics:
    timeout: 1s
    send_batch_size: 50

exporters:
  logging/metrics:
    loglevel: debug

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch/metrics]
      exporters: [logging/metrics]
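
For pinning the upstream collector, a minimal compose sketch (the service name and config path simply mirror the earlier docker-compose example from @dnutels and are placeholders, not a required layout):

services:
  otel-collector:
    # Pin an explicit upstream version instead of the stale latest tag.
    image: otel/opentelemetry-collector:0.58.0
    command: ['--config=config.yml']
    volumes:
      - ${PWD}/tools/collector/dev/config.yml:/config.yml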
Rikkedi commented 2 years ago

@bryan-aguilar I've had much better luck with newer versions of both the collector side-car and the OTel libraries. Labels are working with aws-otel-collector v0.20.0 and the following OTel versions:

"@opentelemetry/api": "1.1.0",
"@opentelemetry/exporter-metrics-otlp-grpc": "0.32.0",
"@opentelemetry/sdk-metrics-base": "0.31.0",