aws-observability / aws-otel-java-instrumentation

AWS Distro for OpenTelemetry Java Instrumentation Library
https://aws-otel.github.io/
Apache License 2.0

Duplicate Log Entries in CloudWatch Agent, using `public.ecr.aws/aws-observability/adot-autoinstrumentation-java:v1.32.5` #947

Open yumemi-mabe opened 2 weeks ago

yumemi-mabe commented 2 weeks ago

Describe the bug

Using public.ecr.aws/aws-observability/adot-autoinstrumentation-java:v1.32.5, the CloudWatch Agent is sending duplicate log entries with the same content, like this:

```
2024-11-07T04:21:35Z I! {
    "caller": "awsemfexporter@v0.103.0/emf_exporter.go:119",
    "msg": "Start processing resource metrics",
    "kind": "exporter",
    "data_type": "metrics",
    "name": "awsemf/application_signals",
    "labels": {
        "aws.ecs.cluster.arn": "arn:aws:ecs:ap-northeast-1:xxx:cluster/ecscluster-my-app",
        "aws.ecs.container.arn": "arn:aws:ecs:ap-northeast-1:xxx:container/ecscluster-my-app/xxx/xxx",
        "aws.ecs.container.image.id": "sha256:xxx",
        "aws.ecs.launchtype": "fargate",
        "aws.ecs.task.arn": "arn:aws:ecs:ap-northeast-1:xxx:task/ecscluster-my-app/xxx",
        "aws.ecs.task.family": "ecstask-my-app",
        "aws.ecs.task.revision": "3",
        "aws.local.service": "ecstask-my-app",
        "aws.log.group.arns": "",
        "aws.log.group.names": "/aws/ecs/task/ecstask-my-app",
        "aws.log.stream.arns": "",
        "aws.log.stream.names": "",
        "cloud.account.id": "xxx",
        "cloud.availability_zone": "ap-northeast-1a",
        "cloud.platform": "aws_ecs",
        "cloud.provider": "aws",
        "cloud.region": "ap-northeast-1",
        "container.id": "xxx",
        "container.image.name": "xxx.dkr.ecr.ap-northeast-1.amazonaws.com/xxx",
        "container.image.tag": "xxx",
        "container.name": "ecstask-my-app",
        "host.arch": "aarch64",
        "host.name": "ip-xxx.ap-northeast-1.compute.internal",
        "os.description": "Linux 5.10.226-214.880.amzn2.aarch64",
        "os.type": "linux",
        "process.command_args": "",
        "process.executable.path": "/usr/lib/jvm/java-17-amazon-corretto/bin/java",
        "process.pid": "",
        "process.runtime.description": "Amazon.com Inc. OpenJDK 64-Bit Server VM 17.0.9+8-LTS",
        "process.runtime.name": "OpenJDK Runtime Environment",
        "process.runtime.version": "17.0.9+8-LTS",
        "service.name": "ecstask-my-app",
        "telemetry.auto.version": "1.32.5-aws",
        "telemetry.sdk.language": "java",
        "telemetry.sdk.name": "opentelemetry",
        "telemetry.sdk.version": "1.34.1"
    }
}
```

Note: I masked sensitive information as `xxx`.

This issue occurs in v1.32.5 but did not occur in v1.32.4.

Steps to reproduce

Following the Application Signals "Deploy using the Sidecar Strategy" documentation, I configured my task definition as follows:

containerDefinitions in taskdef.json

```
"containerDefinitions": [
  {
    "name": "my-app",
    "logConfiguration": {
      "logDriver": "awslogs",
      "secretOptions": null,
      "options": {
        "awslogs-group": "${AWS_LOGS_GROUP}",
        "awslogs-region": "${AWS_REGION}",
        "awslogs-stream-prefix": "my-app",
        "awslogs-datetime-format": "%Y-%m-%d %H:%M:%S"
      }
    },
    "entryPoint": [],
    "portMappings": [
      {
        "hostPort": 8080,
        "protocol": "tcp",
        "containerPort": 8080
      }
    ],
    "environment": [
      { "name": "OTEL_RESOURCE_ATTRIBUTES", "value": "aws.log.group.names=${AWS_LOGS_GROUP},aws.hostedin.environment=$HOST_ENV,service.name=my-app" },
      { "name": "OTEL_LOGS_EXPORTER", "value": "none" },
      { "name": "OTEL_METRICS_EXPORTER", "value": "none" },
      { "name": "OTEL_EXPORTER_OTLP_PROTOCOL", "value": "http/protobuf" },
      { "name": "OTEL_AWS_APPLICATION_SIGNALS_ENABLED", "value": "true" },
      { "name": "JAVA_TOOL_OPTIONS", "value": " -javaagent:/otel-auto-instrumentation/javaagent.jar" },
      { "name": "OTEL_AWS_APPLICATION_SIGNALS_EXPORTER_ENDPOINT", "value": "http://localhost:4316/v1/metrics" },
      { "name": "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT", "value": "http://localhost:4316/v1/traces" },
      { "name": "OTEL_TRACES_SAMPLER", "value": "xray" },
      { "name": "OTEL_PROPAGATORS", "value": "tracecontext,baggage,b3,xray" }
    ],
    "mountPoints": [
      {
        "sourceVolume": "opentelemetry-auto-instrumentation",
        "containerPath": "/otel-auto-instrumentation",
        "readOnly": false
      }
    ],
    "image": "",
    "essential": true,
    "dependsOn": [
      { "containerName": "ecs-cwagent", "condition": "START" },
      { "containerName": "init", "condition": "START" }
    ]
  },
  {
    "name": "ecs-cwagent",
    "image": "public.ecr.aws/cloudwatch-agent/cloudwatch-agent:latest",
    "essential": true,
    "secrets": [
      { "name": "CW_CONFIG_CONTENT", "valueFrom": "ecs-cwagent" }
    ],
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "${AWS_LOGS_GROUP}",
        "awslogs-region": "${AWS_REGION}",
        "awslogs-stream-prefix": "ecs-cwagent",
        "awslogs-datetime-format": "%Y-%m-%dT%H:%M:%SZ"
      }
    }
  },
  {
    "name": "init",
    "image": "public.ecr.aws/aws-observability/adot-autoinstrumentation-java:v1.32.5",
    "essential": false,
    "command": [ "cp", "/javaagent.jar", "/otel-auto-instrumentation/javaagent.jar" ],
    "mountPoints": [
      {
        "sourceVolume": "opentelemetry-auto-instrumentation",
        "containerPath": "/otel-auto-instrumentation",
        "readOnly": false
      }
    ]
  }
]
```

What did you expect to see?

The CloudWatch Agent doesn't send duplicate log entries.

What did you see instead?

CloudWatch Agent is sending duplicate log entries with the same content.

Additional context

pingleig commented 2 weeks ago

I think this issue comes from the CloudWatch Agent; the log line is here:

https://github.com/amazon-contributing/opentelemetry-collector-contrib/blob/aws-cwa-dev/exporter/awsemfexporter/emf_exporter.go#L119

```go
func (emf *emfExporter) pushMetricsData(_ context.Context, md pmetric.Metrics) error {
    rms := md.ResourceMetrics()
    labels := map[string]string{}
    for i := 0; i < rms.Len(); i++ {
        rm := rms.At(i)
        am := rm.Resource().Attributes()
        if am.Len() > 0 {
            am.Range(func(k string, v pcommon.Value) bool {
                labels[k] = v.Str()
                return true
            })
        }
    }
    emf.config.logger.Info("Start processing resource metrics", zap.Any("labels", labels))
    emf.processResourceLabels(labels)
    // ... (rest of function elided)
```

I am not sure this needs to be info-level logging.

TBH I am not sure which repo is the right place to file the issue (cwagent or otel contrib); you can check with @bjrara.

bjrara commented 1 week ago

I agree with this comment; the log should not be emitted at info level.

In v1.32.5, we introduced runtime metrics (to be released on 11/20) that emit JVM metrics every minute. I believe the so-called "duplicated" logs come from different batches of metrics: one from runtime metrics, and one from the RED (latency, error, fault) metrics generated from incoming/outgoing calls.

It shouldn't cause any real impact, but I agree such logs are misleading and noisy. We should downgrade the log level to avoid confusion.

bjrara commented 6 days ago

Actually, the log level has been downgraded in OTEL collector: https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/29337.

The patch seems to be missing on the CloudWatch agent side.

bjrara commented 6 days ago

https://github.com/amazon-contributing/opentelemetry-collector-contrib/pull/259