aws-observability / aws-otel-collector

AWS Distro for OpenTelemetry Collector (see ADOT Roadmap at https://github.com/orgs/aws-observability/projects/4)
https://aws-otel.github.io/

Continuous Messages of exporting failed, dropped data, sender failed in the aws-otel-collector.log #551

Closed: georges-git closed this issue 2 years ago

georges-git commented 3 years ago

Hello @mxiamxia and AWS group - I keep getting the following messages in the OTEL collector log. How do I fix it?

{2021-06-23 09:52:04.43621453 -0400 EDT m=+120.147512184, Level:error, Caller:go.opentelemetry.io/collector@v0.27.0/exporter/exporterhelper/queued_retry.go:173, Message:Exporting failed. Dropping data. Try enabling sending_queue to survive temporary failures., Stack:go.opentelemetry.io/collector/exporter/exporterhelper.(queuedRetrySender).send go.opentelemetry.io/collector@v0.27.0/exporter/exporterhelper/queued_retry.go:173 go.opentelemetry.io/collector/exporter/exporterhelper.NewMetricsExporter.func2 go.opentelemetry.io/collector@v0.27.0/exporter/exporterhelper/metrics.go:103 go.opentelemetry.io/collector/consumer/consumerhelper.ConsumeMetricsFunc.ConsumeMetrics go.opentelemetry.io/collector@v0.27.0/consumer/consumerhelper/metrics.go:29 go.opentelemetry.io/collector/service/internal/fanoutconsumer.metricsConsumer.ConsumeMetrics go.opentelemetry.io/collector@v0.27.0/service/internal/fanoutconsumer/consumer.go:51 go.opentelemetry.io/collector/processor/batchprocessor.(batchMetrics).export go.opentelemetry.io/collector@v0.27.0/processor/batchprocessor/batch_processor.go:285 go.opentelemetry.io/collector/processor/batchprocessor.(batchProcessor).sendItems go.opentelemetry.io/collector@v0.27.0/processor/batchprocessor/batch_processor.go:183 go.opentelemetry.io/collector/processor/batchprocessor.(batchProcessor).startProcessingCycle go.opentelemetry.io/collector@v0.27.0/processor/batchprocessor/batch_processor.go:144} {2021-06-23 09:52:04.436239353 -0400 EDT m=+120.147537026, Level:warn, Caller:go.opentelemetry.io/collector@v0.27.0/processor/batchprocessor/batch_processor.go:184, Message:Sender failed, Stack:}

mxiamxia commented 3 years ago

could you attach more collector logs?

georges-git commented 3 years ago

Hello @mxiamxia - This is all the information in the collector logs. Let me know how to fix this.

go.opentelemetry.io/collector/consumer/consumerhelper.ConsumeMetricsFunc.ConsumeMetrics go.opentelemetry.io/collector@v0.27.0/consumer/consumerhelper/metrics.go:29 go.opentelemetry.io/collector/processor/batchprocessor.(batchMetrics).export go.opentelemetry.io/collector@v0.27.0/processor/batchprocessor/batch_processor.go:285 go.opentelemetry.io/collector/processor/batchprocessor.(batchProcessor).sendItems go.opentelemetry.io/collector@v0.27.0/processor/batchprocessor/batch_processor.go:183 go.opentelemetry.io/collector/processor/batchprocessor.(batchProcessor).startProcessingCycle go.opentelemetry.io/collector@v0.27.0/processor/batchprocessor/batch_processor.go:144} {2021-06-25 17:47:42.118233801 -0400 EDT m=+18856.163846182, Level:error, Caller:go.opentelemetry.io/collector@v0.27.0/exporter/exporterhelper/queued_retry.go:173, Message:Exporting failed. Dropping data. Try enabling sending_queue to survive temporary failures., Stack:go.opentelemetry.io/collector/exporter/exporterhelper.(queuedRetrySender).send go.opentelemetry.io/collector@v0.27.0/exporter/exporterhelper/queued_retry.go:173 go.opentelemetry.io/collector/exporter/exporterhelper.NewMetricsExporter.func2 go.opentelemetry.io/collector@v0.27.0/exporter/exporterhelper/metrics.go:103 go.opentelemetry.io/collector/consumer/consumerhelper.ConsumeMetricsFunc.ConsumeMetrics go.opentelemetry.io/collector@v0.27.0/consumer/consumerhelper/metrics.go:29 go.opentelemetry.io/collector/processor/batchprocessor.(batchMetrics).export go.opentelemetry.io/collector@v0.27.0/processor/batchprocessor/batch_processor.go:285 go.opentelemetry.io/collector/processor/batchprocessor.(batchProcessor).sendItems go.opentelemetry.io/collector@v0.27.0/processor/batchprocessor/batch_processor.go:183 go.opentelemetry.io/collector/processor/batchprocessor.(*batchProcessor).startProcessingCycle go.opentelemetry.io/collector@v0.27.0/processor/batchprocessor/batch_processor.go:144} {2021-06-25 17:47:42.118273569 -0400 EDT m=+18856.163885957, Level:warn, Caller:go.opentelemetry.io/collector@v0.27.0/processor/batchprocessor/batch_processor.go:184, Message:Sender failed, Stack:}

sethAmazon commented 3 years ago

Can you turn on debug logs, please? Run echo "loggingLevel=DEBUG" | sudo tee -a /opt/aws/aws-otel-collector/etc/extracfg.txt and then restart the collector if it is running on EC2.
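
For reference, a minimal sketch of the full sequence on an EC2 host (install paths match the defaults quoted elsewhere in this thread; the -a stop action is an assumption, only -a start appears later in the thread):

# Raise the collector log level to DEBUG (appends to its extra config file)
echo "loggingLevel=DEBUG" | sudo tee -a /opt/aws/aws-otel-collector/etc/extracfg.txt
# Restart the collector so the new log level takes effect
sudo /opt/aws/aws-otel-collector/bin/aws-otel-collector-ctl -c /opt/aws/aws-otel-collector/etc/config.yaml -a stop
sudo /opt/aws/aws-otel-collector/bin/aws-otel-collector-ctl -c /opt/aws/aws-otel-collector/etc/config.yaml -a start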

georges-git commented 3 years ago

Attaching again the zipped log file with the DEBUG logging level. Could you escalate and help resolve this, as it has been going on for the last few weeks? aws-otel-collector.zip

sethAmazon commented 3 years ago

{2021-06-28 12:12:53.950912958 -0400 EDT m=+60.073175088, Level:debug, Caller:github.com/open-telemetry/opentelemetry-collector-contrib/exporter/awsemfexporter@v0.22.0/cwlog_client.go:158, Message:cwlog_client: creating stream fail, Stack:} {2021-06-28 12:12:53.950976095 -0400 EDT m=+60.073238214, Level:debug, Caller:github.com/open-telemetry/opentelemetry-collector-contrib/exporter/awsemfexporter@v0.22.0/cwlog_client.go:176, Message:CreateLogStream / CreateLogGroup has errors., Stack:} {2021-06-28 12:12:53.950990885 -0400 EDT m=+60.073252994, Level:warn, Caller:github.com/open-telemetry/opentelemetry-collector-contrib/exporter/awsemfexporter@v0.22.0/pusher.go:280, Message:Failed to create stream token, Stack:github.com/open-telemetry/opentelemetry-collector-contrib/exporter/awsemfexporter.(*pusher).pushLogEventBatch github.com/open-telemetry/opentelemetry-collector-contrib/exporter/awsemfexporter@v0.22.0/pusher.go:280

I saw a similar error when I ran on EC2 but not when running in a Docker container. I think there might be a problem with the way we are getting creds. Can you try running this in a Docker container and passing in the access key manually with "-e AWS_ACCESS_KEY_ID={your access key here} -e AWS_SECRET_ACCESS_KEY={secret key here}"? A guide on how to run with Docker is https://github.com/aws-observability/aws-otel-collector/blob/main/docs/developers/docker-demo.md
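
For illustration, a hedged sketch of that docker run invocation (the image reference and region here are assumptions; the linked docker-demo guide has the authoritative steps and port mappings):

# Run the collector container with static credentials passed as environment variables
docker run --rm \
  -e AWS_ACCESS_KEY_ID={your access key here} \
  -e AWS_SECRET_ACCESS_KEY={secret key here} \
  -e AWS_REGION=us-east-1 \
  public.ecr.aws/aws-observability/aws-otel-collector:latest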

georges-git commented 3 years ago

Hello @sethAmazon - Can you expand on what you are saying above? My app is running in Docker. Are you asking me to run the OTEL daemon process in a Docker container?

sethAmazon commented 3 years ago

How are you passing in the credentials for otel?

georges-git commented 3 years ago

I am not passing any credentials. I just run this on my Linux host: sudo /opt/aws/aws-otel-collector/bin/aws-otel-collector-ctl -c /opt/aws/aws-otel-collector/etc/config.yaml -a start.

This "aws-otel-collector" is not running in Docker. Each of my EC2 Linux hosts has an AWS role attached to it.

How would you like me to pass the credentials to this "aws-otel-collector" process running on the Linux hosts?
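
For reference, one way to confirm what credentials the collector would pick up from the instance role is to query the EC2 instance metadata service and the AWS CLI directly on the host (a hedged sketch; nothing here is specific to the collector itself):

# List the IAM role name exposed to this instance (IMDSv1 form)
curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/
# Show which identity the default AWS credential chain resolves to on this host
aws sts get-caller-identity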

sahilsapolia commented 3 years ago

I am facing a similar problem while exporting spans to AWS X-Ray. Metrics work fine for me.

{2021-08-27 00:47:35.104061377 +0000 GMT m=+761.062074882, Level:error, Caller:go.opentelemetry.io/collector@v0.29.1-0.20210630003519-14d917479ef3/exporter/exporterhelper/queued_retry.go:245, Message:Exporting failed. Try enabling retry_on_failure config option., Stack:go.opentelemetry.io/collector/exporter/exporterhelper.(retrySender).send go.opentelemetry.io/collector@v0.29.1-0.20210630003519-14d917479ef3/exporter/exporterhelper/queued_retry.go:245 go.opentelemetry.io/collector/exporter/exporterhelper.(tracesExporterWithObservability).send go.opentelemetry.io/collector@v0.29.1-0.20210630003519-14d917479ef3/exporter/exporterhelper/traces.go:118 go.opentelemetry.io/collector/exporter/exporterhelper.(queuedRetrySender).send go.opentelemetry.io/collector@v0.29.1-0.20210630003519-14d917479ef3/exporter/exporterhelper/queued_retry.go:173 go.opentelemetry.io/collector/exporter/exporterhelper.NewTracesExporter.func2 go.opentelemetry.io/collector@v0.29.1-0.20210630003519-14d917479ef3/exporter/exporterhelper/traces.go:97 go.opentelemetry.io/collector/consumer/consumerhelper.ConsumeTracesFunc.ConsumeTraces go.opentelemetry.io/collector@v0.29.1-0.20210630003519-14d917479ef3/consumer/consumerhelper/traces.go:29 go.opentelemetry.io/collector/receiver/otlpreceiver/internal/trace.(Receiver).Export go.opentelemetry.io/collector@v0.29.1-0.20210630003519-14d917479ef3/receiver/otlpreceiver/internal/trace/otlp.go:62 go.opentelemetry.io/collector/model/otlpgrpc.rawTracesServer.Export go.opentelemetry.io/collector@v0.29.1-0.20210630003519-14d917479ef3/model/otlpgrpc/traces.go:85 go.opentelemetry.io/collector/model/internal/data/protogen/collector/trace/v1._TraceService_Export_Handler go.opentelemetry.io/collector@v0.29.1-0.20210630003519-14d917479ef3/model/internal/data/protogen/collector/trace/v1/trace_service.pb.go:210 google.golang.org/grpc.(Server).processUnaryRPC google.golang.org/grpc@v1.38.0/server.go:1286 google.golang.org/grpc.(Server).handleStream google.golang.org/grpc@v1.38.0/server.go:1609 google.golang.org/grpc.(Server).serveStreams.func1.2 google.golang.org/grpc@v1.38.0/server.go:934}^M {2021-08-27 00:47:35.104247064 +0000 GMT m=+761.062260500, Level:error, Caller:go.opentelemetry.io/collector@v0.29.1-0.20210630003519-14d917479ef3/exporter/exporterhelper/queued_retry.go:175, Message:Exporting failed. Dropping data. 
Try enabling sending_queue to survive temporary failures., Stack:go.opentelemetry.io/collector/exporter/exporterhelper.(queuedRetrySender).send go.opentelemetry.io/collector@v0.29.1-0.20210630003519-14d917479ef3/exporter/exporterhelper/queued_retry.go:175 go.opentelemetry.io/collector/exporter/exporterhelper.NewTracesExporter.func2 go.opentelemetry.io/collector@v0.29.1-0.20210630003519-14d917479ef3/exporter/exporterhelper/traces.go:97 go.opentelemetry.io/collector/consumer/consumerhelper.ConsumeTracesFunc.ConsumeTraces go.opentelemetry.io/collector@v0.29.1-0.20210630003519-14d917479ef3/consumer/consumerhelper/traces.go:29 go.opentelemetry.io/collector/receiver/otlpreceiver/internal/trace.(Receiver).Export go.opentelemetry.io/collector@v0.29.1-0.20210630003519-14d917479ef3/receiver/otlpreceiver/internal/trace/otlp.go:62 go.opentelemetry.io/collector/model/otlpgrpc.rawTracesServer.Export go.opentelemetry.io/collector@v0.29.1-0.20210630003519-14d917479ef3/model/otlpgrpc/traces.go:85 go.opentelemetry.io/collector/model/internal/data/protogen/collector/trace/v1._TraceService_Export_Handler go.opentelemetry.io/collector@v0.29.1-0.20210630003519-14d917479ef3/model/internal/data/protogen/collector/trace/v1/trace_service.pb.go:210 google.golang.org/grpc.(Server).processUnaryRPC google.golang.org/grpc@v1.38.0/server.go:1286 google.golang.org/grpc.(*Server).handleStream

sahilsapolia commented 3 years ago

I ran a debug and found that the exporter is sending the request with the trace segment but getting an error response with an empty body. I have shortened the trace string to make it more readable.

{2021-08-27 19:31:33.259092001 +0000 GMT m=+68199.217105694, Level:debug, Caller:github.com/open-telemetry/opentelemetry-collector-contrib/exporter/awsxrayexporter@v0.29.1-0.20210630203112-81d57601b1bc/awsxray.go:54, Message:TracesExporter, Stack:}
{2021-08-27 19:31:33.259609948 +0000 GMT m=+68199.217624182, Level:debug, Caller:github.com/open-telemetry/opentelemetry-collector-contrib/exporter/awsxrayexporter@v0.29.1-0.20210630203112-81d57601b1bc/awsxray.go:78, Message:request: { TraceSegmentDocuments: ["{\"name\":\"test- ....... ...... .......}, Stack:}
{2021-08-27 19:31:33.380405084 +0000 GMT m=+68199.338418868, Level:debug, Caller:github.com/open-telemetry/opentelemetry-collector-contrib/exporter/awsxrayexporter@v0.29.1-0.20210630203112-81d57601b1bc/awsxray.go:81, Message:response error, Stack:}
{2021-08-27 19:31:33.38056959 +0000 GMT m=+68199.338583498, Level:debug, Caller:github.com/open-telemetry/opentelemetry-collector-contrib/exporter/awsxrayexporter@v0.29.1-0.20210630203112-81d57601b1bc/awsxray.go:85, Message:response: {

}, Stack:}

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

sethAmazon commented 2 years ago

Hi @georges-git can we do a live debug session sometime on this?

aureliomarcoag commented 2 years ago

I had this problem on a Python Lambda function. After tweaking the code, I could reproduce the problem by simply using boto3.client("s3").download_file() inside a tempfile context manager, so something like this:

import boto3
import tempfile

s3_client = boto3.client("s3")
with tempfile.NamedTemporaryFile() as tmp_file:
    s3_client.download_file(Bucket="mybucket", Key="key/object", Filename=tmp_file.name)

I eventually stumbled upon https://github.com/aws-observability/aws-otel-lambda/issues/10 so I assume this might be related. I switched from download_file to get_object and the error went away. I also tested with upload_file and that caused the same error to happen.

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

github-actions[bot] commented 2 years ago

This issue was closed because it has been marked as stale for 30 days with no activity.