aws-observability / aws-otel-collector

AWS Distro for OpenTelemetry Collector (see ADOT Roadmap at https://github.com/orgs/aws-observability/projects/4)
https://aws-otel.github.io/

"NoCredentialProviders: no valid providers in chain" error when no IAM Role is attached to the instance #1286

Closed (rsmaso-aws closed this issue 1 year ago)

rsmaso-aws commented 2 years ago

Describe the bug
When running the AWS OpenTelemetry collector v1.17.1+ in an on-premises setup, or on an EC2 instance that has no IAM role attached, AOC is not able to recognize AWS credentials, no matter how they are provided (via environment variables or a file) and no matter which exporter for an AWS service (such as X-Ray, CloudWatch, or AMP) is used. It fails with a NoCredentialProviders: no valid providers in chain error message in the logs.

NOTE: The issue does NOT appear when there is an IAM role attached to the EC2 instance or ECS task!

Steps to reproduce

  1. Launch EC2 instance with Ubuntu (t2.micro)
  2. Set up your AWS credentials in /home/ubuntu/.aws/credentials or export them using env vars (a quick sanity check for the credentials is sketched after this list).
  3. Install the collector, for example using https://aws-otel-collector.s3.amazonaws.com/ubuntu/amd64/v0.18.0/aws-otel-collector.deb
  4. Run the collector with: sudo AWS_REGION=us-east-1 AWS_CONFIG_FILE="/home/ubuntu/.aws/credentials" /opt/aws/aws-otel-collector/bin/aws-otel-collector-ctl -c ./config.yaml -a start
     Alternatively: sudo AWS_REGION=us-east-1 AWS_ACCESS_KEY_ID=*** AWS_SECRET_ACCESS_KEY=*** /opt/aws/aws-otel-collector/bin/aws-otel-collector-ctl -c ./config.yaml -a start
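
A quick way to confirm the credentials themselves are valid before starting the collector (a sketch, assuming the AWS CLI is installed on the instance):

# should print the account and ARN for the supplied keys
AWS_SHARED_CREDENTIALS_FILE=/home/ubuntu/.aws/credentials aws sts get-caller-identity --region us-east-1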

What did you expect to see?
I expect the collector to properly authenticate and communicate with AWS X-Ray using the provided credentials!

What did you see instead?

...
{"level":"error","timestamp":"2022-06-09T09:46:22.088Z","caller":"exporterhelper/queued_retry.go:85","message":"Exporting failed. Dropping data. Try enabling sending_queue to survive temporary failures.","kind":"exporter","name":"awsxray","dropped_items":1,"stack":"go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).send\n\tgo.opentelemetry.io/collector@v0.51.0/exporter/exporterhelper/queued_retry.go:85\ngo.opentelemetry.io/collector/exporter/exporterhelper.NewTracesExporter.func2\n\tgo.opentelemetry.io/collector@v0.51.0/exporter/exporterhelper/traces.go:113\ngo.opentelemetry.io/collector/consumer.ConsumeTracesFunc.ConsumeTraces\n\tgo.opentelemetry.io/collector@v0.51.0/consumer/traces.go:36\ngo.opentelemetry.io/collector/processor/batchprocessor.(*batchTraces).export\n\tgo.opentelemetry.io/collector@v0.51.0/processor/batchprocessor/batch_processor.go:257\ngo.opentelemetry.io/collector/processor/batchprocessor.(*batchProcessor).sendItems\n\tgo.opentelemetry.io/collector@v0.51.0/processor/batchprocessor/batch_processor.go:185\ngo.opentelemetry.io/collector/processor/batchprocessor.(*batchProcessor).startProcessingCycle\n\tgo.opentelemetry.io/collector@v0.51.0/processor/batchprocessor/batch_processor.go:146"}
{"level":"warn","timestamp":"2022-06-09T09:46:22.089Z","caller":"batchprocessor/batch_processor.go:186","message":"Sender failed","kind":"processor","name":"batch/traces","pipeline":"traces","error":"NoCredentialProviders: no valid providers in chain. Deprecated.\n\tFor verbose messaging see aws.Config.CredentialsChainVerboseErrors"}

Environment

Tested on:

Additional context
Fresh collector log after start:

2022/06/09 10:16:47 I! Change ownership to 998:999
2022/06/09 10:16:47 I! Set HOME: /home/aoc
{"level":"debug","timestamp":"2022-06-09T10:16:47.419Z","caller":"awsutil@v0.51.0/conn.go:60","message":"Using proxy address: ","kind":"exporter","name":"awsxray","proxyAddr":""}
{"level":"debug","timestamp":"2022-06-09T10:16:47.423Z","caller":"awsutil@v0.51.0/conn.go:140","message":"Fetch region from ec2 metadata","kind":"exporter","name":"awsxray","region":"us-east-1"}
{"level":"debug","timestamp":"2022-06-09T10:16:47.423Z","caller":"awsxrayexporter@v0.51.0/xray_client.go:51","message":"Using Endpoint: %s","kind":"exporter","name":"awsxray","endpoint":"https://xray.us-east-1.amazonaws.com/"}
{"level":"info","timestamp":"2022-06-09T10:16:47.423Z","caller":"builder/exporters_builder.go:255","message":"Exporter was built.","kind":"exporter","name":"awsxray"}
{"level":"debug","timestamp":"2022-06-09T10:16:47.423Z","caller":"awsutil@v0.51.0/conn.go:60","message":"Using proxy address: ","kind":"exporter","name":"awsemf","proxyAddr":""}
{"level":"debug","timestamp":"2022-06-09T10:16:47.426Z","caller":"awsutil@v0.51.0/conn.go:140","message":"Fetch region from ec2 metadata","kind":"exporter","name":"awsemf","region":"us-east-1"}
{"level":"info","timestamp":"2022-06-09T10:16:47.427Z","caller":"builder/exporters_builder.go:255","message":"Exporter was built.","kind":"exporter","name":"awsemf"}
{"level":"info","timestamp":"2022-06-09T10:16:47.427Z","caller":"builder/exporters_builder.go:217","message":"Ignoring exporter as it is not used by any pipeline","kind":"exporter","name":"logging"}
{"level":"info","timestamp":"2022-06-09T10:16:47.428Z","caller":"builder/pipelines_builder.go:224","message":"Pipeline was built.","kind":"pipeline","name":"traces"}
{"level":"info","timestamp":"2022-06-09T10:16:47.428Z","caller":"builder/pipelines_builder.go:224","message":"Pipeline was built.","kind":"pipeline","name":"metrics"}
{"level":"info","timestamp":"2022-06-09T10:16:47.428Z","caller":"awsxrayreceiver@v0.51.0/receiver.go:60","message":"Going to listen on endpoint for X-Ray segments","kind":"receiver","name":"awsxray","udp":"0.0.0.0:2000"}
{"level":"info","timestamp":"2022-06-09T10:16:47.428Z","caller":"udppoller/poller.go:109","message":"Listening on endpoint for X-Ray segments","kind":"receiver","name":"awsxray","udp":"0.0.0.0:2000"}
{"level":"info","timestamp":"2022-06-09T10:16:47.429Z","caller":"awsxrayreceiver@v0.51.0/receiver.go:72","message":"Listening on endpoint for X-Ray segments","kind":"receiver","name":"awsxray","udp":"0.0.0.0:2000"}
{"level":"debug","timestamp":"2022-06-09T10:16:47.429Z","caller":"proxy@v0.51.0/conn.go:105","message":"Unable to fetch region from ECS metadata","kind":"receiver","name":"awsxray","error":"ECS metadata endpoint is inaccessible"}
{"level":"debug","timestamp":"2022-06-09T10:16:47.431Z","caller":"proxy@v0.51.0/conn.go:113","message":"Fetched region from EC2 metadata","kind":"receiver","name":"awsxray","region":"us-east-1"}
{"level":"info","timestamp":"2022-06-09T10:16:47.432Z","caller":"builder/receivers_builder.go:226","message":"Receiver was built.","kind":"receiver","name":"awsxray","datatype":"traces"}
{"level":"info","timestamp":"2022-06-09T10:16:47.432Z","caller":"builder/receivers_builder.go:226","message":"Receiver was built.","kind":"receiver","name":"otlp","datatype":"traces"}
{"level":"info","timestamp":"2022-06-09T10:16:47.432Z","caller":"builder/receivers_builder.go:226","message":"Receiver was built.","kind":"receiver","name":"otlp","datatype":"metrics"}
{"level":"info","timestamp":"2022-06-09T10:16:47.432Z","caller":"service/telemetry.go:109","message":"Setting up own telemetry..."}
{"level":"info","timestamp":"2022-06-09T10:16:47.433Z","caller":"service/telemetry.go:129","message":"Serving Prometheus metrics","address":":8888","level":"basic","service.instance.id":"a0a0c3bd-7d88-4520-b5f3-2b3d045adb78","service.version":"latest"}
{"level":"info","timestamp":"2022-06-09T10:16:47.433Z","caller":"service/service.go:76","message":"Starting extensions..."}
{"level":"info","timestamp":"2022-06-09T10:16:47.433Z","caller":"extensions/extensions.go:38","message":"Extension is starting...","kind":"extension","name":"health_check"}
{"level":"info","timestamp":"2022-06-09T10:16:47.434Z","caller":"healthcheckextension@v0.51.0/healthcheckextension.go:44","message":"Starting health_check extension","kind":"extension","name":"health_check","config":{"Port":0,"TCPAddr":{"Endpoint":"0.0.0.0:13133"},"Path":"/","CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
{"level":"info","timestamp":"2022-06-09T10:16:47.434Z","caller":"extensions/extensions.go:42","message":"Extension started.","kind":"extension","name":"health_check"}
{"level":"info","timestamp":"2022-06-09T10:16:47.434Z","caller":"service/service.go:81","message":"Starting exporters..."}
{"level":"info","timestamp":"2022-06-09T10:16:47.434Z","caller":"builder/exporters_builder.go:40","message":"Exporter is starting...","kind":"exporter","name":"awsxray"}
{"level":"info","timestamp":"2022-06-09T10:16:47.435Z","caller":"builder/exporters_builder.go:48","message":"Exporter started.","kind":"exporter","name":"awsxray"}
{"level":"info","timestamp":"2022-06-09T10:16:47.435Z","caller":"builder/exporters_builder.go:40","message":"Exporter is starting...","kind":"exporter","name":"awsemf"}
{"level":"info","timestamp":"2022-06-09T10:16:47.435Z","caller":"builder/exporters_builder.go:48","message":"Exporter started.","kind":"exporter","name":"awsemf"}
{"level":"info","timestamp":"2022-06-09T10:16:47.435Z","caller":"builder/exporters_builder.go:40","message":"Exporter is starting...","kind":"exporter","name":"logging"}
{"level":"info","timestamp":"2022-06-09T10:16:47.435Z","caller":"builder/exporters_builder.go:48","message":"Exporter started.","kind":"exporter","name":"logging"}
{"level":"info","timestamp":"2022-06-09T10:16:47.435Z","caller":"service/service.go:86","message":"Starting processors..."}
{"level":"info","timestamp":"2022-06-09T10:16:47.436Z","caller":"builder/pipelines_builder.go:54","message":"Pipeline is starting...","kind":"pipeline","name":"traces"}
{"level":"info","timestamp":"2022-06-09T10:16:47.436Z","caller":"builder/pipelines_builder.go:65","message":"Pipeline is started.","kind":"pipeline","name":"traces"}
{"level":"info","timestamp":"2022-06-09T10:16:47.436Z","caller":"builder/pipelines_builder.go:54","message":"Pipeline is starting...","kind":"pipeline","name":"metrics"}
{"level":"info","timestamp":"2022-06-09T10:16:47.436Z","caller":"builder/pipelines_builder.go:65","message":"Pipeline is started.","kind":"pipeline","name":"metrics"}
{"level":"info","timestamp":"2022-06-09T10:16:47.436Z","caller":"service/service.go:91","message":"Starting receivers..."}
{"level":"info","timestamp":"2022-06-09T10:16:47.436Z","caller":"builder/receivers_builder.go:68","message":"Receiver is starting...","kind":"receiver","name":"awsxray"}
{"level":"info","timestamp":"2022-06-09T10:16:47.437Z","caller":"awsxrayreceiver@v0.51.0/receiver.go:99","message":"X-Ray TCP proxy server started","kind":"receiver","name":"awsxray"}
{"level":"info","timestamp":"2022-06-09T10:16:47.437Z","caller":"builder/receivers_builder.go:73","message":"Receiver started.","kind":"receiver","name":"awsxray"}
{"level":"info","timestamp":"2022-06-09T10:16:47.437Z","caller":"builder/receivers_builder.go:68","message":"Receiver is starting...","kind":"receiver","name":"otlp"}
{"level":"info","timestamp":"2022-06-09T10:16:47.437Z","caller":"zapgrpc/zapgrpc.go:174","message":"[core] [Server #1] Server created","grpc_log":true}
{"level":"info","timestamp":"2022-06-09T10:16:47.437Z","caller":"otlpreceiver/otlp.go:70","message":"Starting GRPC server on endpoint 0.0.0.0:4317","kind":"receiver","name":"otlp"}
{"level":"info","timestamp":"2022-06-09T10:16:47.437Z","caller":"otlpreceiver/otlp.go:88","message":"Starting HTTP server on endpoint 0.0.0.0:55681","kind":"receiver","name":"otlp"}
{"level":"info","timestamp":"2022-06-09T10:16:47.438Z","caller":"builder/receivers_builder.go:73","message":"Receiver started.","kind":"receiver","name":"otlp"}
{"level":"info","timestamp":"2022-06-09T10:16:47.438Z","caller":"healthcheck/handler.go:129","message":"Health Check state change","kind":"extension","name":"health_check","status":"ready"}
{"level":"info","timestamp":"2022-06-09T10:16:47.438Z","caller":"service/collector.go:251","message":"Starting aws-otel-collector...","Version":"v0.18.0","NumCPU":1}
{"level":"info","timestamp":"2022-06-09T10:16:47.438Z","caller":"service/collector.go:146","message":"Everything is ready. Begin running and processing data."}
{"level":"info","timestamp":"2022-06-09T10:16:47.439Z","caller":"zapgrpc/zapgrpc.go:174","message":"[core] [Server #1 ListenSocket #2] ListenSocket created","grpc_log":true}

Content of config.yaml

extensions:
 health_check:
receivers:
 otlp:
  protocols:
   grpc:
    endpoint: 0.0.0.0:4317
   http:
    endpoint: 0.0.0.0:55681
 awsxray:
  endpoint: 0.0.0.0:2000
  transport: udp
processors:
 batch/traces:
  timeout: 1s
  send_batch_size: 50
 batch/metrics:
  timeout: 60s
exporters:
 awsxray:
 awsemf:
service:
 extensions: [health_check]
 pipelines:
  traces:
   receivers: [otlp,awsxray]
   processors: [batch/traces]
   exporters: [awsxray]
  metrics:
   receivers: [otlp]
   processors: [batch/metrics]
   exporters: [awsemf]
rsmaso-aws commented 2 years ago

cc @mhausenblas

mhausenblas commented 2 years ago

As per @willarmiros:

The underlying error comes from the AWS SDK for Go. It basically means the AWS SDK for Go can't find credentials in its typical expected places; see, for example, https://github.com/aws/aws-xray-daemon/issues/59

mhausenblas commented 2 years ago

I can reproduce the issue. I tried different collector configs and methods of passing credentials, as well as setting NO_PROXY=169.254.169.254 to test whether it is related to the metadata service. As long as there is no IAM role attached, the NoCredentialProviders: no valid providers in chain error occurs.

mhausenblas commented 2 years ago

See also https://github.com/aws/aws-sdk-go/issues/2914

jeromeinsf commented 2 years ago

Given this comment https://github.com/aws/aws-sdk-go/issues/2914#issuecomment-803177408, how do you recommend tracking the ask on the SDK side of things? @mhausenblas @Aneurysm9 Also, would a PR be feasible?

mhausenblas commented 2 years ago

@jeromeinsf as discussed last week, internal

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

rsmaso-aws commented 2 years ago

Hi @mhausenblas, are there any updates on this issue? Thanks in advance.

mhausenblas commented 2 years ago

@rsmaso-aws it's on our backlog, hopefully this quarter, Q1/2023 otherwise.

rapphil commented 2 years ago

Hi @rsmaso-aws ,

You are not able to set up the credentials with sudo AWS_REGION=us-east-1 AWS_CONFIG_FILE="/home/ubuntu/.aws/credentials" /opt/aws/aws-otel-collector/bin/aws-otel-collector-ctl -c ./config.yaml -a start (or similar) because aws-otel-collector-ctl is a control process, not the collector itself.

Here are some options to solve your problem:

One option is to add AWS_SHARED_CREDENTIALS_FILE to the environment file the collector service reads, next to the existing config= entry:

config=--config /opt/aws/aws-otel-collector/etc/config.yaml
AWS_SHARED_CREDENTIALS_FILE=/path/file/credentials

Another option is to create a systemd override:

sudo systemctl edit aws-otel-collector

# Add AWS_SHARED_CREDENTIALS_FILE to the override of the systemd unit. ref: https://www.freedesktop.org/software/systemd/man/systemd.service.html#Command%20lines
# Example of what you should add to the override:

[Service]
Environment="AWS_SHARED_CREDENTIALS_FILE=/path/to/creds"

After you set the environment variable, stop and start the collector and the credentials should be used.

The first two options also support specifying credentials through the env vars AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN, but I recommend using AWS_SHARED_CREDENTIALS_FILE.
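
For illustration (a sketch only; the key values below are placeholders), the same systemd override could carry static credentials instead of a credentials file:

[Service]
Environment="AWS_ACCESS_KEY_ID=AKIA_PLACEHOLDER"
Environment="AWS_SECRET_ACCESS_KEY=PLACEHOLDER"
# AWS_SESSION_TOKEN is only needed when the credentials are temporary
Environment="AWS_SESSION_TOKEN=PLACEHOLDER"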

We will update the documentation to detail how to install the collector on-premises or on EC2 without instance profiles.

NOTE: AWS_SHARED_CREDENTIALS_FILE is what you are looking for to set the credentials file. AWS_CONFIG_FILE is used for storing configuration profiles.
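
For reference, a minimal sketch of the two file formats (all values are placeholders): the shared credentials file pointed to by AWS_SHARED_CREDENTIALS_FILE holds the keys, while AWS_CONFIG_FILE holds profile configuration:

# credentials file (AWS_SHARED_CREDENTIALS_FILE)
[default]
aws_access_key_id = AKIA_PLACEHOLDER
aws_secret_access_key = PLACEHOLDER

# config file (AWS_CONFIG_FILE)
[default]
region = us-east-1
output = json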

RyanW8 commented 1 year ago

We're trying to use the shared credentials approach when running the ADOT collector in EKS. We don't want to use IRSA, so we are using IAM credentials injected into the pod; see below:

---
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: adot
spec:
  image: <private-registry>/public-ecr/aws-observability/aws-otel-collector:v0.24.1
  mode: deployment
  serviceAccount: adot-collector
  podAnnotations:
    vault.hashicorp.com/agent-inject: "true"
    vault.hashicorp.com/role: "<vault-role>"
    vault.hashicorp.com/agent-cache-enable: "true"
    vault.hashicorp.com/agent-revoke-on-shutdown: "true"
    vault.hashicorp.com/agent-inject-secret-creds: "fake/path"
    vault.hashicorp.com/agent-inject-template-creds: |
      {{- with secret "aws-creds" }}
      [default]
      aws_access_key_id = {{index .Data "access_key" }}
      aws_secret_access_key = {{index .Data "secret_key" }}
      {{- end -}}
  env:
    - name: AWS_SHARED_CREDENTIALS_FILE
      value: "/vault/secrets/creds"
  config: |
    extensions:
      health_check:
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    exporters:
      awsxray:
        region: eu-west-1
    processors:
      batch/traces:
        timeout: 1s
        send_batch_size: 50
    service:
      extensions:
        - health_check
      pipelines:
        traces:
          receivers:
            - otlp
          processors:
            - batch/traces
          exporters:
            - awsxray

We're still not getting traces into X-Ray

rapphil commented 1 year ago

Hi @RyanW8, did you check the logs for this workload? If credentials were the issue, you should see an error log.

Moreover, can you try using the logging exporter to verify whether your application is really generating traces?
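
A minimal sketch of what that could look like in the collector config (the awsxray settings mirror the ones in your manifest; the logging exporter options here are illustrative):

exporters:
  awsxray:
    region: eu-west-1
  logging:
    loglevel: debug
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch/traces]
      exporters: [awsxray, logging]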

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

utahkay commented 1 year ago

I have the same problem, trying to run on EC2, following the instructions here. I'm running v0.28.0 of the aws-otel-collector.

I tried all three suggestions by @rapphil to specify credentials, but I still get the error

"error":"NoCredentialProviders: no valid providers in chain. Deprecated.

The original issue description from @rsmaso-aws says

The issue does NOT appear when there is an IAM role attached to the EC2 instance

Maybe I can use an IAM role; where can I find info on what needs to be in that role?

utahkay commented 1 year ago

I was able to find a good example policy here; using an IAM role for the EC2 instance, I am able to send metrics to CloudWatch.
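
For anyone searching later, a rough sketch of the permissions such a role needs for the exporters in this thread (illustrative only, not the exact policy from the linked example): the awsxray exporter writes trace segments and the awsemf exporter writes via CloudWatch Logs.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "xray:PutTraceSegments",
        "xray:PutTelemetryRecords",
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents",
        "logs:DescribeLogGroups",
        "logs:DescribeLogStreams"
      ],
      "Resource": "*"
    }
  ]
}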

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

lphaniKumar commented 1 year ago

I am seeing a similar issue when using IAM Roles Anywhere on an on-premises system and am not able to post data to AWS CloudWatch using the EMF exporter. Is there a fix/workaround for this? @mhausenblas

rapphil commented 1 year ago

Hi @lphaniKumar

Please take a look at the updated documentation: https://aws-otel.github.io/docs/setup/on-premises#configuring-adot-collector-to-use-iam-roles-anywhere (make sure to refresh the page).

There was a gap in the documentation: setting the env var AWS_SDK_LOAD_CONFIG=true is necessary.
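
For reference, a rough sketch of how the pieces fit together with IAM Roles Anywhere (the paths and ARNs below are placeholders, and the flags follow the IAM Roles Anywhere credential helper): the collector's environment points at an AWS config file whose profile obtains credentials via credential_process, and AWS_SDK_LOAD_CONFIG=true makes the SDK read it.

# environment for the collector service (placeholder path)
AWS_SDK_LOAD_CONFIG=true
AWS_CONFIG_FILE=/opt/aws/aws-otel-collector/etc/aws-config

# contents of that config file (ARNs and paths are placeholders)
[default]
credential_process = /usr/local/bin/aws_signing_helper credential-process --certificate /path/to/cert.pem --private-key /path/to/key.pem --trust-anchor-arn arn:aws:rolesanywhere:REGION:ACCOUNT:trust-anchor/ID --profile-arn arn:aws:rolesanywhere:REGION:ACCOUNT:profile/ID --role-arn arn:aws:iam::ACCOUNT:role/NAME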

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

github-actions[bot] commented 1 year ago

This issue was closed because it has been marked as stale for 30 days with no activity.