GoogleCloudPlatform / opentelemetry-operations-go


googlemanagedprometheusexporter auth using metadata server url #526

Closed avijitsarkar123 closed 1 year ago

avijitsarkar123 commented 1 year ago

We want metadata-server-based auth support for googlemanagedprometheusexporter. Currently it only works with application_default_credentials.json via GOOGLE_APPLICATION_CREDENTIALS, but we want it to fall back to the metadata server URL below:

http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token

This pattern works for us in the Cloud Monitoring/BigQuery Grafana datasources, which use this code to follow a hierarchy starting from GOOGLE_APPLICATION_CREDENTIALS and falling back to the metadata URL.

We want the same behavior in googlemanagedprometheusexporter.
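
For illustration, here is a minimal Go sketch of the lookup hierarchy we have in mind, using golang.org/x/oauth2/google (the scope and error handling are only illustrative, not the exporter's actual code):

package main

import (
    "context"
    "fmt"
    "log"

    "golang.org/x/oauth2/google"
)

func main() {
    ctx := context.Background()
    // FindDefaultCredentials checks GOOGLE_APPLICATION_CREDENTIALS, then the
    // gcloud well-known file, and finally the GCE metadata server
    // (.../service-accounts/default/token) when running on GCE.
    creds, err := google.FindDefaultCredentials(ctx, "https://www.googleapis.com/auth/monitoring.write")
    if err != nil {
        log.Fatalf("no default credentials found: %v", err)
    }
    fmt.Println("resolved credentials for project:", creds.ProjectID)
}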

dashpole commented 1 year ago

Marking as a bug, since this should be how it works today. Can you provide your googlemanagedprometheus exporter config?

avijitsarkar123 commented 1 year ago

Here you go...

receivers:
  prometheus:
    config:
      global:
        scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
        evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
        # scrape_timeout is set to the global default (10s).

      scrape_configs:
        - job_name: 'spring boot scrape'
          metrics_path: '/actuator/prometheus'
          scrape_interval: 5s
          static_configs:
            - targets: ['host.docker.internal:8080']

processors:
    batch:
        # batch metrics before sending to reduce API usage
        send_batch_max_size: 200
        send_batch_size: 200
        timeout: 5s
    memory_limiter:
        # drop metrics if memory usage gets too high
        check_interval: 1s
        limit_percentage: 65
        spike_limit_percentage: 20

exporters:
  logging:
    loglevel: info
  googlemanagedprometheus:
    project: ct-gcp-sre-monitorin-dev-09qb

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [logging, googlemanagedprometheus]

What will be the env var names for the resource attributes below?

cluster
job
location
namespace
instance

Also, this is how I am running the otel collector locally:

docker run --network=host \
  -v $(pwd)/otel-config-gmp.yaml:/etc/otelcol/config.yaml \
  -v /Users/avijitsarkar/.config/gcloud/application_default_credentials.json:/tmp/application_default_credentials.json \
  --env GOOGLE_APPLICATION_CREDENTIALS=/tmp/application_default_credentials.json \
  --env GOOGLE_CLOUD_PROJECT=ct-gcp-sre-monitorin-dev-09qb \
  otel/opentelemetry-collector-contrib /otelcontribcol --config=/etc/otelcol/config.yaml
dashpole commented 1 year ago

Does it work if you remove project: ct-gcp-sre-monitorin-dev-09qb? (Are the credentials from the same project you are trying to send the metrics to?) That might help me figure out what is going wrong.

avijitsarkar123 commented 1 year ago

I have access to multiple projects. I get the auth credentials locally using the commands below, and then volume-mount the credentials file from .config/gcloud/application_default_credentials.json into the container:

gcloud auth login --account avijit_sarkar@apple.com
gcloud auth application-default login --account avijit_sarkar@apple.com
avijitsarkar123 commented 1 year ago

I removed the GOOGLE_CLOUD_PROJECT=ct-gcp-sre-monitorin-dev-09qb env var and still get the same error:

2022-11-03T19:59:16.665Z    info    exporterhelper/queued_retry.go:427  Exporting failed. Will retry the request after interval.    {"kind": "exporter", "data_type": "metrics", "name": "googlemanagedprometheus", "error": "failed to export time series to GCM: rpc error: code = Internal desc = One or more TimeSeries could not be written: Internal error encountered. Please retry after a few seconds. If internal errors persist, contact support at https://cloud.google.com/support/docs.: prometheus_target{location:,cluster:,job:spring boot scrape,namespace:,instance:host.docker.internal:8080} timeSeries[0-57]: prometheus.googleapis.com/jvm_gc_pause_seconds_max/gauge{action:end of minor GC,cause:G1 Evacuation Pause}", "interval": "5.173252611s"}
dashpole commented 1 year ago

Thanks

avijitsarkar123 commented 1 year ago

@dashpole - If you can tell me the names of the env vars I have to set for the attributes below, I can give it a try:

cluster
job
location
namespace
instance
dashpole commented 1 year ago

See https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/googlemanagedprometheusexporter#resource-attribute-handling

It is basically:

If you are using the prometheus receiver, job and instance are already set correctly. The aws resource detector should be able to detect location if you don't want to hard-code it.

avijitsarkar123 commented 1 year ago

Okay, so I tried this to confirm and it's working; I am able to see the metrics on the GMP side using PromQL:

docker run --network=host \
  -v $(pwd)/otel-config-gmp.yaml:/etc/otelcol/config.yaml \
  -v /Users/avijitsarkar/.config/gcloud/application_default_credentials.json:/tmp/application_default_credentials.json \
  --env GOOGLE_APPLICATION_CREDENTIALS=/tmp/application_default_credentials.json \
  --env GOOGLE_CLOUD_PROJECT=ct-gcp-sre-monitorin-dev-09qb \
  --env OTEL_RESOURCE_ATTRIBUTES=cluster=test,location=us-west1,namespace=gmp-poc \
  otel/opentelemetry-collector-contrib /otelcontribcol --config=/etc/otelcol/config.yaml
dashpole commented 1 year ago

Cool. It's best to use one of the cloud-provider-based resource detectors (e.g. gcp, ec2) for setting at least cloud.availability_zone; see https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/resourcedetectionprocessor#supported-detectors.

I'll look into the auth issue.

dashpole commented 1 year ago

Based on https://github.com/GoogleCloudPlatform/opentelemetry-operations-go/blob/8c6aae170851bb7eb4bc64f348861e6692bf7fde/exporter/collector/traces.go#L55, it looks like we might just be using the credentials to set the project ID, but not the actual credentials themselves.

dashpole commented 1 year ago

I wonder if we should be passing those to the metric and trace clients via https://pkg.go.dev/google.golang.org/api/option#WithCredentials. But they are clearly using ADC somehow already...
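
A rough sketch of what that wiring could look like, assuming we resolve ADC once and hand it to the client explicitly (hypothetical; the monitoring client and scope here are only for illustration, not the exporter's current code):

package main

import (
    "context"
    "log"

    monitoring "cloud.google.com/go/monitoring/apiv3/v2"
    "golang.org/x/oauth2/google"
    "google.golang.org/api/option"
)

func main() {
    ctx := context.Background()
    // Resolve Application Default Credentials: env-var file first, then the
    // gcloud well-known file, then the metadata server.
    creds, err := google.FindDefaultCredentials(ctx, "https://www.googleapis.com/auth/monitoring.write")
    if err != nil {
        log.Fatalf("finding default credentials: %v", err)
    }
    // Hand the resolved credentials to the Cloud Monitoring client explicitly.
    client, err := monitoring.NewMetricClient(ctx, option.WithCredentials(creds))
    if err != nil {
        log.Fatalf("creating metric client: %v", err)
    }
    defer client.Close()
}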

avijitsarkar123 commented 1 year ago

Looking at this, it seems it's already using FindDefaultCredentials, which reads the JSON file and also falls back to the metadata server; I think this needs to be used by the clients: https://github.com/GoogleCloudPlatform/opentelemetry-operations-go/blob/main/exporter/collector/traces.go#L57

dashpole commented 1 year ago

Looks like we need to follow this: https://github.com/googleapis/google-cloud-go#authorization

avijitsarkar123 commented 1 year ago

@dashpole - thanks for taking care of it so fast. So is it going to be included in the next version of otel/opentelemetry-collector-contrib? And do you know how soon releases happen?

avijitsarkar123 commented 1 year ago

@dashpole - I was testing the changes using the image otel/opentelemetry-collector-contrib:0.66.0. I removed the GOOGLE_APPLICATION_CREDENTIALS env var and was hoping it would default to http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token for auth, but it didn't work and I get the error below:

Error: cannot build pipelines: failed to create "googlemanagedprometheus" exporter, in pipeline "metrics": google: could not find default credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.
dashpole commented 1 year ago

Looking at google.FindDefaultCredentials(), it will fall back to using a compute token source if metadata.OnGCE() returns true. The compute token source defaults to instance/service-accounts/default/token, which is what you are looking for.

metadata.OnGCE() returns true if the GCE_METADATA_HOST env var is set, or if some other conditions are met. Setting the env var is probably the easiest way to get the credentials to be fetched from that address.
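
As a quick check, here is a sketch of that fallback in isolation, assuming cloud.google.com/go/compute/metadata and golang.org/x/oauth2/google (the same packages FindDefaultCredentials relies on):

package main

import (
    "fmt"
    "log"

    "cloud.google.com/go/compute/metadata"
    "golang.org/x/oauth2/google"
)

func main() {
    // OnGCE reports whether the process can see a GCE metadata server, either
    // because GCE_METADATA_HOST is set or because metadata.google.internal responds.
    if !metadata.OnGCE() {
        log.Fatal("metadata server not detected; set GCE_METADATA_HOST or run on GCE")
    }
    // ComputeTokenSource("") uses the instance's default service account, i.e.
    // the .../instance/service-accounts/default/token endpoint.
    tok, err := google.ComputeTokenSource("").Token()
    if err != nil {
        log.Fatalf("fetching token from metadata server: %v", err)
    }
    fmt.Println("got token, expires at:", tok.Expiry)
}

Running something like this from the same environment as the collector should show whether the metadata-server fallback is reachable.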