mboveri opened 3 years ago
The sidecar uses the Google Cloud Client Library for Go, which in turn uses Application Default Credentials (ADC). ADC allows credentials to be passed via the GOOGLE_APPLICATION_CREDENTIALS environment variable, see here. So you can create a secret containing a JSON file with the credentials, mount that secret as a volume in the sidecar container, and set the GOOGLE_APPLICATION_CREDENTIALS environment variable inside the container.
Also note that you might have to specify the --stackdriver.generic.location="some-location-maybe-your-datacenter-name" and --stackdriver.generic.namespace="K8S-cluster-name" parameters for the sidecar, so that metrics are created using the generic_task monitored resource.
I have the JSON, but am getting stuck on the volume mounting bit, do you have an example of that I could take a look at?
Take a look at https://stackoverflow.com/questions/47021469/how-to-set-google-application-credentials-on-gke-running-through-kubernetes ; there is also an example in the official GCP docs: https://cloud.google.com/kubernetes-engine/docs/tutorials/authenticating-to-cloud-platform#step_4_import_credentials_as_a_secret
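For reference, a minimal sketch along the lines of those examples — the secret name, key filename, and mount path below are placeholders, not taken from this thread:

```yaml
# Create the secret first, e.g.:
#   kubectl create secret generic gcm-credentials --from-file=key.json=/path/to/key.json
# Then mount it into the sidecar container and point ADC at the key file:
containers:
  - name: stackdriver-prometheus-sidecar
    env:
      - name: GOOGLE_APPLICATION_CREDENTIALS
        value: /var/secrets/google/key.json
    volumeMounts:
      - name: gcm-credentials
        mountPath: /var/secrets/google
        readOnly: true
volumes:
  - name: gcm-credentials
    secret:
      secretName: gcm-credentials
```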
I was able to get that working but am now getting the following error in the sidecar's container logs:
level=warn ts=2020-08-26T21:17:02.122Z caller=queue_manager.go:534 component=queue_manager msg="Unrecoverable error sending samples to remote storage" err="rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Unrecognized region or location.: timeSeries[0-199]"
level=warn ts=2020-08-26T21:16:44.686Z caller=queue_manager.go:534 component=queue_manager msg="Unrecoverable error sending samples to remote storage" err="rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Field timeSeries[10].metric.type had an invalid value of \"external.googleapis.com/prometheus/clouddriver:jvm:memory:used\": The metric type must be a URL-formatted string with a domain and non-empty path.: timeSeries[10]; Field timeSeries[11].metric.type had an invalid value of \"external.googleapis.com/prometheus/clouddriver:jvm:memory:used\": The metric type must be a URL-formatted string with a domain and non-empty path.
As well as this error:
The metric type must be a URL-formatted string with a domain and non-empty path.
For anyone else attempting to mount volumes to Prometheus, the minimum version is 8.13.13 when Volume and VolumeMounts were added - https://github.com/helm/charts/commit/ef0d749132ecfa61b2ea47ccacafeaf5cf1d3d77
I was able to get that working but am now getting the following error in the sidecar's container logs:
level=warn ts=2020-08-26T21:17:02.122Z caller=queue_manager.go:534 component=queue_manager msg="Unrecoverable error sending samples to remote storage" err="rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Unrecognized region or location.: timeSeries[0-199]"
As I've mentioned in the comment above, you need to pass --stackdriver.generic.location as a sidecar parameter to fill the mandatory "location" label associated with the generic_task monitored resource type.
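As a sketch, the relevant args in the sidecar's container spec might look like the following (all values are placeholders):

```yaml
args:
  - --stackdriver.project-id=my-project
  # Required outside GCP so the generic_task resource has a location:
  - --stackdriver.generic.location=my-datacenter-1
  - --stackdriver.generic.namespace=my-k8s-cluster
```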
level=warn ts=2020-08-26T21:16:44.686Z caller=queue_manager.go:534 component=queue_manager msg="Unrecoverable error sending samples to remote storage" err="rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Field timeSeries[10].metric.type had an invalid value of \"external.googleapis.com/prometheus/clouddriver:jvm:memory:used\": The metric type must be a URL-formatted string with a domain and non-empty path.: timeSeries[10]; Field timeSeries[11].metric.type had an invalid value of \"external.googleapis.com/prometheus/clouddriver:jvm:memory:used\": The metric type must be a URL-formatted string with a domain and non-empty path.
Most probably you're not specifying an --include filter as a sidecar parameter; as a result the sidecar attempts to send ALL Prometheus metrics to Google Cloud Monitoring. I doubt this is what you really want, as it can be costly - see Pricing. Please consider setting --include accordingly.
Note that metric types in Google Cloud Monitoring must be valid URL-formatted strings, and the sidecar generates metric names under the external.googleapis.com/prometheus/ prefix, so Prometheus metric names containing colons (such as clouddriver:jvm:memory:used) produce invalid metric types.
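Per the sidecar's README, --include takes a Prometheus series selector; a hedged sketch (the metric-name pattern here is only an illustration):

```yaml
args:
  # Only forward series whose names match the selector below;
  # everything else is dropped before the Cloud Monitoring API call.
  - --include={__name__=~"kube_deployment_.+"}
```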
Looks like setting the following did the trick. We are working on getting some better filtering, so we have stopped sending for now, but we were able to get the OpenStack cluster to connect and see metrics in the Metrics explorer before disabling: `- --stackdriver.project-id={redacted}`
Thanks for all your help @Dnefedkin !
We still need to figure out why some metrics are getting rejected; we see errors like:
level=warn ts=2020-08-27T23:42:27.611Z caller=queue_manager.go:534 component=queue_manager msg="Unrecoverable error sending samples to remote storage" err="rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Unrecognized region or location.: timeSeries[0-199]"
level=warn ts=2020-08-27T23:42:22.656Z caller=queue_manager.go:534 component=queue_manager msg="Unrecoverable error sending samples to remote storage" err="rpc error: code = InvalidArgument desc = Field timeSeries[0].points[0].interval.start_time had an invalid value of \"2020-08-27T16:39:43.687-07:00\": The start time must be before the end time (2020-08-27T16:39:43.687-07:00) for the non-gauge metric 'external.googleapis.com/prometheus/container_fs_sector_writes_total'."
level=warn ts=2020-08-27T23:23:07.075Z caller=queue_manager.go:534 component=queue_manager msg="Unrecoverable error sending samples to remote storage" err="rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Field timeSeries[100].metric.type had an invalid value of \"external.googleapis.com/prometheus/gate:hystrix:isCircuitBreakerOpen\": The metric type must be a URL-formatted string with a domain and non-empty path.: timeSeries[100];
level=warn ts=2020-08-27T23:23:13.486Z caller=queue_manager.go:534 component=queue_manager msg="Unrecoverable error sending samples to remote storage" err="rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: The new labels would cause the metric external.googleapis.com/prometheus/kube_deployment_labels to have over 10 labels.: timeSeries[180]"
I think at least:
level=warn ts=2020-08-27T23:23:07.075Z caller=queue_manager.go:534 component=queue_manager msg="Unrecoverable error sending samples to remote storage" err="rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Field timeSeries[100].metric.type had an invalid value of \"external.googleapis.com/prometheus/gate:hystrix:isCircuitBreakerOpen\": The metric type must be a URL-formatted string with a domain and non-empty path.: timeSeries[100];
may be related to your note about invalid URLs due to the ':' characters, though.
level=warn ts=2020-08-27T23:42:22.656Z caller=queue_manager.go:534 component=queue_manager msg="Unrecoverable error sending samples to remote storage" err="rpc error: code = InvalidArgument desc = Field timeSeries[0].points[0].interval.start_time had an invalid value of \"2020-08-27T16:39:43.687-07:00\": The start time must be before the end time (2020-08-27T16:39:43.687-07:00) for the non-gauge metric 'external.googleapis.com/prometheus/container_fs_sector_writes_total'."
container_fs_sector_writes_total sounds like a counter metric, not a gauge, so it should have a start time before the end time to reflect the time interval. If you want to represent this metric as a gauge, you can use a static_metadata entry in the config file, see https://github.com/Stackdriver/stackdriver-prometheus-sidecar#file
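A minimal config-file sketch, assuming the static_metadata format described in the linked README (the metric name is taken from the error above):

```yaml
static_metadata:
  - metric: container_fs_sector_writes_total
    type: gauge
    help: Override metadata so this series is sent as a gauge
```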
level=warn ts=2020-08-27T23:23:13.486Z caller=queue_manager.go:534 component=queue_manager msg="Unrecoverable error sending samples to remote storage" err="rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: The new labels would cause the metric external.googleapis.com/prometheus/kube_deployment_labels to have over 10 labels.: timeSeries[180]"
This sounds like a Google Cloud Monitoring API restriction: a maximum of 10 labels per metric.
Awesome, thanks @Dnefedkin !
Hello,
I have Kubernetes clusters in multiple clouds (GCP, AWS, on-prem OpenStack) and would like to export all my Prometheus metrics to Stackdriver. Right now, stackdriver-prometheus-sidecar does not have the ability to explicitly specify which service account credentials to use when communicating with the Google Cloud Monitoring (GCM) API. This means that the sidecar cannot function outside of GCE nodes, where Workload Identity normally provides authentication and authorization. It would be really nice if we were able to leverage the stackdriver-prometheus-sidecar to export metrics from our non-GCP Kubernetes clusters into GCM. Is it possible to add a configuration flag to the sidecar that specifies a location on disk where service account keys could be placed? That way, one could stash the service account keys in a Kubernetes Secret object and mount them into the container, even on clusters outside of GCP.