Stackdriver / stackdriver-prometheus-sidecar

A sidecar for the Prometheus server that can send metrics to Stackdriver.
https://cloud.google.com/monitoring/kubernetes-engine/prometheus
Apache License 2.0

permission denied despite creating service account #142

Closed: philips closed this issue 5 years ago

philips commented 5 years ago

I followed these steps to set up my Prometheus + Stackdriver stack.

level=info ts=2019-08-04T04:21:28.50604042Z caller=main.go:296 msg="Starting Stackdriver Prometheus sidecar" version="(version=HEAD, branch=master, revision=453838cff46ee8a17f7675696a97256475bb39e7)"
level=info ts=2019-08-04T04:21:28.506422485Z caller=main.go:297 build_context="(go=go1.12, user=kbuilder@kokoro-gcp-ubuntu-prod-1535194210, date=20190520-14:47:15)"
level=info ts=2019-08-04T04:21:28.506537834Z caller=main.go:298 host_details="(Linux 4.14.127+ #1 SMP Tue Jun 18 23:08:40 PDT 2019 x86_64 prometheus-prometheus-0 (none))"
level=info ts=2019-08-04T04:21:28.506674208Z caller=main.go:299 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2019-08-04T04:21:28.512748856Z caller=main.go:564 msg="Web server started"
level=info ts=2019-08-04T04:21:28.516468444Z caller=main.go:545 msg="Stackdriver client started"
level=info ts=2019-08-04T04:22:31.518073511Z caller=manager.go:153 component="Prometheus reader" msg="Starting Prometheus reader..."
level=info ts=2019-08-04T04:22:31.531530003Z caller=manager.go:215 component="Prometheus reader" msg="reached first record after start offset" start_offset=0 skipped_records=0
level=warn ts=2019-08-04T04:22:31.631923445Z caller=queue_manager.go:546 component=queue_manager msg="Unrecoverable error sending samples to remote storage" err="rpc error: code = PermissionDenied desc = Permission monitoring.timeSeries.create denied (or the resource may not exist)."
philips commented 5 years ago

On a hunch, I enabled GKE Workload Identity on my cluster, and now I am getting:

level=info ts=2019-08-04T04:51:33.271865429Z caller=main.go:296 msg="Starting Stackdriver Prometheus sidecar" version="(version=HEAD, branch=master, revision=453838cff46ee8a17f7675696a97256475bb39e7)"
level=info ts=2019-08-04T04:51:33.272237734Z caller=main.go:297 build_context="(go=go1.12, user=kbuilder@kokoro-gcp-ubuntu-prod-1535194210, date=20190520-14:47:15)"
level=info ts=2019-08-04T04:51:33.272354147Z caller=main.go:298 host_details="(Linux 4.14.127+ #1 SMP Tue Jun 18 23:08:40 PDT 2019 x86_64 prometheus-prometheus-0 (none))"
level=info ts=2019-08-04T04:51:33.272482549Z caller=main.go:299 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2019-08-04T04:51:33.285847047Z caller=main.go:564 msg="Web server started"
level=info ts=2019-08-04T04:51:33.286903601Z caller=main.go:545 msg="Stackdriver client started"
level=info ts=2019-08-04T04:52:36.290058215Z caller=manager.go:153 component="Prometheus reader" msg="Starting Prometheus reader..."
level=info ts=2019-08-04T04:52:36.319815836Z caller=manager.go:215 component="Prometheus reader" msg="reached first record after start offset" start_offset=0 skipped_records=0
level=warn ts=2019-08-04T04:52:37.962598185Z caller=queue_manager.go:546 component=queue_manager msg="Unrecoverable error sending samples to remote storage" err="rpc error: code = Unauthenticated desc = Request had invalid authentication credentials. Expected OAuth 2 access token, login cookie or other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project."
jkohen commented 5 years ago

@philips thanks for the report and for the extra information about Workload Identity. In both cases I see credential errors in the logs you posted.

The first error (PermissionDenied on monitoring.timeSeries.create) indicates that the service account doesn't have the right permissions. See these instructions on how to set it up correctly: https://cloud.google.com/kubernetes-engine/docs/how-to/hardening-your-cluster#use_least_privilege_sa
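To see which roles a service account actually holds, a minimal sketch like the following may help. It assumes a POSIX shell, an authenticated `gcloud`, and that `roles/monitoring.metricWriter` is the role expected to grant `monitoring.timeSeries.create`. The helper names `roles_for_sa` and `has_metric_writer` are hypothetical and are not part of the sidecar or of `gcloud`.

```shell
# roles_for_sa PROJECT SA_EMAIL
# Prints one project-level role per line for every binding whose members
# include the given Google service account.
roles_for_sa() {
  local project="$1" sa_email="$2"
  gcloud projects get-iam-policy "${project}" \
    --flatten="bindings[].members" \
    --filter="bindings.members:serviceAccount:${sa_email}" \
    --format="value(bindings.role)"
}

# has_metric_writer PROJECT SA_EMAIL
# Returns 0 if roles/monitoring.metricWriter is among the bound roles.
has_metric_writer() {
  roles_for_sa "$1" "$2" | grep -qxF "roles/monitoring.metricWriter"
}
```

For example, `has_metric_writer my-project gke-prometheus@my-project.iam.gserviceaccount.com || echo "metricWriter missing"`. Note that this only inspects bindings on the project itself; roles inherited from a folder or organization will not show up.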

The second error (Unauthenticated) indicates that the Stackdriver Prometheus integration cannot find valid credentials using Application Default Credentials. If the link above doesn't help you solve this issue, please see https://cloud.google.com/docs/authentication/production
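Since the Unauthenticated error appeared after enabling Workload Identity, one thing worth checking is whether the Kubernetes service account used by the Prometheus pod is annotated with the Google service account it should act as. This is a sketch under assumptions: `kubectl` points at the right cluster, and the helper names `wi_annotation` and `check_wi_annotation` are hypothetical. It does not check that the pod actually runs as that Kubernetes service account.

```shell
# wi_annotation NAMESPACE KSA
# Prints the iam.gke.io/gcp-service-account annotation (empty if unset).
wi_annotation() {
  local ns="$1" ksa="$2"
  kubectl get serviceaccount "${ksa}" -n "${ns}" \
    -o jsonpath='{.metadata.annotations.iam\.gke\.io/gcp-service-account}'
}

# check_wi_annotation NAMESPACE KSA EXPECTED_GSA_EMAIL
# Fails with a message if the annotation is missing or points elsewhere.
check_wi_annotation() {
  local actual
  actual="$(wi_annotation "$1" "$2")"
  if [ -z "${actual}" ]; then
    echo "missing annotation iam.gke.io/gcp-service-account on $1/$2" >&2
    return 1
  fi
  if [ "${actual}" != "$3" ]; then
    echo "annotation points at ${actual}, expected $3" >&2
    return 1
  fi
  echo "ok: $1/$2 -> ${actual}"
}
```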

I also see that, while GKE Workload Identity is in beta, Stackdriver may use the node's service account. I'm not sure whether this applies to the Prometheus integration, but it's something to keep in mind: https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity#limitations
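To confirm whether Workload Identity is enabled on the cluster at all, a rough check is to read the cluster's workload pool, which should be `PROJECT_ID.svc.id.goog` when the feature is on. This sketch assumes the current GKE API field `workloadIdentityConfig.workloadPool` (clusters created during the beta may report the setting under a different field, so compare against `gcloud container clusters describe` output for your cluster). It checks only the cluster setting, not whether node pools run the GKE metadata server. `workload_pool` and `wi_enabled` are hypothetical helper names.

```shell
# workload_pool PROJECT CLUSTER
# Lists clusters across all locations in the project and prints the workload
# pool of the cluster with the given name (empty if Workload Identity is off).
workload_pool() {
  local project="$1" cluster="$2"
  gcloud container clusters list --project "${project}" \
    --filter="name=${cluster}" \
    --format="value(workloadIdentityConfig.workloadPool)"
}

# wi_enabled PROJECT CLUSTER
# Returns 0 if the workload pool is exactly PROJECT.svc.id.goog.
# If several clusters share the name, the multi-line output will not match.
wi_enabled() {
  [ "$(workload_pool "$1" "$2")" = "$1.svc.id.goog" ]
}
```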

philips commented 5 years ago

@jkohen Thanks for your help.

With fresh eyes this morning, I noticed that I had swapped the project and the cluster name. :facepalm:

After I fixed that everything works as expected.
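A swapped project ID and cluster name can be caught before the sidecar starts. The sketch below assumes the two values are the ones passed to the sidecar's `--stackdriver.project-id` and `--stackdriver.kubernetes.cluster-name` flags, and that `gcloud` is authenticated as an account that can see the project. `check_project_and_cluster` is a hypothetical helper. A failure on the first check can still mean the account lacks permission to view the project, so it narrows the problem rather than proving a typo.

```shell
# check_project_and_cluster PROJECT CLUSTER
# 1. The project ID must resolve to a project visible to the current account.
# 2. A cluster with exactly that name must exist in that project.
check_project_and_cluster() {
  local project="$1" cluster="$2"
  if ! gcloud projects describe "${project}" --format="value(projectId)" >/dev/null 2>&1; then
    echo "project '${project}' not found or not visible to the current account" >&2
    return 1
  fi
  if ! gcloud container clusters list --project "${project}" --format="value(name)" \
      | grep -qxF -- "${cluster}"; then
    echo "cluster '${cluster}' not found in project '${project}'" >&2
    return 1
  fi
  echo "ok: cluster '${cluster}' is in project '${project}'"
}
```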

I will close this, but it would be really cool if the application could tell the difference between incorrect permissions and incorrect configuration. Failing that, it might be good to have a debugging FAQ noting that an IAM misconfiguration looks identical to a typo in the flags.
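As a rough illustration of that FAQ idea, a small helper could map the gRPC code in a sidecar warning line to the likely causes discussed in this thread. The mapping is illustrative and not exhaustive, and `likely_cause` is a hypothetical name; it is not something the sidecar provides.

```shell
# likely_cause LOG_LINE
# Matches the gRPC status code in an "Unrecoverable error sending samples"
# line and prints the causes this issue ran into for that code.
likely_cause() {
  case "$1" in
    *"code = PermissionDenied"*)
      echo "PermissionDenied: check the service account's roles (e.g. roles/monitoring.metricWriter) and that the project ID flag names the right project" ;;
    *"code = Unauthenticated"*)
      echo "Unauthenticated: no valid credentials found; check Application Default Credentials or the Workload Identity setup" ;;
    *)
      echo "unknown: no PermissionDenied or Unauthenticated code in this line" ;;
  esac
}
```

For example, piping the sidecar's log lines through `while read -r line; do likely_cause "$line"; done` prints one hint per line.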

Thanks!

hixichen commented 3 years ago

For reference, see GKE Workload Identity:

# Replace these with your own project, Google service account,
# Kubernetes service account, and namespace.
export GCP_PROJECT=my-project
export GCP_SA=gke-prometheus
export K8S_SA=prometheus
export K8S_NS=prometheus
export GCP_SA_EMAIL="${GCP_SA}@${GCP_PROJECT}.iam.gserviceaccount.com"

# Create the Google service account in the same project used below.
gcloud iam service-accounts create "${GCP_SA}" \
  --display-name="${GCP_SA}" \
  --project="${GCP_PROJECT}"

# Allow the Kubernetes service account to act as the Google service account.
gcloud iam service-accounts add-iam-policy-binding "${GCP_SA_EMAIL}" \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:${GCP_PROJECT}.svc.id.goog[${K8S_NS}/${K8S_SA}]"

# Grant the Google service account the roles for writing metrics, logs,
# and resource metadata.
for role in roles/monitoring.metricWriter \
            roles/monitoring.viewer \
            roles/logging.logWriter \
            roles/stackdriver.resourceMetadata.writer; do
  gcloud projects add-iam-policy-binding "${GCP_PROJECT}" \
    --member "serviceAccount:${GCP_SA_EMAIL}" \
    --role "${role}"
done

# Point the (already existing) Kubernetes service account at the
# Google service account.
kubectl annotate serviceaccount "${K8S_SA}" \
  iam.gke.io/gcp-service-account="${GCP_SA_EMAIL}" \
  -n "${K8S_NS}"
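To verify the `roles/iam.workloadIdentityUser` binding created by the commands above, a minimal sketch can read the Google service account's IAM policy and look for the exact member string. It assumes `gcloud`'s `value()` format separates multiple fields with a tab, and the helpers `wi_member` and `wi_binding_present` are hypothetical names chosen for illustration.

```shell
# wi_member PROJECT NAMESPACE KSA
# Builds the Workload Identity member string used in the binding.
wi_member() {
  printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' "$1" "$2" "$3"
}

# wi_binding_present GSA_EMAIL MEMBER
# Flattens the policy to one "role<TAB>member" line per member and succeeds
# only if MEMBER is bound to roles/iam.workloadIdentityUser.
wi_binding_present() {
  gcloud iam service-accounts get-iam-policy "$1" \
    --flatten="bindings[].members" \
    --format="value(bindings.role,bindings.members)" \
    | awk -F'\t' -v m="$2" '
        $1 == "roles/iam.workloadIdentityUser" && $2 == m { found = 1 }
        END { exit !found }'
}
```

For example: `wi_binding_present gke-prometheus@my-project.iam.gserviceaccount.com "$(wi_member my-project prometheus prometheus)" && echo bound`.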