GoogleCloudPlatform / esp-v2

A service proxy that provides API management capabilities using Google Service Infrastructure.
https://cloud.google.com/endpoints/
Apache License 2.0

Tracing permissions issues when service_account_key specified #688

Open TheSpy opened 2 years ago

TheSpy commented 2 years ago

Hello, I am having issues in a GKE environment with workload identity enabled. Image: gcr.io/endpoints-release/endpoints-runtime:2.34.0

I am passing a service account key through the --service_account_key parameter; that service account has owner permissions in the project.

Endpoints reporting works fine; however, tracing keeps throwing errors: BatchWriteSpans failed (1 spans, 769 bytes): PERMISSION_DENIED: The caller does not have permission

I have tried querying the metadata server, and it lists the workload identity service accounts.
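For readers who want to repeat that check, here is a minimal sketch of such a query, assuming it is run from a container inside the same pod. The helper name `list_metadata_service_accounts` is hypothetical; the URL and the `Metadata-Flavor` header are the standard metadata server conventions.

```shell
# Sketch: list the service accounts the metadata server exposes to this pod.
# list_metadata_service_accounts is a hypothetical helper name.
list_metadata_service_accounts() {
  curl --silent --fail \
    --header "Metadata-Flavor: Google" \
    "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/"
}

# Usage (from inside the pod):
#   list_metadata_service_accounts
```

If the output shows only the workload identity account, that is the identity any client falling back to the metadata server will use.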


It seems that reporting goes through the specified service account, while tracing uses workload identity. Is that expected behavior?

My expectation is that when a service account key is defined, both Endpoints reporting and tracing work through the same specified service account.

TheSpy commented 2 years ago

Adding the --non_gcp flag helped. Since I am running endpoints-runtime in a GCP environment, is that the right way to set it up?

qiwzhang commented 2 years ago

Currently, the flag --service_account_key doesn't apply to the OpenCensus tracing API calls. It only applies to ServiceControl API calls. That is the problem.

My suggestion is to set up GKE workload identity correctly and not to use the flag --service_account_key.

Here are the steps on how to set it up

The flag --non_gcp is not the right approach. It still calls the GKE metadata server to get credentials.
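The linked steps did not survive in this copy of the thread. As a rough sketch only, the core of a GKE workload identity binding usually looks like the following. The function name and all four arguments are placeholders (assumptions, not values from this issue), and the cluster and node pool must already have workload identity enabled.

```shell
# Sketch under assumptions: every argument is a placeholder you supply.
# $1 = project ID, $2 = Kubernetes namespace,
# $3 = Kubernetes service account, $4 = Google service account email.
bind_workload_identity() {
  project="$1"; namespace="$2"; ksa="$3"; gsa="$4"

  # Allow the Kubernetes service account to impersonate the Google one.
  gcloud iam service-accounts add-iam-policy-binding "$gsa" \
    --role="roles/iam.workloadIdentityUser" \
    --member="serviceAccount:${project}.svc.id.goog[${namespace}/${ksa}]" || return 1

  # Point the Kubernetes service account at the Google service account.
  kubectl annotate serviceaccount "$ksa" \
    --namespace="$namespace" \
    "iam.gke.io/gcp-service-account=${gsa}"
}

# Usage:
#   bind_workload_identity my-project default esp-ksa esp@my-project.iam.gserviceaccount.com
```

The Google service account then also needs whatever roles ESPv2 relies on, including one that permits writing traces.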

TheSpy commented 2 years ago

Thank you for the clarification. In my specific case that is not possible, because the workload identity service account is enabled at the pod level and not all containers running inside the pod support workload identity service accounts yet. As a temporary solution, is it safe to use the --non_gcp flag in a GCP environment?

qiwzhang commented 2 years ago

The flag --non_gcp is pretty simple: it prevents ESPv2 from calling the GCP metadata server to get the info it needs. As long as you provide that info yourself and all features work fine, it is OK to use it.

For example, you need to provide --tracing_project_id for CloudESF to send the trace to that project. For tracing, this is the only place where the --non_gcp flag has an effect.

The flag --non_gcp doesn't change how credentials are fetched for calls to the StackDriver service. By default, ESPv2 calls the GCP metadata server to get credentials. So I am surprised it works with the --non_gcp flag.

I think underneath it is using a gRPC client. If you set the environment variable GOOGLE_APPLICATION_CREDENTIALS to your key path, it will use it:

```
export GOOGLE_APPLICATION_CREDENTIALS=your-key-path
```

TheSpy commented 2 years ago

Ah yes, you are correct. I have been experimenting and forgot to mention that I have the GOOGLE_APPLICATION_CREDENTIALS environment variable set together with the --non_gcp flag. Thanks!

qiwzhang commented 2 years ago

Cool, I believe it is GOOGLE_APPLICATION_CREDENTIALS that makes it work, not the flag --non_gcp.
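Since the workaround depends entirely on that environment variable, a small pre-start check can catch a missing or unreadable key file early. This is only a sketch; `check_credentials_file` is a hypothetical helper and not part of ESPv2.

```shell
# Sketch: fail fast if GOOGLE_APPLICATION_CREDENTIALS is unset or unreadable.
# check_credentials_file is a hypothetical helper, not an ESPv2 feature.
check_credentials_file() {
  key_path="${GOOGLE_APPLICATION_CREDENTIALS:-}"
  if [ -z "$key_path" ]; then
    echo "GOOGLE_APPLICATION_CREDENTIALS is not set" >&2
    return 1
  fi
  if [ ! -r "$key_path" ]; then
    echo "key file is not readable: $key_path" >&2
    return 1
  fi
  echo "using key file: $key_path"
}

# Usage (before starting the proxy):
#   export GOOGLE_APPLICATION_CREDENTIALS=your-key-path
#   check_credentials_file || exit 1
```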

TheSpy commented 2 years ago

Just an observation: when the --non_gcp flag is set together with --tracing_project_id, the trace property is not added to the log item (the trace property is the one described here: https://github.com/GoogleCloudPlatform/esp-v2/issues/431). As I understand it, the property is not added because the project ID is missing, and my assumption was that tracing_project_id would be used when metadata info is not available. Is that the right assumption?

qiwzhang commented 2 years ago

@nareddyt do you know?

qiwzhang commented 2 years ago

I guess we could use producer_project_id, which is the project you deployed your service config to when running gcloud endpoints services deploy ...

nareddyt commented 2 years ago

You are correct @TheSpy. It is because we have two separate project IDs - tracing project ID and deployment project ID.

The trace property is filled into the access log here: https://github.com/GoogleCloudPlatform/esp-v2/blob/f71267f79d229e9bf8139125f2ebb467d41feb94/src/api_proxy/service_control/request_builder.cc#L834

We make use of the deployment project ID, which we retrieve from the metadata server: https://github.com/GoogleCloudPlatform/esp-v2/blob/a98723a9663f5d405bb3ce4768bc9983a1e1439a/src/api_proxy/service_control/request_info.h#L265

We never propagate --tracing_project_id down to this field.
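To illustrate why a missing project ID drops the property: Cloud Logging expects the log entry's trace field in the form `projects/[PROJECT_ID]/traces/[TRACE_ID]`, so there is nothing valid to emit without a project. The sketch below only mirrors that format; it is not the ESPv2 implementation (which is the C++ code linked above), and `trace_field` is a hypothetical name.

```shell
# Sketch: build the value of a log entry's trace field.
# $1 = project ID, $2 = trace ID. trace_field is a hypothetical helper.
trace_field() {
  project_id="${1:-}"; trace_id="${2:-}"
  if [ -z "$project_id" ] || [ -z "$trace_id" ]; then
    # Without both parts there is no valid resource name to emit.
    return 1
  fi
  printf 'projects/%s/traces/%s\n' "$project_id" "$trace_id"
}

# Usage:
#   trace_field my-project 4bf92f3577b34da6a3ce929d0e0e4736
```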

qiwzhang commented 2 years ago

We should send tracing_project_id to the service_control filter so that it can generate the trace_id in the log.

lvl99 commented 1 year ago

I've got this same issue myself, but using ESPv2 with Cloud Run. Latest image I'm using is gcr.io/endpoints-release/endpoints-runtime-serverless:2.39.0.

I deploy via gcloud run deploy {...} --service-account {...}. I've granted the permission cloudtrace.traces.patch to my service account, but I still see the same BatchWriteSpans failed (1 spans, 769 bytes): PERMISSION_DENIED: The caller does not have permission error in my logs. It doesn't tell me exactly which permission is required.

nareddyt commented 1 year ago

Hi @lvl99, what flags are you passing to ESPv2 on Cloud Run?

ESPv2 should auto-detect the service account you specified in gcloud run deploy and use it to publish traces. No other ESPv2 config is required (i.e. no need to specify flags like tracing_project_id). Just making sure that is clear.

> I've granted the permission cloudtrace.traces.patch to my service account

Can you double check this? BTW you should grant a role to the service account, not a permission. Did you grant the Cloud Trace Agent role?
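For reference, granting the Cloud Trace Agent role (role ID `roles/cloudtrace.agent`) at the project level can be sketched as below. The function name and both arguments are placeholders, not values from this thread.

```shell
# Sketch: grant the Cloud Trace Agent role to a service account.
# $1 = project ID, $2 = service account email (both placeholders).
grant_trace_agent() {
  project="$1"; sa_email="$2"
  gcloud projects add-iam-policy-binding "$project" \
    --member="serviceAccount:${sa_email}" \
    --role="roles/cloudtrace.agent"
}

# Usage:
#   grant_trace_agent my-project esp@my-project.iam.gserviceaccount.com
```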

nareddyt commented 1 year ago

One more question: Is the service account above from the same project that you deploy ESPv2 to? Or is the service account from a different project?

lvl99 commented 1 year ago

Thanks for your reply @nareddyt

I've created a custom role with the permission cloudtrace.traces.patch assigned to it. The custom role is assigned to the service account I've configured for my Cloud Run instance.

All roles and service accounts are within the same project as the Cloud Run instance as well.

The flags I use to pass to ESPv2 are:

ENV ESPv2_ARGS ^++^--cors_preset=basic

That's about it. I'll try adding --enable_debug as well to see if it gives me any further info.

nareddyt commented 1 year ago

Thanks for the info. That is very odd. As far as I can tell, everything is set up correctly. This is the first time I've seen permission issues for Cloud Trace.

To debug this further, can you deploy with --enable_debug, and then share the full ESPv2 application logs (from startup to PERMISSION_DENIED error)? Feel free to email the logs to nareddyt@google.com

Unfortunately other than the logs, I can't think of other ways to debug this.
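As a sketch of how the debug flag could be appended to the existing ESPv2_ARGS value: the `^++^` prefix seen earlier in the thread appears to declare `++` as the separator between flags, so additional flags are joined with `++`. The helper `build_espv2_args` is hypothetical.

```shell
# Sketch: join ESPv2 flags into a single ESPv2_ARGS value.
# Assumes the "^++^" prefix means "++" separates the flags that follow.
build_espv2_args() {
  result="^++^"
  first=1
  for flag in "$@"; do
    if [ "$first" -eq 1 ]; then
      result="${result}${flag}"
      first=0
    else
      result="${result}++${flag}"
    fi
  done
  printf '%s\n' "$result"
}

# Usage:
#   build_espv2_args --cors_preset=basic --enable_debug
#   prints: ^++^--cors_preset=basic++--enable_debug
```

The resulting string would replace the value on the `ENV ESPv2_ARGS` line shown above.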