apache / arrow-rs

Official Rust implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0

Support GCP Workload Identity Federation #3797

Open Samrose-Ahmed opened 1 year ago

Samrose-Ahmed commented 1 year ago

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

I am accessing GCP resources from AWS using GCP Workload Identity Federation.

Describe the solution you'd like

Be able to access GCP resources from AWS through GCP Workload Identity Federation using object_store.

https://cloud.google.com/docs/authentication/provide-credentials-adc#wlif

Describe alternatives you've considered

Is there a way to export my workload identity credentials to a form object_store can understand, similar to AWS STS GetSessionToken? (My knowledge of GCP is more limited.)

Additional context

{
    "audience": "//iam.googleapis.com/projects/111111534588/locations/global/workloadIdentityPools/abc",
    "credential_source": {
        "environment_id": "id123",
        "regional_cred_verification_url": "https://sts.{region}.amazonaws.com?Action=GetCallerIdentity&Version=2011-06-15"
    },
    "service_account_impersonation": {
        "token_lifetime_seconds": 3600
    },
    "service_account_impersonation_url": "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/acct@acb123.iam.gserviceaccount.com:generateAccessToken",
    "subject_token_type": "urn:ietf:params:aws:token-type:aws4_request",
    "token_url": "https://sts.googleapis.com/v1/token",
    "type": "external_account"
}

Samrose-Ahmed commented 1 year ago

I need this, so I'm happy to contribute it if there's no way around it.

tustvold commented 1 year ago

I don't believe there currently is support for this, but I would be happy to review a PR that added support for it. :+1:

FWIW @winding-lines filed https://github.com/apache/arrow-rs/pull/3532 which used an external gcp_auth crate. Typically we have tried to keep the dependency tree down, and so went with https://github.com/apache/arrow-rs/pull/3541 instead, but looking into the gcp_auth crate it doesn't appear to support the external_account credential source either...

https://google.aip.dev/auth/4110 appears to be the authoritative docs on ApplicationDefaultCredentials, with https://google.aip.dev/auth/4117 documenting the external_account flow. This appears to have special case logic for the different sources, e.g. AWS, Azure. Ideally this would reuse the existing auth logic we have for those systems...

Alternatively if you can find a well-supported upstream crate that supports this, I wouldn't object to an optional dependency on it.
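To make the external_account flow mentioned above more concrete, here is a minimal std-only Rust sketch of two pieces it would need: picking the source-specific branch from `subject_token_type`, and building the form body for the OAuth 2.0 token exchange (RFC 8693) that is POSTed to `token_url`. The names `SubjectSource`, `classify`, `percent_encode` and `token_exchange_body` are hypothetical and are not part of object_store. The parameter set follows my reading of AIP-4117 and should be checked against it. Obtaining the subject token itself (for AWS, a signed GetCallerIdentity request) and the later service account impersonation call are left out.

```rust
/// Where the subject token comes from, inferred from `subject_token_type`.
/// Hypothetical type for illustration; object_store has no such enum.
#[derive(Debug, PartialEq)]
enum SubjectSource {
    /// AWS: the subject token is derived from a signed GetCallerIdentity request.
    Aws,
    /// Any other source, for example a token read from a file or URL.
    Other,
}

/// Select the source-specific branch of the flow.
fn classify(subject_token_type: &str) -> SubjectSource {
    if subject_token_type == "urn:ietf:params:aws:token-type:aws4_request" {
        SubjectSource::Aws
    } else {
        SubjectSource::Other
    }
}

/// Percent-encode everything except RFC 3986 unreserved characters.
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for b in input.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Build the application/x-www-form-urlencoded body for the token exchange.
/// Assumption: these are the parameters the STS endpoint expects.
fn token_exchange_body(
    audience: &str,
    subject_token_type: &str,
    subject_token: &str,
    scope: &str,
) -> String {
    let params = [
        ("grant_type", "urn:ietf:params:oauth:grant-type:token-exchange"),
        ("requested_token_type", "urn:ietf:params:oauth:token-type:access_token"),
        ("audience", audience),
        ("scope", scope),
        ("subject_token_type", subject_token_type),
        ("subject_token", subject_token),
    ];
    params
        .iter()
        .map(|(k, v)| format!("{}={}", k, percent_encode(v)))
        .collect::<Vec<_>>()
        .join("&")
}
```

With the configuration from the issue description, `classify` would return `SubjectSource::Aws`, and the `audience` passed to `token_exchange_body` would be the workload identity pool path from that file. The resulting federated token would then be sent to `service_account_impersonation_url` to obtain the final access token.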

gianarb commented 7 months ago

Hello! I am writing here to double-check whether the issue I am working on is similar to this one, or whether I am just doing something wrong given my lack of knowledge of GCP.

I added GCP support to my application, which uses datafusion (previously I was using AWS and local storage). Everything works fine locally when I use the GOOGLE_APPLICATION_CREDENTIALS environment variable, but in production my workload runs on GCP autopilot. My plan was to use the suggested workload identity to provide access to GCP Object Storage, and my expectation is that token acquisition should work without any configuration (from a datafusion point of view).

https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity#authenticating_to

But it fails:

Error performing token request: response error \"Unable to generate access token; IAM returned 400 Bad Request: Invalid form of account ID serviceAccount:<>.iam.gserviceaccount.com. Should be [Gaia ID |Email |Unique ID |] of the account

So I am wondering if I don't know how to properly configure the object store builder or if it is an unsupported authentication method.

Thanks

tustvold commented 7 months ago

No, this issue covers a different kind of credential federation, for workloads running outside of GCP. That error is coming from the GCP metadata server and might indicate some sort of misconfiguration on your part, in particular in the IAM role binding.

gianarb commented 7 months ago

Yeah, unfortunately I have no idea which one! If I can't figure it out, I will open my own issue.

Thanks