google-github-actions / auth

A GitHub Action for authenticating to Google Cloud.
https://cloud.google.com/iam
Apache License 2.0

TensorFlow gfile does not work via Workload Identity Federation #210

Closed lgeiger closed 2 years ago

lgeiger commented 2 years ago

TL;DR

I am trying to switch from authenticating with a long-lived Service Account Key JSON to Workload Identity Federation in a TensorFlow application.

To test that authentication works correctly, I simply check whether a blob exists in my bucket:

from google.cloud import storage

print(storage.Client().get_bucket("my-bucket").blob("my_blob").exists())

This works correctly both with a Service Account Key and with Workload Identity Federation.

However, TensorFlow gfile works only with the old Service Account Key method and fails with Workload Identity Federation:

import tensorflow as tf

print(tf.io.gfile.exists("gs://my-bucket/my_blob"))

Expected behavior

I'd expect both authentication methods to work equally well given that TensorFlow just uses the credentials from $GOOGLE_APPLICATION_CREDENTIALS.

Observed behavior

TensorFlow GFile only seems to work with service account keys and not with Workload Identity Federation.

Action YAML

name: Test

on:
  push:
    branches:
      - main
  pull_request: {}

jobs:
  test:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      id-token: write

    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: 3.9
      - uses: google-github-actions/auth@v0
        with:
          workload_identity_provider: 'projects/123456789/locations/global/workloadIdentityPools/my-pool/providers/my-provider'
          service_account: 'my-service-account@my-project.iam.gserviceaccount.com'
      - run: python test_gcs.py

Log output

W tensorflow/core/platform/cloud/google_auth_provider.cc:184] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "FAILED_PRECONDITION: Unexpected content of the JSON credentials file.". Retrieving token from GCE failed with "FAILED_PRECONDITION: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Could not resolve host: metadata".
Traceback (most recent call last):
  File "/home/runner/work/test_gcs.py", line 16, in <module>
    print(tf.io.gfile.exists("gs://my-bucket/my_blob"))
  File "/opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/tensorflow/python/lib/io/file_io.py", line 288, in file_exists_v2
    _pywrap_file_io.FileExists(compat.path_to_bytes(path))
tensorflow.python.framework.errors_impl.PermissionDeniedError: Error executing an HTTP request: HTTP response code 401 with body '{
  "error": {
    "code": 401,
    "message": "Anonymous caller does not have storage.objects.get access to the Google Cloud Storage object.",
    "errors": [
      {
        "message": "Anonymous caller does not have storage.objects.get access to the Google Cloud Storage object.",
        "domain": "global",
        "reason": "required",
        "locationType": "header",
        "location": "Authorization"
      }
    ]
  }
}
'
     when reading metadata of gs://my-bucket/my_blob

Additional information

No response

github-actions[bot] commented 2 years ago

Hi there @lgeiger :wave:!

Thank you for opening an issue. Our team will triage this as soon as we can. Please take a moment to review the troubleshooting steps, which list common error messages and their resolution steps.

sethvargo commented 2 years ago

Hi @lgeiger

Please open an issue in the tensorflow repository. gfile will need to add support for Workload Identity Federation. Unfortunately there is nothing we can do in this project.
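Until gfile gains that support, one possible stopgap is to route simple existence checks through google-cloud-storage, which the original report shows working under Workload Identity Federation. The following is only a sketch under that assumption: the helper names `split_gcs_uri` and `gcs_exists` are hypothetical, and it covers only the `exists` check from the report, not gfile's full API.

```python
from urllib.parse import urlparse


def split_gcs_uri(uri: str) -> tuple[str, str]:
    """Split 'gs://bucket/path/to/blob' into (bucket, blob_name)."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a gs:// URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def gcs_exists(uri: str) -> bool:
    """Check for a blob via google-cloud-storage instead of tf.io.gfile."""
    # Imported lazily so split_gcs_uri stays usable without the library installed.
    from google.cloud import storage

    bucket_name, blob_name = split_gcs_uri(uri)
    # Same call chain as the working snippet in the report, except bucket()
    # builds the reference locally where get_bucket() makes an API request.
    return storage.Client().bucket(bucket_name).blob(blob_name).exists()
```

With credentials configured by the auth action, `gcs_exists("gs://my-bucket/my_blob")` would stand in for `tf.io.gfile.exists("gs://my-bucket/my_blob")` in the test script.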

lgeiger commented 2 years ago

For reference, I opened https://github.com/tensorflow/tensorflow/issues/57104 which hasn't seen any response yet.

vpipkt commented 1 year ago

@sethvargo Can you give some more detail on the analysis of how exactly this is a problem that is specific to gfile?

sethvargo commented 1 year ago

Hi @vpipkt TensorFlow GFile needs to be updated to support Workload Identity Federation supplied by Application Default Credentials. If it uses official Google Cloud client libraries under the hood, it probably needs to update to the latest version. More details:

In the past, there were only two ways to authenticate to GCP:

  1. Exported service account key JSON
  2. Machine identity (only for gcloud and workloads on GCP using the metadata server)

About 2 years ago, GCP created Workload Identity Federation, which adds a third authentication mechanism and file format. GFile does not appear to support that format.
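To see which file format a job is actually handing to TensorFlow, the credentials file can be inspected directly. A minimal sketch: an exported service account key carries `"type": "service_account"`, while a Workload Identity Federation credential configuration carries `"type": "external_account"`. The function name `credential_file_type` is hypothetical, and whether gfile rejects the file for exactly this reason is an assumption based on the "Unexpected content of the JSON credentials file" message in the log.

```python
import json
import os


def credential_file_type(path=None):
    """Return the "type" field of a Google credentials JSON file.

    Falls back to $GOOGLE_APPLICATION_CREDENTIALS when no path is given and
    returns None when neither is set.
    """
    path = path or os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return None
    with open(path, encoding="utf-8") as f:
        return json.load(f).get("type")
```

Running `print(credential_file_type())` as a step after `google-github-actions/auth` shows which of the two file formats the rest of the job receives.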