fsspec / gcsfs

Pythonic file-system interface for Google Cloud Storage
http://gcsfs.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Issues when using identity_pool.Credentials for connecting GCSFileSystem #595

Closed walteree1 closed 9 months ago

walteree1 commented 10 months ago

I am trying to use workload identity federation to authenticate and access some buckets. This already works with another library, but we need to use GCSFileSystem for major reasons. I am using the following code:

import json

from google.auth import identity_pool
from fsspec import filesystem
import gcsfs

#some hidden variables #############

credential_template =  '''{{
                       "type": "external_account",
                       "audience": "//iam.googleapis.com/projects/{project_number}/locations/global/workloadIdentityPools/{workload_pool}/providers/{workload_provider}",
                       "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
                       "token_url": "https://sts.googleapis.com/v1/token",
                       "credential_source": {{
                           "file": "token_id",
                           "format": 
                           {{
                              "type": "json",
                              "subject_token_field_name": "id_token"
                            }}                           
                       }},
                       "service_account_impersonation_url": "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/{service_account}:generateAccessToken"
                   }}'''

cred_alex_json = credential_template.format(project_number=alex_project_number,
                                                workload_pool=alex_workload_pool,
                                                workload_provider=alex_workload_provider,
                                                service_account=alex_service_account)

alex_credentials = identity_pool.Credentials.from_info(json.loads(cred_alex_json))
fs = gcsfs.GCSFileSystem(token=alex_credentials)  ######### failed
folders = fs.ls('tee_alex_salary_input')

The error is the following:

Traceback (most recent call last):
  File "/Users/walvara/repositories/testing/helpful_scripts.py/hola.py", line 158, in <module>
    process()
  File "/Users/walvara/repositories/testing/helpful_scripts.py/hola.py", line 81, in process
    folders = fs.ls('tee_alex_salary_input')
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/fsspec/asyn.py", line 118, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/gcsfs/core.py", line 998, in _ls
    for entry in await self._list_objects(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/gcsfs/core.py", line 564, in _list_objects
    items, prefixes = await self._do_list_objects(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/gcsfs/core.py", line 638, in _do_list_objects
    return await self._sequential_list_objects_helper(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/gcsfs/core.py", line 735, in _sequential_list_objects_helper
    page = await self._call(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/gcsfs/core.py", line 437, in _call
    status, headers, info, contents = await self._request(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/decorator.py", line 221, in fun
    return await caller(func, *(extras + args), **kw)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/gcsfs/retry.py", line 153, in retry_request
    raise e
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/gcsfs/retry.py", line 123, in retry_request
    return await func(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/gcsfs/core.py", line 421, in _request
    headers=self._get_headers(headers),
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/gcsfs/core.py", line 400, in _get_headers
    self.credentials.apply(out)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/gcsfs/credentials.py", line 187, in apply
    self.maybe_refresh()
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/gcsfs/credentials.py", line 182, in maybe_refresh
    self.credentials.refresh(req)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/google/auth/external_account.py", line 364, in refresh
    self._impersonated_credentials.refresh(request)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/google/auth/impersonated_credentials.py", line 250, in refresh
    self._update_token(request)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/google/auth/impersonated_credentials.py", line 279, in _update_token
    self.token, self.expiry = _make_iam_token_request(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/google/auth/impersonated_credentials.py", line 100, in _make_iam_token_request
    raise exceptions.RefreshError(_REFRESH_ERROR, response_body)
google.auth.exceptions.RefreshError: ('Unable to acquire impersonated credentials', '{\n  "error": {\n    "code": 400,\n    "message": "Request contains an invalid argument.",\n    "status": "INVALID_ARGUMENT"\n  }\n}\n')

Version: Python 3.10.11 OS: Mac and Windows

walteree1 commented 9 months ago

The fix for this is easy: you need to add a scope where the credentials are created.

alex_credentials = identity_pool.Credentials.from_info(json.loads(cred_alex_json)).with_scopes(['https://www.googleapis.com/auth/cloud-platform'])

martindurant commented 9 months ago

Is it easy to query cred objects and add default scopes if none are present?
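One possible way to do this, sketched under assumptions: scoped credential objects in google-auth expose a `requires_scopes` property and a `with_scopes()` method, so a small helper can check the former and call the latter. The helper name `ensure_scopes` and the `FakeScopedCredentials` stand-in are hypothetical and exist only so the sketch runs without network access or real credentials; this is not how gcsfs currently handles it. google-auth also ships `google.auth.credentials.with_scopes_if_required`, which performs a similar check.

```python
CLOUD_PLATFORM_SCOPE = "https://www.googleapis.com/auth/cloud-platform"


def ensure_scopes(credentials, default_scopes=(CLOUD_PLATFORM_SCOPE,)):
    """Return credentials with default scopes applied if none are set.

    Hypothetical helper: relies only on the `requires_scopes` property and
    `with_scopes()` method that google-auth's scoped credentials provide.
    Objects without `requires_scopes` are returned unchanged.
    """
    if getattr(credentials, "requires_scopes", False):
        return credentials.with_scopes(list(default_scopes))
    return credentials


class FakeScopedCredentials:
    """Stand-in for a scoped google-auth credential (illustration only)."""

    def __init__(self, scopes=None):
        self.scopes = scopes

    @property
    def requires_scopes(self):
        # True when no scopes have been set yet.
        return not self.scopes

    def with_scopes(self, scopes):
        # Like google-auth, return a new object rather than mutating.
        return FakeScopedCredentials(scopes=list(scopes))


unscoped = FakeScopedCredentials()
scoped = ensure_scopes(unscoped)
already_scoped = FakeScopedCredentials(["custom-scope"])
untouched = ensure_scopes(already_scoped)
```

With real credentials, the call would look like `ensure_scopes(identity_pool.Credentials.from_info(...))` before passing the result as `token=` to `GCSFileSystem`. Credentials that already carry scopes are left as they are.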