fsspec / gcsfs

Pythonic file-system interface for Google Cloud Storage
http://gcsfs.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

google.cloud.storage allows access but gcsfs does not #395

Open amine-aboufirass opened 3 years ago

amine-aboufirass commented 3 years ago

I would like to connect to a bucket on Google Cloud Storage using gcsfs. So far I have only been using the native google.cloud module, but it turns out I need file-like objects for a certain application, so I had to switch over.

My GOOGLE_APPLICATION_CREDENTIALS environment variable points to a JSON file on my local filesystem. Using google.cloud, I can access a bucket in Cloud Storage with no issues. Using gcsfs, I cannot. Here's some code to show what I mean:

from google.cloud import storage
import gcsfs
import google.auth

storage_client = storage.Client()
bucket = storage_client.bucket(...)
blob = bucket.blob('data/audio_wav/test.wav')
bts = blob.download_as_bytes()

credentials, _ = google.auth.default()
fs = gcsfs.GCSFileSystem(project=..., token=credentials)
folders = fs.ls('data')  # throws error

As commented, the very last line throws what appears to be an authentication error:

google.auth.exceptions.RefreshError: ('invalid_scope: Invalid OAuth scope or ID token audience provided.', {'error': 'invalid_scope', 'error_description': 'Invalid OAuth scope or ID token audience provided.'})

I would think that if google.cloud.storage accepts my credentials, then so should gcsfs. Why do I get the above error, and how can I fix it?

Thanks.

martindurant commented 3 years ago

Perhaps you have only read rights, and need to pass access="read_only" to GCSFileSystem?

For token, you might want to pass the actual path to the auth JSON file, or use token="google_default". I'm not sure exactly what google.auth.default() returns, but we are after a google.auth.credentials.Credentials object.
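
A minimal sketch of the suggestions above, assuming the key file path is stored in GOOGLE_APPLICATION_CREDENTIALS and that read-only access is sufficient. The helper name gcsfs_kwargs is hypothetical and not part of gcsfs; it only assembles the keyword arguments, so it can be inspected without gcsfs installed or any credentials present.

```python
import os


def gcsfs_kwargs(project, environ=None):
    """Build keyword arguments for gcsfs.GCSFileSystem (hypothetical helper).

    Uses the key-file path from GOOGLE_APPLICATION_CREDENTIALS when it is
    set, and falls back to token="google_default" otherwise.
    """
    environ = os.environ if environ is None else environ
    key_path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    return {
        "project": project,
        # Either an explicit path to the auth JSON file or "google_default".
        "token": key_path or "google_default",
        # Request only read rights, as suggested for read-only accounts.
        "access": "read_only",
    }
```

With gcsfs installed, this would be used as gcsfs.GCSFileSystem(**gcsfs_kwargs(project)), where project is your own project ID.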

davidxia commented 1 year ago

Does this library use the GOOGLE_APPLICATION_CREDENTIALS env var? I set that value to the path of my application_default_credentials.json and have GCSFileSystem(project=MY_PROJECT), but I get the error gcsfs.retry.HttpError: Anonymous caller does not have storage.objects.get access to the Google Cloud Storage object., 401.

GCSFileSystem(project=MY_PROJECT, token=os.getenv("GOOGLE_APPLICATION_CREDENTIALS")) works, which makes me suspect this library doesn't respect GOOGLE_APPLICATION_CREDENTIALS.

I'm using the latest gcsfs 2022.8.2.

akmorrow13 commented 1 year ago

I am having a similar issue. When using the path to application_default_credentials.json, I can access data:

# works
fs = gcsfs.GCSFileSystem(project=PROJECT, token=os.getenv("GOOGLE_APPLICATION_CREDENTIALS")) 

However, I am unable to use oauth2 Credentials with gcsfs, although these credentials work with google.cloud.storage:

# fails with invalid token_id error.
import json
import os

from google.oauth2 import service_account

# json.load() needs an open file object, not a path string
with open(os.getenv("GOOGLE_APPLICATION_CREDENTIALS")) as f:
    service_account_info = json.load(f)
credentials = service_account.Credentials.from_service_account_info(service_account_info)
scope = ["read_only"]
creds = credentials.with_scopes(scope)
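
One likely cause of the failure above, offered as an assumption rather than a confirmed diagnosis: "read_only" is the short access name that gcsfs accepts, not an OAuth scope. Cloud Storage scopes are full URLs such as https://www.googleapis.com/auth/devstorage.read_only. The sketch below builds the full URL before creating the credentials; the helper names storage_scope and scoped_service_account_credentials are hypothetical.

```python
STORAGE_SCOPE_PREFIX = "https://www.googleapis.com/auth/devstorage."
VALID_ACCESS = ("read_only", "read_write", "full_control")


def storage_scope(access="read_only"):
    """Map a short access name to the full Cloud Storage OAuth scope URL."""
    if access not in VALID_ACCESS:
        raise ValueError(f"access must be one of {VALID_ACCESS}, got {access!r}")
    return STORAGE_SCOPE_PREFIX + access


def scoped_service_account_credentials(key_path, access="read_only"):
    """Load a service account key file with an explicit storage scope."""
    # Imported lazily so storage_scope() works without google-auth installed.
    from google.oauth2 import service_account

    return service_account.Credentials.from_service_account_file(
        key_path, scopes=[storage_scope(access)]
    )
```

The resulting object could then be passed as token= to gcsfs.GCSFileSystem. This assumes the file is a service account key; whether it resolves the error in this thread has not been verified here.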

danielgafni commented 1 month ago

I can confirm this currently works on 2024.5.0 with GOOGLE_APPLICATION_CREDENTIALS pointing to a service_account.json file.
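
The reports in this thread mention two different kinds of JSON file: application_default_credentials.json (user credentials written by gcloud) and service_account.json. A small sketch for checking which kind the environment variable points at, assuming the file carries the usual top-level "type" field ("authorized_user" or "service_account"). The helper name credential_file_type is hypothetical.

```python
import json
import os


def credential_file_type(environ=None):
    """Return the "type" field of the file named by GOOGLE_APPLICATION_CREDENTIALS.

    Returns None when the variable is unset or the field is missing.
    """
    environ = os.environ if environ is None else environ
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return None
    with open(path) as f:
        return json.load(f).get("type")
```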