fsspec / gcsfs

Pythonic file-system interface for Google Cloud Storage
http://gcsfs.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
334 stars 143 forks source link

_connect_cloud regression in 2022.7.1 - Invalid gcloud credentials #486

Open baxen opened 2 years ago

baxen commented 2 years ago

Our workflows using gcsfs==2022.7.1 are seeing a new error connecting from google cloud environments

  File "/usr/local/lib/python3.8/site-packages/gcsfs/core.py", line 280, in __init__
    self.credentials = GoogleCredentials(project, access, token)
  File "/usr/local/lib/python3.8/site-packages/gcsfs/credentials.py", line 51, in __init__
    self.connect(method=token)
  File "/usr/local/lib/python3.8/site-packages/gcsfs/credentials.py", line 244, in connect
    self.__getattribute__("_connect_" + method)()
  File "/usr/local/lib/python3.8/site-packages/gcsfs/credentials.py", line 97, in _connect_cloud
    raise ValueError("Invalid gcloud credentials")

I am pretty sure this was introduced in https://github.com/fsspec/gcsfs/commit/42aa3ddeae19a029d1699ac5ac0b7fd33aa7d6a8

Reading through the google auth library, the valid method is first checking if self.token is None. self.token is always initialized to None, and as far as I can tell is only updated after calling .refresh. So I believe this check will always fail, despite this running in an environment where these credentials would be valid.

I could open a PR, this would probably be fixed by running a .refresh just before the check, if it is appropriate to run a blocking connection check at this time in the flow.

martindurant commented 2 years ago

I'm not sure exactly where the check would go, but if you put it in a PR, then we have something concrete to talk about..

baxen commented 2 years ago

Added a quick PR to show what I mean - this wouldn't move when the check happens but instead use a refresh to do the check. I don't see any way in the google Credentials object to check if it is valid without actually just trying to connect.

martindurant commented 2 years ago

Do you think you'll have a chance to work further on that PR and get it passing?

baxen commented 2 years ago

Updated the PR - sorry for the delay! I think this will fix the issue we are seeing, but I am not sure if it will have side effects (e.g. i see some additional complexity when the same refresh is called below in maybe_refresh, but don't have enough context to know if we need it here as well)