dbalabka closed this 5 months ago
It seems there was an attempt to fix the problem in #486, but it was unsuccessful.
To refresh the token, we are calling the following sequence of methods on each request:
GCSFileSystem._request()
GCSFileSystem._get_headers()
GCSFileSystem.credentials.apply()
GCSFileSystem.credentials.maybe_refresh()
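The call order above can be pictured with a toy model. The classes below are hypothetical stand-ins written for illustration, not the gcsfs implementation; they only show how a refresh check that runs before every request does nothing when the credentials carry no expiry information and no way to refresh.

```python
class ToyCredentials:
    """Hypothetical stand-in for a credentials wrapper (not the gcsfs class)."""

    def __init__(self, token, expires_at=None, refresher=None):
        self.token = token
        self.expires_at = expires_at  # None models a bare token string
        self.refresher = refresher    # None models a missing refresh_token
        self.refresh_count = 0

    def valid(self, now):
        # Without expiry information the token always looks valid.
        return self.expires_at is None or now < self.expires_at

    def maybe_refresh(self, now):
        # Nothing to do if the token looks valid or cannot be refreshed.
        if self.valid(now) or self.refresher is None:
            return
        self.token, self.expires_at = self.refresher(now)
        self.refresh_count += 1

    def apply(self, headers, now):
        self.maybe_refresh(now)
        headers["Authorization"] = f"Bearer {self.token}"


class ToyFileSystem:
    """Hypothetical stand-in that mirrors the call order listed above."""

    def __init__(self, credentials):
        self.credentials = credentials

    def _get_headers(self, now):
        headers = {}
        self.credentials.apply(headers, now)
        return headers

    def _request(self, now):
        # A real client would send an HTTP request; here we return the headers.
        return self._get_headers(now)


# A bare token is never refreshed, even two hours (7200 s) after creation.
static_fs = ToyFileSystem(ToyCredentials("t0"))
print(static_fs._request(now=7200), static_fs.credentials.refresh_count)
```

In this model the bare token is sent unchanged no matter how much time has passed, which matches the symptom of a 401 once the real token has expired on the server side.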
My suspicion is that GoogleCredentials.credentials == None does not necessarily mean that we are dealing with anon auth. Hence, we have to keep an explicit token type identifier.
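A minimal sketch of what an explicit token type identifier could look like. The names TokenKind, CredentialState, can_refresh, and needs_auth_header are hypothetical and do not exist in gcsfs; the point is only that the decision is made from a stored kind rather than inferred from whether the credentials object is None.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TokenKind(Enum):
    """Hypothetical identifier for how the filesystem was authenticated."""

    ANON = "anon"                # no credentials at all
    STATIC = "static"            # bare access token, no refresh_token
    REFRESHABLE = "refreshable"  # credentials that can be refreshed


@dataclass
class CredentialState:
    kind: TokenKind
    credentials: Optional[object] = None


def can_refresh(state: CredentialState) -> bool:
    # Decide from the explicit kind instead of testing `credentials is None`.
    return state.kind is TokenKind.REFRESHABLE and state.credentials is not None


def needs_auth_header(state: CredentialState) -> bool:
    # Only anonymous access skips the Authorization header.
    return state.kind is not TokenKind.ANON
```

With such a marker, a static token can be reported as non-refreshable up front instead of failing with a 401 once it expires.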
It happens only if we provide a token string without a refresh_token, which is expected behavior:
import google.auth
import pandas as pd
from google.auth.transport.requests import Request


def get_storage_options():
    # It works, but I don't know if it's the best way to do it
    #
    # Also, it seems the scope now needs to be specified, although it worked without one before.
    # Source: https://stackoverflow.com/questions/60401040/getting-invalid-scope-when-attempting-to-obtain-a-refresh-token-via-the-google-a
    credentials, _ = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"]
    )
    if not credentials.valid:
        credentials.refresh(Request())
    # Only the access token string is passed on, without a refresh_token
    return {
        "token": credentials.token,
    }


pd.read_parquet(..., storage_options=get_storage_options())
Relying on any other auth method fixes the problem.
The problem is related to #32. Unfortunately, it is not obvious how to implement token refresh when the library is used through other libraries (dask -> pyarrow -> fsspec -> gcsfs). It would be amazing to implement a more robust token refresh.
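One way to picture a refresh that survives long-running jobs is to hand the innermost library a token provider instead of a token snapshot. The sketch below is an assumption-laden illustration: RefreshingToken and its fetch, skew, and clock parameters are hypothetical names, and nothing here is part of gcsfs or fsspec. The fetch callable is assumed to return a token together with its absolute expiry time in seconds.

```python
import threading
import time
from typing import Callable, Optional, Tuple


class RefreshingToken:
    """Hypothetical helper: caches a token and re-fetches it shortly before expiry."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],
        skew: float = 60.0,
        clock: Callable[[], float] = time.time,
    ):
        self._fetch = fetch    # returns (token, absolute expiry time in seconds)
        self._skew = skew      # refresh this many seconds before expiry
        self._clock = clock    # injectable so the behaviour can be tested
        self._lock = threading.Lock()
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        # The lock keeps concurrent callers from fetching twice.
        with self._lock:
            expiring = self._clock() >= self._expires_at - self._skew
            if self._token is None or expiring:
                self._token, self._expires_at = self._fetch()
            return self._token
```

Calling get() before each request returns the cached token while it is fresh and fetches a new one once the expiry window is reached, so the caller never holds a token older than its lifetime.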
Here is the exception that we get when the job runs longer than an hour:
HttpError: Invalid Credentials, 401
The gcsfs version is 2023.12.2.post1.