fsspec / gcsfs

Pythonic file-system interface for Google Cloud Storage
http://gcsfs.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

GCSMap: error first time command is run but not the second #117

Open rabernat opened 6 years ago

rabernat commented 6 years ago

I'm trying to access data from a public GCS bucket using gcsfs.GCSMap.

The first time I run this code

import gcsfs
gcmap = gcsfs.GCSMap('pangeo-data/pyqg/barotropic/beta_00.zarr')

it fails with this error:

_call exception: ('invalid_grant: Bad Request', '{\n  "error": "invalid_grant",\n  "error_description": "Bad Request"\n}')
Traceback (most recent call last):
  File "/Users/rpa/miniconda3/envs/geo_scipy/lib/python3.6/site-packages/gcsfs/core.py", line 458, in _call
    r = meth(self.base + path, params=kwargs, json=json)
  File "/Users/rpa/miniconda3/envs/geo_scipy/lib/python3.6/site-packages/requests/sessions.py", line 537, in get
    return self.request('GET', url, **kwargs)
  File "/Users/rpa/miniconda3/envs/geo_scipy/lib/python3.6/site-packages/google/auth/transport/requests.py", line 183, in request
    self._auth_request, method, url, request_headers)
  File "/Users/rpa/miniconda3/envs/geo_scipy/lib/python3.6/site-packages/google/auth/credentials.py", line 121, in before_request
    self.refresh(request)
  File "/Users/rpa/miniconda3/envs/geo_scipy/lib/python3.6/site-packages/google/oauth2/credentials.py", line 117, in refresh
    self._client_secret))
  File "/Users/rpa/miniconda3/envs/geo_scipy/lib/python3.6/site-packages/google/oauth2/_client.py", line 191, in refresh_grant
    response_data = _token_endpoint_request(request, token_uri, body)
  File "/Users/rpa/miniconda3/envs/geo_scipy/lib/python3.6/site-packages/google/oauth2/_client.py", line 110, in _token_endpoint_request
    _handle_error_response(response_body)
  File "/Users/rpa/miniconda3/envs/geo_scipy/lib/python3.6/site-packages/google/oauth2/_client.py", line 60, in _handle_error_response
    error_details, response_body)
google.auth.exceptions.RefreshError: ('invalid_grant: Bad Request', '{\n  "error": "invalid_grant",\n  "error_description": "Bad Request"\n}')
(The same "_call exception" message and traceback are printed two more times.)

If I run the exact same line again, it works. This is confusing. I remember a similar thing happened on a pangeo binder environment.

I'm using gcsfs version 0.1.2 installed from conda forge.

martindurant commented 6 years ago

If it's public, then token='anon' would be faster and easier. Nevertheless, it'd be good to figure out what's happening here. Is it consistently first fail, then pass on second attempt? Which auth method should be working here?

rabernat commented 6 years ago

It is always consistent. The first time I run the command, it errors. The second time, it works. Restart kernel, same sequence always occurs.

What I dislike about token='anon' is that it requires an extra line:

import gcsfs
# `project` stands for your GCP project name
gcs = gcsfs.GCSFileSystem(project, token='anon')
gcmap = gcsfs.GCSMap('pangeo-data/pyqg/barotropic/beta_00.zarr', gcs=gcs)

I wish GCSMap itself were able to figure out that it should retry the request in anon mode. It evidently does this the second time you run the command. Could that logic be moved inside the first request instead?

rabernat commented 6 years ago

More generally, I feel like anonymous access to public buckets is perhaps the most common use case. So maybe this should be the default?

martindurant commented 6 years ago

It is puzzling: the default call goes through the various auth methods, tests whether auth was successful, and eventually falls back to anon. Apparently this is failing, but maybe only after connect() (by which time the instance has been stored as a singleton).
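The fallback behaviour described here can be sketched roughly as follows. This is not the gcsfs implementation: `AuthError`, `connect_with_fallback`, and the two stand-in connectors are hypothetical names made up for illustration. The sketch assumes each method fully validates its credentials inside its own connect step, so a rejected token is caught there. In the traceback above, by contrast, the refresh error only surfaces later, inside `_call`.

```python
class AuthError(Exception):
    """Raised by a hypothetical connector when its credentials are rejected."""


def connect_with_fallback(connectors):
    """Try each (name, connect) pair in order and return the first that succeeds.

    `connect` is a zero-argument callable that returns a session object or
    raises AuthError. The last entry would normally be the anonymous method.
    """
    errors = {}
    for name, connect in connectors:
        try:
            session = connect()
        except AuthError as exc:
            errors[name] = exc  # remember why this method failed, then move on
            continue
        return name, session
    raise AuthError(f"all auth methods failed: {errors}")


def stale_cached_token():
    # Stand-in for a cached token that can no longer be refreshed
    raise AuthError("invalid_grant: Bad Request")


def anonymous():
    # Stand-in for an unauthenticated session; always succeeds here
    return {"token": None}


method, session = connect_with_fallback(
    [("cache", stale_cached_token), ("anon", anonymous)]
)
print(method)  # anon
```

If the validity check is deferred until the first real request, as the traceback suggests, a loop like this would accept the stale method and never reach the anonymous one, which would be consistent with the first-call failure reported above.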

You may find it interesting that this PR makes GCSFS compatible with fsspec, so that you can do

gcmap = fsspec.filesystem('gcs', token='anon').get_mapper('mdtemp')
or
gcmap = fsspec.get_mapper('gcs://mdtemp', token='anon')

in one line. That's very much WIP, of course.

martindurant commented 6 years ago

To diagnose the original problem, it would be worthwhile to run gcsfs.GCSMap('pangeo-data/pyqg/barotropic/beta_00.zarr'), which fails (or just gcsfs.GCSFileSystem(), which probably doesn't fail), and then check the attributes of GCSFileSystem._singleton[0], such as "token" and "method".
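A small helper along these lines could collect those attributes for a bug report. `auth_summary` and `FakeFS` are hypothetical names used only for illustration. The helper uses `getattr` with a default so that it does not raise if an attribute is absent in a given version.

```python
def auth_summary(fs, names=("token", "method")):
    """Collect selected attributes from a filesystem instance for a bug report.

    Missing attributes are reported as the string "<missing>" rather than
    raising, since attribute names can differ between versions.
    """
    return {name: getattr(fs, name, "<missing>") for name in names}


class FakeFS:
    """Stand-in for a filesystem instance; not part of gcsfs."""
    token = None
    method = "anon"


print(auth_summary(FakeFS()))  # {'token': None, 'method': 'anon'}
```

With gcsfs installed, the equivalent call after the failing line would be `auth_summary(gcsfs.GCSFileSystem._singleton[0])`, assuming the `_singleton` attribute mentioned above exists in the installed version. The values could include credential material, so it is worth checking them before pasting the output publicly.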