fsspec / gcsfs

Pythonic file-system interface for Google Cloud Storage
http://gcsfs.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Error when listing large directory with versions=True #590

Closed: rlamy closed this issue 9 months ago

rlamy commented 9 months ago

Listing a large directory with versions=True can fail with a cryptic error (HTTP 400, "Invalid argument."). This only happens when the results are paginated, i.e. when the listing is too large to come back in a single response.

Here's an example:

>>> import gcsfs
>>> fs = gcsfs.GCSFileSystem(version_aware=True)
>>> ll = fs.ls("gcs://dql-50k-laion-files/000003/", versions=True)
_request non-retriable exception: Invalid argument., 400
Traceback (most recent call last):
  File "/Users/ronan/devel/iterative/gcsfs/gcsfs/retry.py", line 123, in retry_request
    return await func(*args, **kwargs)
  File "/Users/ronan/devel/iterative/gcsfs/gcsfs/core.py", line 430, in _request
    validate_response(status, contents, path, args)
  File "/Users/ronan/devel/iterative/gcsfs/gcsfs/retry.py", line 110, in validate_response
    raise HttpError(error)
gcsfs.retry.HttpError: Invalid argument., 400
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ronan/.pyenv/versions/dql-3.10/lib/python3.10/site-packages/fsspec/asyn.py", line 118, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/Users/ronan/.pyenv/versions/dql-3.10/lib/python3.10/site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File "/Users/ronan/.pyenv/versions/dql-3.10/lib/python3.10/site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
  File "/Users/ronan/devel/iterative/gcsfs/gcsfs/core.py", line 998, in _ls
    for entry in await self._list_objects(
  File "/Users/ronan/devel/iterative/gcsfs/gcsfs/core.py", line 564, in _list_objects
    items, prefixes = await self._do_list_objects(
  File "/Users/ronan/devel/iterative/gcsfs/gcsfs/core.py", line 638, in _do_list_objects
    return await self._sequential_list_objects_helper(
  File "/Users/ronan/devel/iterative/gcsfs/gcsfs/core.py", line 753, in _sequential_list_objects_helper
    page = await self._call(
  File "/Users/ronan/devel/iterative/gcsfs/gcsfs/core.py", line 437, in _call
    status, headers, info, contents = await self._request(
  File "/Users/ronan/.pyenv/versions/dql-3.10/lib/python3.10/site-packages/decorator.py", line 221, in fun
    return await caller(func, *(extras + args), **kw)
  File "/Users/ronan/devel/iterative/gcsfs/gcsfs/retry.py", line 158, in retry_request
    raise e
  File "/Users/ronan/devel/iterative/gcsfs/gcsfs/retry.py", line 123, in retry_request
    return await func(*args, **kwargs)
  File "/Users/ronan/devel/iterative/gcsfs/gcsfs/core.py", line 430, in _request
    validate_response(status, contents, path, args)
  File "/Users/ronan/devel/iterative/gcsfs/gcsfs/retry.py", line 110, in validate_response
    raise HttpError(error)
gcsfs.retry.HttpError: Invalid argument., 400
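
For context on why this only shows up for large directories, here is a minimal sketch of token-based pagination, the general pattern behind a listing that spans several responses. It is not gcsfs code: `fetch_page`, `list_all`, `OBJECTS` and `PAGE_SIZE` are hypothetical names backed by an in-memory list, and only the `nextPageToken` field name mirrors the GCS JSON API. The point is that follow-up requests are made only when the first response carries a next-page token, so a problem in those follow-up requests never surfaces on a small directory.

```python
from typing import Optional

# Hypothetical in-memory stand-in for a paginated listing endpoint.
# Nothing here talks to GCS or uses gcsfs internals.
OBJECTS = [f"000003/file-{i:04d}" for i in range(7)]
PAGE_SIZE = 3


def fetch_page(prefix: str, versions: bool, page_token: Optional[str] = None) -> dict:
    """Return one page of names, plus a token if more pages remain."""
    start = int(page_token) if page_token else 0
    matching = [name for name in OBJECTS if name.startswith(prefix)]
    page = {"items": matching[start:start + PAGE_SIZE]}
    if start + PAGE_SIZE < len(matching):
        page["nextPageToken"] = str(start + PAGE_SIZE)
    return page


def list_all(prefix: str, versions: bool = False):
    """Follow next-page tokens until the listing is exhausted."""
    items = []
    requests_made = 0
    token = None
    while True:
        # Every request repeats the same query parameters; only the token changes.
        page = fetch_page(prefix, versions, token)
        requests_made += 1
        items.extend(page.get("items", []))
        token = page.get("nextPageToken")
        if token is None:
            break
    return items, requests_made


items, n_requests = list_all("000003/", versions=True)
print(len(items), n_requests)
```

With seven names and a page size of three, the loop needs three requests. A listing that fits in one page would stop after the first request and never exercise the follow-up path, which matches the observation that the failure depends on pagination.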