aai-institute / lakefs-spec

An fsspec implementation for the lakeFS project
http://lakefs-spec.org/
Apache License 2.0

Add depagination helper #178

Closed nicholasjng closed 10 months ago

nicholasjng commented 10 months ago

This is about a currently silent bug in most of our response handling code: API responses are not depaginated anywhere except in fs.ls(), so GET requests do not return all resources when there are more of them than fit on a single page. Since we mostly work with nearly empty repos (local ephemeral instances), we have not observed the effects of this so far.

Example: A repository has 110 tags, but client_helpers.list_tags() returns at most 100 results, so at least 10 tags are missing from the response.

lakeFS even mentions a solution here (https://docs.lakefs.io/integrations/python.html#usage-examples):

# source: https://docs.lakefs.io/integrations/python.html#usage-examples
def pagination_helper(page_fetcher, **kwargs):
    """Helper function to iterate over paginated results"""
    while True:
        resp = page_fetcher(**kwargs)
        yield from resp.results
        if not resp.pagination.has_more:
            break
        kwargs['after'] = resp.pagination.next_offset

page_fetcher is the API method being called, for example objects_api.list_objects. This works, but it is entirely untyped, so we have to think a bit about how to type it properly.

Questions:

Unfortunately, the pydantic response models (resp in the code above) do not share a common base class, so we would need to roll our own common type for them.
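
One possible way to type this without a shared base class is structural typing via typing.Protocol. The following is only a rough sketch, not a proposal for the final API: the names Pagination, Page and depaginate are made up for illustration, and FakePagination, FakeTagList and fake_list_tags are hypothetical stand-ins for the real response models and API call, so that the example runs without a lakeFS server. It assumes that every paginated response exposes results and pagination.has_more / pagination.next_offset, as in the snippet from the lakeFS docs above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterator, List, Protocol, Sequence, TypeVar

T = TypeVar("T")
T_co = TypeVar("T_co", covariant=True)


class Pagination(Protocol):
    """Structural type for the pagination part of a response (assumed shape)."""

    @property
    def has_more(self) -> bool: ...

    @property
    def next_offset(self) -> str: ...


class Page(Protocol[T_co]):
    """Structural type for any paginated response carrying items of type T_co."""

    @property
    def results(self) -> Sequence[T_co]: ...

    @property
    def pagination(self) -> Pagination: ...


def depaginate(fetch: Callable[..., Page[T]], **kwargs: Any) -> Iterator[T]:
    """Yield all items from a paginated endpoint, requesting pages until exhausted."""
    while True:
        resp = fetch(**kwargs)
        yield from resp.results
        if not resp.pagination.has_more:
            break
        # Continue after the last item of the page we just consumed.
        kwargs["after"] = resp.pagination.next_offset


# --- Hypothetical stand-ins below, only to exercise the helper ---

@dataclass
class FakePagination:
    has_more: bool
    next_offset: str


@dataclass
class FakeTagList:
    results: List[str]
    pagination: FakePagination


ALL_TAGS = [f"tag-{i:03d}" for i in range(110)]


def fake_list_tags(after: str = "", amount: int = 100) -> FakeTagList:
    """Return at most `amount` tags that sort after `after`, like a paged API would."""
    remaining = [t for t in ALL_TAGS if t > after]
    page = remaining[:amount]
    has_more = len(remaining) > amount
    return FakeTagList(page, FakePagination(has_more, page[-1] if page else ""))


first_page_only = fake_list_tags().results      # what we get today
all_tags = list(depaginate(fake_list_tags))     # what we want instead
```

With this, a type checker can infer Iterator[str] for depaginate(fake_list_tags), and the response models do not need to inherit from anything as long as they have the two attributes. The properties in the protocols are read-only on purpose, so that plain attributes on the response models still match them.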