This is about a silent bug in most of our response handling code: API responses are not depaginated anywhere except in fs.ls(), so a GET request does not return all resources when there are more of them than fit on a single page. Since we mostly deal with nearly empty repos (local ephemeral instances), we do not observe the effects of this at the moment.
Example: A repository has 110 tags, but client_helpers.list_tags() returns at most 100 results -> the response is missing at least 10 tags.
lakeFS documents a solution for this in its Python usage examples:
# source: https://docs.lakefs.io/integrations/python.html#usage-examples
def pagination_helper(page_fetcher, **kwargs):
    """Helper function to iterate over paginated results"""
    while True:
        resp = page_fetcher(**kwargs)
        yield from resp.results
        if not resp.pagination.has_more:
            break
        kwargs['after'] = resp.pagination.next_offset
page_fetcher is the API method being called, for example objects_api.list_objects. This works, but it is entirely untyped: neither the keyword arguments nor the yielded items carry any type information, so we have to think a bit about how to type this properly.
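To make the 110-tag example concrete, here is a minimal sketch that runs the helper against a fake endpoint. `fake_list_tags`, `ALL_TAGS` and the `SimpleNamespace` responses are hypothetical stand-ins, not lakeFS code; they only mimic the `results` / `pagination.has_more` / `pagination.next_offset` shape that the helper relies on.

```python
from types import SimpleNamespace


def pagination_helper(page_fetcher, **kwargs):
    """Iterate over all results of a paginated call (same logic as the helper above)."""
    while True:
        resp = page_fetcher(**kwargs)
        yield from resp.results
        if not resp.pagination.has_more:
            break
        kwargs["after"] = resp.pagination.next_offset


# Hypothetical data: 110 tag names, sorted lexicographically.
ALL_TAGS = [f"v{i:03d}" for i in range(110)]


def fake_list_tags(repository, after="", amount=100):
    """Hypothetical endpoint: returns at most `amount` tags that sort after `after`."""
    remaining = [tag for tag in ALL_TAGS if tag > after]
    page = remaining[:amount]
    pagination = SimpleNamespace(
        has_more=len(remaining) > len(page),
        next_offset=page[-1] if page else "",
    )
    return SimpleNamespace(results=page, pagination=pagination)


# A single call silently truncates; the helper follows the pagination cursor.
single_page = fake_list_tags(repository="my-repo").results
all_tags = list(pagination_helper(fake_list_tags, repository="my-repo"))
print(len(single_page), len(all_tags))  # prints "100 110"
```

The single call is what our helpers effectively do today; nothing fails, the last 10 tags are just absent.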
Questions:
[ ] Can we type the API?
[ ] Can we type the kwargs to dynamically accept a Python API spec?
[ ] Can we define a generic that enables automatic typing of the return object?
Unfortunately, the pydantic response models (resp in the above code) do not share a common base class, so we need to roll our own common type for paginated responses.
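One possible way to roll our own without touching the generated models is structural typing: a generic `typing.Protocol` that any response with `results` and `pagination` attributes satisfies. The following is a sketch under assumptions, not a finished design. The names `Pagination`, `Page`, `depaginate`, and the `Fake*` classes are hypothetical, and the assumption that `next_offset` is a `str` would need to be checked against the actual lakeFS models.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterator, List, Protocol, Sequence, TypeVar

T = TypeVar("T")
T_co = TypeVar("T_co", covariant=True)


class Pagination(Protocol):
    """Structural type for the pagination part of a response (assumed shape)."""

    @property
    def has_more(self) -> bool: ...

    @property
    def next_offset(self) -> str: ...


class Page(Protocol[T_co]):
    """Structural type for any paginated response carrying items of type T_co."""

    @property
    def results(self) -> Sequence[T_co]: ...

    @property
    def pagination(self) -> Pagination: ...


def depaginate(page_fetcher: Callable[..., Page[T]], **kwargs: Any) -> Iterator[T]:
    """Typed variant of pagination_helper; the item type follows from the fetcher."""
    while True:
        resp = page_fetcher(**kwargs)
        yield from resp.results
        if not resp.pagination.has_more:
            break
        kwargs["after"] = resp.pagination.next_offset


# Hypothetical stand-ins for a generated response model and an API method.
@dataclass
class FakePagination:
    has_more: bool
    next_offset: str


@dataclass
class FakeTagList:
    results: List[str]
    pagination: FakePagination


ALL_TAGS = [f"v{i:03d}" for i in range(110)]


def fake_list_tags(repository: str, after: str = "", amount: int = 100) -> FakeTagList:
    remaining = [tag for tag in ALL_TAGS if tag > after]
    page = remaining[:amount]
    more = len(remaining) > len(page)
    return FakeTagList(page, FakePagination(more, page[-1] if page else ""))


tags = list(depaginate(fake_list_tags, repository="my-repo"))
```

With this shape a type checker should be able to infer the element type of `tags` from the return annotation of the fetcher, which would address the third question. The keyword arguments stay untyped here (`Callable[...]` and `Any`); `typing.ParamSpec` could forward the fetcher's signature, but the helper also injects `after` itself, so that part remains open.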