Closed yarikoptic closed 2 years ago
@yarikoptic Exactly which endpoints do you want this implemented for, and should those endpoints always use this? Currently, the client paginates the following:
/dandisets/
/dandisets/:id/versions/
/dandisets/:id/versions/:id/assets/
, with & without a path
parameterLet's do it at the generic level of pagination handling in paginate
.
Threading is fast, so let's do threaded fetching of pages >=2 if overall there is more than 1 page.
Please add add a check that the next
for the first returned result is in the expected form ({base_url}?page=2
), and if not -- issue a WARNING and if not os.environ.get("DANDI_PAGINATION_DISABLE_FALLBACK", False)
-- continue serially with current implementation, and otherwise
raise AssertionError(f'API changed the pagination strategy. "next" URL {fetched_next_url} is different from expected {next_url}')
or alike.
Then in CI we could export DANDI_PAGINATION_DISABLE_FALLBACK in one of the matrix runs so we detect when/if API changes how pagination is done, and clients would still perform reliably even if pagination changes.
@yarikoptic I implemented this, and then I wrote a test that creates a Dandiset with a bunch of assets and paginates through them. In order to ensure that there would be more than five pages even after https://github.com/dandi/dandi-archive/pull/1107 is merged, I set the number of assets to 617. Unfortunately, uploading that many assets (even when they're tiny) takes too long for the test suite; the shortest amount of time I could get this down to was about 2 and a half minutes, and that was by foregoing upload()
in favor of just calling LocalAsset.upload()
directly (and parallelizing the uploads in a ThreadPoolExecutor
only shaved off a couple seconds).
(Threadedly paginating through the assets afterwards takes about 15 seconds, and that was using an earlier number of 1234 assets.)
can you upload 5 assets, and set page_size to 1 to get 5 pages?
@yarikoptic get_assets()
doesn't have a page_size
option; only paginate()
does. Should all the methods that call paginate()
gain this option?
@yarikoptic Alternatively, per_page
(and lookahead
?) could be instance attributes of the client instead of method arguments.
Since it is such a low level detail, I am leaning toward client instance attribute. May be even not bother interfacing it in its constructor, just have it defined at the level of a class and be allowed to be overloaded for an instance. But at large -- up to you.
:rocket: Issue was released in 0.40.0
:rocket:
eyeballing the output of snippet in https://github.com/dandi/dandi-cli/issues/1018#issuecomment-1137227203 shows wait times after each page. That could be avoided. Pagination output returns
count
in the initial response. our pagination is simple as in specify the page number (no "next" token). So it should be trivial toceil(count/page_size)
pagination is similar for other end points (e.g.
/dandisets/{versions__dandiset__pk}/versions/{versions__version}/assets/paths/
) and I think overall generic. So I think we can implement it generally for any pagination requests to submit multiple pages requests in parallel and speed up interactions with the archive this way.