DIRACGrid / diracx

The neXt DIRAC generation
GNU General Public License v3.0

feat(jobs): pagination #243

Closed aldbr closed 2 months ago

aldbr commented 3 months ago

Here I provide a first implementation of the pagination mechanism for the jobs, mostly based on https://github.com/DIRACGrid/diracx/pull/6.

As I explored different examples, it became clear to me that there is no one-size-fits-all solution for pagination. Instead, various implementation possibilities exist, each offering unique approaches at different stages.

Pagination Strategies:

I considered two primary pagination strategies: page-based pagination and cursor-based pagination.

My opinion: we should prioritize simplicity for the time being. Our primary requirement is to retrieve the last 10, 100, or 1,000 jobs. Occasionally, it's useful to jump to a different page, such as when checking if a particular issue has occurred before. While there may be some cases where we need to fetch a large number of jobs at once, such instances are rare. Therefore, minor inconsistencies should not pose a significant problem. I would rather choose the page-based pagination.
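
To make the page-based option concrete, here is a minimal sketch of how `page` and `per_page` map to an offset and a limit. The `paginate` helper and the in-memory list of job IDs are assumptions for illustration only, not the code of this PR; a real implementation would apply the same arithmetic in the database query.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def paginate(items: Sequence[T], page: int = 1, per_page: int = 100) -> list[T]:
    """Return the items of a 1-indexed page (hypothetical helper)."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive integers")
    offset = (page - 1) * per_page  # plays the role of SQL OFFSET
    return list(items[offset : offset + per_page])  # slice length plays the role of SQL LIMIT


# Stand-in for 250 job IDs, already sorted in the order the client asked for.
jobs = list(range(1, 251))

first_page = paginate(jobs, page=1, per_page=100)  # items 1..100
last_page = paginate(jobs, page=3, per_page=100)   # partial page: items 201..250
beyond_end = paginate(jobs, page=4, per_page=100)  # empty list, no error
```

Because each page is computed from an offset, rows inserted or deleted between two requests can shift items across page boundaries. That is the "minor inconsistency" accepted above.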

Metadata Conveyance:

Conveying metadata to clients is essential for effective navigation as we return partial results. Common methods include Web Linking, Content-Range headers, and embedding metadata within the JSON response.

Link: <https://dirac/api/jobs?page=2&per_page=100>; rel="prev", <https://dirac/api/jobs?page=4&per_page=100>; rel="next", <https://dirac/api/jobs?page=515&per_page=100>; rel="last", <https://dirac/api/jobs?page=1&per_page=100>; rel="first"
Content-Range: <unit> <first item>-<last item>/<total>
Content-Range: jobs 1-10/100
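
As a rough sketch, both header values above can be derived from `page`, `per_page` and the total item count. The function names and the `*/<total>` fallback for an out-of-range page are assumptions for illustration; the item positions are 1-indexed to match the `jobs 1-10/100` example.

```python
from urllib.parse import urlencode


def content_range(unit: str, page: int, per_page: int, total: int) -> str:
    """Build a Content-Range value such as 'jobs 1-10/100' (hypothetical helper)."""
    first = (page - 1) * per_page + 1
    last = min(page * per_page, total)
    if first > total:
        # Assumption: report only the total when the page is past the end.
        return f"{unit} */{total}"
    return f"{unit} {first}-{last}/{total}"


def link_header(base_url: str, page: int, per_page: int, total: int) -> str:
    """Build a Web Linking (Link) value with prev/next/last/first relations."""
    last_page = max(1, -(-total // per_page))  # ceiling division, at least one page

    def url(p: int) -> str:
        return f"{base_url}?{urlencode({'page': p, 'per_page': per_page})}"

    links = []
    if page > 1:
        links.append(f'<{url(page - 1)}>; rel="prev"')
    if page < last_page:
        links.append(f'<{url(page + 1)}>; rel="next"')
    links.append(f'<{url(last_page)}>; rel="last"')
    links.append(f'<{url(1)}>; rel="first"')
    return ", ".join(links)


range_value = content_range("jobs", page=1, per_page=10, total=100)
link_value = link_header("https://dirac/api/jobs", page=3, per_page=100, total=51500)
```

With these inputs, `range_value` is `jobs 1-10/100` and `link_value` reproduces the `Link` example above (page 3 of 515).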

My opinion: I initially implemented the Content-Range approach based on https://github.com/DIRACGrid/diracx/pull/6. However, I believe that web linking could also be beneficial as it would fit perfectly with the pagination parameters. Including metadata directly in the JSON is straightforward, but it would require additional parsing.

aldbr commented 3 months ago

After trying to link the pages to diracx-web, I realized that page-based pagination combined with the Content-Range header was enough for such a use case.
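
For illustration, this is roughly what a client needs to do with the header to drive a pager. It is a Python sketch rather than the actual diracx-web code, and the helper names and the strict `<unit> <first>-<last>/<total>` format are assumptions.

```python
import math
import re

# Assumed format: "<unit> <first item>-<last item>/<total>", e.g. "jobs 1-10/100".
_CONTENT_RANGE = re.compile(
    r"^(?P<unit>\w+) (?P<first>\d+)-(?P<last>\d+)/(?P<total>\d+)$"
)


def parse_content_range(value: str) -> dict:
    """Split a Content-Range value into its parts (hypothetical helper)."""
    match = _CONTENT_RANGE.match(value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range value: {value!r}")
    return {
        "unit": match["unit"],
        "first": int(match["first"]),
        "last": int(match["last"]),
        "total": int(match["total"]),
    }


def page_count(total: int, per_page: int) -> int:
    """Number of pages needed to show `total` items."""
    return math.ceil(total / per_page)


info = parse_content_range("jobs 1-10/100")
pages = page_count(info["total"], per_page=10)
```

The total is the only value the client cannot derive from its own `page` and `per_page` parameters, which is why the header alone is enough for this use case.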

Now for the agents needing to fetch a large number of items while guaranteeing consistency, we could still tweak the per-page parameter by setting a very large number. There are a few cons of course: it would take a few seconds to fetch a large number of items and we would need to make sure that per-page is large enough to cover all the needed items.

UPDATE: As a counter-argument, now that we can sort items using any parameter in any order, pagination is not as essential as it is within the current DIRAC implementation. Instead of going to the last page to examine old items, we could just sort the items differently (and thus rely on a cursor-based pagination).

fstagni commented 3 months ago

I would rather choose the page-based pagination.

On a first look, I agree.

chrisburr commented 3 months ago

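I think we should use page-based pagination for most things but have the option of having a reliable pagination for critical operations. Perhaps we could make it simpler by having the option of:

  • passing after instead of page
  • after is only supported if sorting by the primary key of the table to keep the implementation simple
  • the recommended way of iterating programmatically would be with Web Linking so the use of after becomes an implementation detail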

For inspiration:

aldbr commented 3 months ago

I think we should use page-based pagination for most things but have the option of having a reliable pagination for critical operations. Perhaps we could make it simpler by having the option of:

  • passing after instead of page
  • after is only supported if sorting by the primary key of the table to keep the implementation simple

@chrisburr what you are describing is a (simple) cursor-based pagination if I understand correctly. Do you suggest we should implement both methods (at least for the critical operations)?

the recommended way of iterating programmatically would be with Web Linking so the use of after becomes an implementation detail

Indeed, if we go with cursor-based pagination, this would be a nice way of iterating through the previous and next pages without including the cursor within the json result.
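
A minimal sketch of that idea, using an in-memory SQLite table as a stand-in for the jobs table. The table layout, the `fetch_after` and `next_link` helpers and the URL are assumptions for illustration, not existing diracx code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (job_id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO jobs (job_id, status) VALUES (?, ?)",
    [(job_id, "Done") for job_id in range(1, 11)],
)


def fetch_after(conn: sqlite3.Connection, after: int = 0, per_page: int = 4) -> list[int]:
    """Return up to per_page job IDs strictly greater than `after`, sorted by primary key."""
    rows = conn.execute(
        "SELECT job_id FROM jobs WHERE job_id > ? ORDER BY job_id LIMIT ?",
        (after, per_page),
    ).fetchall()
    return [row[0] for row in rows]


def next_link(base_url: str, page_ids: list[int], per_page: int) -> "str | None":
    """Build the rel="next" link, so clients never handle `after` themselves."""
    if len(page_ids) < per_page:
        return None  # a short page means there is nothing left to fetch
    return f'<{base_url}?after={page_ids[-1]}&per_page={per_page}>; rel="next"'


first = fetch_after(conn, after=0, per_page=4)
link = next_link("https://dirac/api/jobs", first, per_page=4)

# A row from the first page disappears before the client asks for the next one.
conn.execute("DELETE FROM jobs WHERE job_id = 2")
second = fetch_after(conn, after=first[-1], per_page=4)
```

Here `second` still starts right after the last ID of `first`, whereas an offset-based second page would skip job 5 once job 2 is deleted. The client only follows `link`, so `after` stays an implementation detail.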

chaen commented 3 months ago

@chrisburr what would be the use case? I can hardly imagine any use case other than fetching the first N elements, that is, the first page with a given length. Also, we may not have a stable order, so I don't think it is achievable. If such a need arises for a very specific case, we can add that behavior where it's needed. But having it in a generic way seems out of reach and not worth it to me.

chaen commented 3 months ago

@chrisburr and I had a chat and I think we are on the same page: go for the page-based approach.

aldbr commented 3 months ago

Alright, then you can start reviewing the PR