Open Mifrill opened 2 years ago
@Mifrill thanks for the PR! I would be really interested to know your use case and what you are trying to solve with the total_runs
(outside of the conversation on the linked issue). I do not think this would add TOO much of a burden on engines, although I will say in some cases it is simply not possible to know the total number of runs without performing an extremely expensive operation. So I wonder if this is a required field, or if maybe there is a generic way to represent optional summary metadata in the RunListResponse
.
elasticsearch has the concept of "hits"
which may not really be applicable here since this is not a search context, but maybe "metadata"
or some other field:
{
"runs": [],
"metadata": {
"total_runs": 5000
}
}
+1 this seems pretty important for neat pagination
@patmagee thanks for the review!
what you are trying to solve with the total_runs
The total count required to implement "orthodox" pagination, by dividing the total count by page size we can provide a direct link for any page. Rather than in the current implementation only possible to go next page or the previous page.
it is simply not possible to know the total number of runs without performing an extremely expensive operation
In my understanding, the implementation of the counter is the responsibility of each particular service, hence the possible performance issues should be handled on the specific service side. There are many possibilities to avoid performing expensive operations like counter field
implementation or last-document-index etc. it fully depends on a particular service's inner logic, so it's not related to the schema.
if maybe there is a generic way to represent optional summary metadata in the RunListResponse.
I don't think it is a good way to go because if we will follow that logic, the filed next_page_token
also should be moved to some metadata wrapper field:
https://github.com/Mifrill/workflow-execution-service-schemas/blob/ea050992fd0d95a9624024bf2f14977a905ff9c7/openapi/components/schemas/RunListResponse.yaml#L9
@Mifrill thanks for the clarification. I think your suggestion makes a lot of sense, and I wonder if there is a way to frame this PR in a forward looking way and consider what we would need to improve pagination in general (starting with a total_runs). There is this open issue, but it has not really gotten any steam lately. I know trs (@denis-yuen) has a specific approach that the settled on which is externally documented, so I wonder if we should likewise investigate what paging schemas are there in the wild and what fits with WES in a holistic way
@Mifrill would you mind updating your changes against the current develop branch? We merged all of the openapi components into a single file to better work with our tooling
@patmagee thanks for the ping 👍. Updated
On this subject, I would like to draw attention to the draft (see link in next comment by @jaeddy) of the upcoming GA4GH pagination style guide (being drafted to address https://github.com/ga4gh/TASC/issues/29).
The style guide is not very prescriptive at the moment (I think it's rather a survey of currently used pagination solutions across the GA4GH API landscape with a discussion of pros and cons) and is definitely open for feedback.
FYI @uniqueg - I believe this is the latest version of the pagination guide from @andrewyatz and @mamanambiya.
FYI @uniqueg - I believe this is the latest version of the pagination guide from @andrewyatz and @mamanambiya.
Yes this is the right one
Thanks for pointing that out @jaeddy 🙏
Hey @patmagee, @cjllanwarne just a gentle reminder, please have a look at this PR when you get a time 🙏
The property
total_runs
allows to implement the pagination with the references for each page, includes the last page and showing number of pages.Related to https://github.com/ga4gh/workflow-execution-service-schemas/issues/159#issuecomment-1184525759