ga4gh / workflow-execution-service-schemas

The WES API is a standard way to run and manage portable workflows.
Apache License 2.0
82 stars 38 forks source link

feat: add total_runs in response of GET /runs endpoint #183

Open Mifrill opened 2 years ago

Mifrill commented 2 years ago

The property total_runs allows to implement the pagination with the references for each page, includes the last page and showing number of pages.

Related to https://github.com/ga4gh/workflow-execution-service-schemas/issues/159#issuecomment-1184525759

patmagee commented 2 years ago

@Mifrill thanks for the PR! I would be really interested to know your use case and what you are trying to solve with the total_runs (outside of the conversation on the linked issue). I do not think this would add TOO much of a burden on engines, although I will say in some cases it is simply not possible to know the total number of runs without performing an extremely expensive operation. So I wonder if this is a required field, or if maybe there is a generic way to represent optional summary metadata in the RunListResponse.

elasticsearch has the concept of "hits" which may not really be applicable here since this is not a search context, but maybe "metadata" or some other field:


{
   "runs": [],
   "metadata": {
     "total_runs": 5000
   }
}
cjllanwarne commented 2 years ago

+1 this seems pretty important for neat pagination

Mifrill commented 2 years ago

@patmagee thanks for the review!

what you are trying to solve with the total_runs

The total count required to implement "orthodox" pagination, by dividing the total count by page size we can provide a direct link for any page. Rather than in the current implementation only possible to go next page or the previous page.

it is simply not possible to know the total number of runs without performing an extremely expensive operation

In my understanding, the implementation of the counter is the responsibility of each particular service, hence the possible performance issues should be handled on the specific service side. There are many possibilities to avoid performing expensive operations like counter field implementation or last-document-index etc. it fully depends on a particular service's inner logic, so it's not related to the schema.

if maybe there is a generic way to represent optional summary metadata in the RunListResponse.

I don't think it is a good way to go because if we will follow that logic, the filed next_page_token also should be moved to some metadata wrapper field: https://github.com/Mifrill/workflow-execution-service-schemas/blob/ea050992fd0d95a9624024bf2f14977a905ff9c7/openapi/components/schemas/RunListResponse.yaml#L9

patmagee commented 2 years ago

@Mifrill thanks for the clarification. I think your suggestion makes a lot of sense, and I wonder if there is a way to frame this PR in a forward looking way and consider what we would need to improve pagination in general (starting with a total_runs). There is this open issue, but it has not really gotten any steam lately. I know trs (@denis-yuen) has a specific approach that the settled on which is externally documented, so I wonder if we should likewise investigate what paging schemas are there in the wild and what fits with WES in a holistic way

patmagee commented 1 year ago

@Mifrill would you mind updating your changes against the current develop branch? We merged all of the openapi components into a single file to better work with our tooling

Mifrill commented 1 year ago

@patmagee thanks for the ping 👍. Updated

uniqueg commented 1 year ago

On this subject, I would like to draw attention to the draft (see link in next comment by @jaeddy) of the upcoming GA4GH pagination style guide (being drafted to address https://github.com/ga4gh/TASC/issues/29).

The style guide is not very prescriptive at the moment (I think it's rather a survey of currently used pagination solutions across the GA4GH API landscape with a discussion of pros and cons) and is definitely open for feedback.

jaeddy commented 1 year ago

FYI @uniqueg - I believe this is the latest version of the pagination guide from @andrewyatz and @mamanambiya.

andrewyatz commented 1 year ago

FYI @uniqueg - I believe this is the latest version of the pagination guide from @andrewyatz and @mamanambiya.

Yes this is the right one

uniqueg commented 1 year ago

Thanks for pointing that out @jaeddy 🙏

Mifrill commented 6 months ago

Hey @patmagee, @cjllanwarne just a gentle reminder, please have a look at this PR when you get a time 🙏