NASA-PDS / registry-api

Web API service for the PDS Registry, providing the implementation of the PDS Search API (https://github.com/nasa-pds/pds-api) for the PDS Registry.
https://nasa-pds.github.io/pds-api
Apache License 2.0
3 stars 5 forks source link

As a user, I want the x-total-count header returned by API responses #565

Open tloubrieu-jpl opened 2 years ago

tloubrieu-jpl commented 2 years ago

💪 Motivation

...so that users who like this standard way of having the total of results matching their requests independantly from the pagination constraint.

📖 Additional Details

⚖️ Acceptance Criteria

Given When I perform Then I expect

⚙️ Engineering Details

tloubrieu-jpl commented 2 years ago

According to this discussion, it would make sense to have the x-total-count header when the response format cannot carry an equivalent attribute (e.g. CSV), see https://github.com/zalando/restful-api-guidelines/issues/116

tloubrieu-jpl commented 2 years ago

From: "Niessner, Albert F (US 398F)" albert.f.niessner@jpl.nasa.gov Date: Wednesday, December 22, 2021 at 2:17 PM To: "Karpenko, Yevgen (US 398F-Affiliate)" yevgen.l.karpenko@jpl.nasa.gov, "Loubrieu, Thomas G (US 398F)" thomas.g.loubrieu@jpl.nasa.gov Cc: "Padams, Jordan H (US 398A)" jordan.h.padams@jpl.nasa.gov Subject: Re: x-total-count

Just to be absolutely clear, I do not care if we keep or pitch x-total-count (or whatever). All I can do is advise.

  1. The documentation on swagger clearly states that hits is the total number of matches to the query. If we want x-total-count then we should ditch summary.hits because hits must be calculated every time independent of x-total-count. Plus it will lead to two different answers then arguments over which is right.
  2. In simple requests like /bundles, /collections, and /products, hits is given directly from elastic search, as Eugene has already pointed out several times, meaning worrying about resource costs is irrelevant.
  3. In more complex requests like /bundles/{lidvid}/products etc, then the cost of computing the total can be expensive. In the case given, have to find all collections of a bundle then all products of all the collections. The products can be paged somewhat by loading just the necessary collections and their products until the start and limit are covered. Cannot just skip collections because the number of products in a collection is not a fixed value meaning the list of products has to read just to know how many there are. Can cheapen the resource requirements by just loading lidvids, which is what the code does, then loads all other fields for just the products of concern. As requested early on, the total is always counted just via lidvids to keep resources minimal. Make x-total-count kind of meaningless because this would be hard to shortcut but not impossible.
  4. Why implement an internet standard at high cost for feature nobody is going to want to use anyway? For instance, added types like text/html because did not want to make end user change their application types in their browsers. If you want to use this fancy x-total-count or latest RFC stuff then that is going to require more user finesse then changing application types in a browser. I guess one could make the argument of curl users but even there it makes little impact because all the work is already being done just for pagination and summing it up for an integer is pretty simple and cheap.
  5. Neither header total not summary total should exist. Ignore these ideas until an end user complains that they need to know the total instead of just paging until less than limit is returned -- common way of saying at the end of the list, file, stream, etc.

-- Al Niessner 818.354.0859


| dS | >= 0

From: Karpenko, Yevgen (US 398F-Affiliate) Sent: Wednesday, December 22, 2021 10:21:47 AM To: Loubrieu, Thomas G (US 398F); Niessner, Albert F (US 398F) Cc: Padams, Jordan H (US 398A) Subject: RE: x-total-count

Rfc2616 is deprecated. The latest is https://datatracker.ietf.org/doc/html/rfc7233 rfc7233 datatracker.ietf.org Hypertext Transfer Protocol (HTTP/1.1): Range Requests (RFC )

It describes range requests (pagination)

-Eugene

From: Karpenko, Yevgen (US 398F-Affiliate) Sent: Wednesday, December 22, 2021 10:12 AM To: Loubrieu, Thomas G (US 398F) Thomas.G.Loubrieu@jpl.nasa.gov; Niessner, Albert F (US 398F) Albert.F.Niessner@jpl.nasa.gov Cc: Padams, Jordan H (US 398A) Jordan.H.Padams@jpl.nasa.gov Subject: RE: x-total-count

There are many RFCs and discussions about using headers for pagination / total number of records J

Seems like “X-“ prefix is deprecated (see https://www.rfc-editor.org/rfc/rfc6648)

There is a “Content-Range” header (see https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.16 and https://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.12)

Also for partial content (a page) it is recommended to return code 206 instead of 200. (See https://www.rfc-editor.org/rfc/rfc7233#page-10)

For example,

206 Partial Content Accept-Ranges: members Content-Range: members 0-20/100

200 OK Accept-Ranges: members Content-Range: members 0-20/20

-Eugene

From: Loubrieu, Thomas G (US 398F) Sent: Wednesday, December 22, 2021 8:18 AM To: Niessner, Albert F (US 398F) albert.f.niessner@jpl.nasa.gov Cc: Padams, Jordan H (US 398A) jordan.h.padams@jpl.nasa.gov; Karpenko, Yevgen (US 398F-Affiliate) yevgen.l.karpenko@jpl.nasa.gov Subject: Re: x-total-count

+1 for x-total-count: that will be useful to store the total count when the response format can not as in the case of the CSV response format. See discussion zalando/restful-api-guidelines#116

From: "Loubrieu, Thomas G (US 398F)" thomas.g.loubrieu@jpl.nasa.gov Date: Wednesday, December 22, 2021 at 10:59 AM To: "Niessner, Albert F (US 398F)" albert.f.niessner@jpl.nasa.gov Cc: "Padams, Jordan H (US 398A)" jordan.h.padams@jpl.nasa.gov, "Karpenko, Yevgen (US 398F-Affiliate)" yevgen.l.karpenko@jpl.nasa.gov Subject: x-total-count

Hi Al,

I was not able to find our previous discussion (on slack ? email ? github ?) on x-total-count header, but quickly looking on the web, it sounds like it is a standard way of having the total count of results matching the user requests independently from the pagination constraints.

So my question for you is, what do you think is the easiest to do:

  1. Remove the header in the specification and only keep the hit attribute in the summary (within 1 month). And possibly, later on, if requested by a user, add again the header (in 6 months from now or 3 years)
  2. Fully implement now (within 1 month) the x-count-total in the header as a synonyms for the hits attribute in the summary for all responses which return multiple entities.

Of course the schedule proposed is only just to give an idea of the timescale, there is no hard commitment on that (unless Jordan give us one 😉)

I feel like we are not far from being able to do option 2 since you already implement that for the hits attribute, but I would like you opinion on that.

Thanks,

Thomas