TOMP-WG / TOMP-API

Transport Operator to Mobility-as-a-Service Provider-API development for Mobility as a Service
Apache License 2.0

[FEATURE REQUEST] Paginated responses #425

Open danielbynight opened 2 years ago

danielbynight commented 2 years ago

Is your feature request related to a problem? Please describe.

Currently, the implementation of pagination in the API specification is incomplete, as it lacks vital information in the response data: namely, the total number of objects. Other fields could be considered:

The lack of information on the total number of records is particularly problematic. Without it, frontends can't tell whether they are receiving all records or just the first page, capped by a default limit. In our implementation, we return a 400 response if no limit is passed as a GET parameter, in order to enforce an explicit limit. This solution is very aggressive and not ideal: a frontend still doesn't know how many pages to expect, and has to discover that through subsequent requests, increasing the offset by the limit each time, until it gets a response with either no records or fewer records than the limit.
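The discovery loop described above can be sketched as follows. This is a minimal sketch, assuming a hypothetical fetch_page(offset, limit) callable that stands in for the HTTP GET against a list endpoint; it counts requests to show what a client pays when no total is available.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for GET <list-endpoint>?offset=...&limit=...
FetchPage = Callable[[int, int], List[dict]]


def fetch_all(fetch_page: FetchPage, limit: int) -> Tuple[List[dict], int]:
    """Crawl an offset/limit endpoint that exposes no total count.

    Returns the collected records and the number of requests it took.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    records: List[dict] = []
    offset = 0
    requests = 0
    while True:
        page = fetch_page(offset, limit)
        requests += 1
        records.extend(page)
        # Only an empty or short page tells the client it has reached the end.
        if len(page) < limit:
            return records, requests
        offset += limit


# Simulated backend holding exactly two full pages of records.
data = [{"id": i} for i in range(20)]
records, requests = fetch_all(lambda off, lim: data[off:off + lim], limit=10)
print(len(records), requests)  # 20 3
```

When the total happens to be an exact multiple of the limit, the client cannot stop after the last full page and pays for one extra, empty request.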

Urgency

For the reasons explained in the previous paragraph, this is a major issue as it impacts the usability of implementations of the TOMP-API specification by frontends.

Describe the solution you'd like

Ideally, paginated responses would be wrapped in an object that includes metadata on the page being returned. In the following example, the frontend would know from a single request that it should expect two pages of results, one with 10 results and one with 2 (allowing it to build a user-friendly GUI without the cost of extra HTTP requests):

{
  "count": 12,
  "limit": 10,
  "offset": 0,
  "data": [
    // list of objects
  ]
}
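With this wrapper, a client can derive the whole page layout from the first response. A minimal sketch, assuming the count and limit fields from the example above; page_sizes is a hypothetical helper, not part of the TOMP specification.

```python
from typing import List


def page_sizes(meta: dict) -> List[int]:
    """Derive the size of every page from the proposed count/limit metadata."""
    count, limit = meta["count"], meta["limit"]
    if limit <= 0:
        raise ValueError("limit must be positive")
    pages = (count + limit - 1) // limit  # integer ceiling of count / limit
    # Every page is full except possibly the last one.
    return [min(limit, count - i * limit) for i in range(pages)]


response = {"count": 12, "limit": 10, "offset": 0, "data": []}
print(page_sizes(response))  # [10, 2]
```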

Describe alternatives you've considered

Given that the proposed solution would require a major release of the specification due to the breaking changes, I propose two additional options which would require only a minor release.

Metadata as an underscored key

The metadata on paginated responses can also be sent under one or more fields marked with a leading underscore. This is a common solution for transporting metadata in response bodies (see, for example, the HAL proposal). Such fields can be marked as "optional", thus not introducing breaking changes. Example:

{
  "_page": { // optional metadata field, required for paginated responses
    "count": 12,
    "limit": 10,
    "offset": 0
  },
  "data": [
    // list of objects
  ]
}
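Because _page would be optional, a client should treat its absence as "no pagination metadata" rather than as an error. A minimal sketch, assuming the field layout from the example above; total_count is a hypothetical helper name.

```python
from typing import Optional


def total_count(body: dict) -> Optional[int]:
    """Return the total record count from the optional _page field, if any."""
    page = body.get("_page")
    # The field is optional, so its absence means "no metadata", not an error.
    if page is None:
        return None
    return page.get("count")


with_meta = {"_page": {"count": 12, "limit": 10, "offset": 0}, "data": []}
print(total_count(with_meta))     # 12
print(total_count({"data": []}))  # None
```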

Metadata in the HTTP headers

Since headers are traditionally used to carry metadata about a server response, it's not far-fetched to include this data there too, even though it is an exotic solution to this common problem. It does leave the current specification of the response body of list endpoints untouched. Example:

pagination-count: 12
pagination-limit: 10
// other headers

[
  // list of objects
]
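On the client side, this variant only requires reading two headers. A minimal sketch, assuming the pagination-count and pagination-limit header names from the example above (they are a proposal, not part of any standard); since HTTP field names are case-insensitive, the lookup normalizes them first.

```python
from typing import Mapping, Optional


def pagination_from_headers(headers: Mapping[str, str]) -> Optional[dict]:
    """Read the proposed pagination-* headers, if the server sent them."""
    # HTTP field names are case-insensitive, so normalize before looking up.
    lowered = {name.lower(): value for name, value in headers.items()}
    count = lowered.get("pagination-count")
    if count is None:
        return None
    limit = lowered.get("pagination-limit")
    return {"count": int(count), "limit": int(limit) if limit is not None else None}


print(pagination_from_headers({"Pagination-Count": "12", "Pagination-Limit": "10"}))
# {'count': 12, 'limit': 10}
```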

Notes

This is an out-of-scope discussion, but I'd like to see the TOMP-API evolve towards requiring pagination on all list endpoints, as that is a requirement in any modern client-server interaction.

danielbynight commented 2 years ago

As an aside: the term "paging" appears in the documentation. I believe the correct English word for this topic is "pagination", but even if "paging" is also acceptable, I'd recommend changing it to "pagination", as that is the keyword developers use when searching for this topic.

tisinno commented 2 years ago

My 2 cents on this topic:

TOMP is not an end-user/front-end API.

The intent of the /operator/* API is to establish synchronization between TO and MP, and it should be decoupled from any front-end queries.

The MP will want all dynamic information, kept up to date continuously, not just a subset of pages. The MP needs this across multiple TOs in order to support multimodal planning and other types of combined TO functionality.

Pagination using offset+limit parameters on a real-time, dynamic dataset ("available-assets") either gives the MP an inconsistent snapshot when it fetches all pages, OR demands a complicated implementation from the TO to guarantee consistency. Where the goal is to fetch all pages, pagination may even increase runtime pressure on the TO, as an MP could fetch pages concurrently to get faster updates.
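The inconsistency can be reproduced with a toy in-memory dataset. A minimal sketch, assuming offsets are applied to the live list of available assets at request time; the asset IDs and the fetch_page helper are hypothetical.

```python
from typing import List


def fetch_page(dataset: List[str], offset: int, limit: int) -> List[str]:
    """Simulate GET available-assets?offset=...&limit=... against live state."""
    return dataset[offset:offset + limit]


assets = ["a1", "a2", "a3", "a4", "a5", "a6"]
seen = fetch_page(assets, 0, 3)   # ['a1', 'a2', 'a3']
assets.remove("a2")               # a2 gets booked between the two requests
seen += fetch_page(assets, 3, 3)  # ['a5', 'a6']: a4 shifted into page 1
print(seen)                       # ['a1', 'a2', 'a3', 'a5', 'a6']
```

The MP ends up with a stale a2 and never sees a4, even though every individual response was correct at the moment it was served.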

Instead, if regionId-based partitioning is not sufficient for some TOs to keep response sizes manageable, an optional secondary partitioning scheme can be added to the relevant endpoints.

Partitioning is better suited because it can be based on statically determined subsets (as opposed to offsets into a dynamic set like 'available-assets'). This means response item counts can vary a little per request, which is not relevant for the stated intent of /operator/*, but the TO can still control the maximum exactly and freely, and fetching across the entire range is still guaranteed to be consistent.

This additional partitioning approach can also be modeled in a cursor-style manner if desired: the MP finds the cursor value in the response to a request and uses it to fetch the next batch. The TO decodes the cursor value to determine which subset to use for the next response. The TO is completely free in choosing cursor values, so there is no need for min/max in the TOMP spec. A TO can also simply return all results in the first request and not bother with cursors or paging support at all if it is not needed. Again, the important point is that cursor values must be chosen to define a statically determined subset, for example using an internal asset ID. For example: no cursor value means IDs 1-100, cursor value 1 means IDs 101-200, etc. This is simple to implement and, most importantly, guaranteed to be consistent.
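A minimal sketch of this cursor scheme, assuming integer internal asset IDs and a fixed range width of 100 (BUCKET_SIZE and fetch_partition are hypothetical names, not part of TOMP; a real TO would more likely hand out opaque string cursors). Each cursor maps to a fixed ID range, so removing an asset between requests cannot shift other assets into a range the MP has already fetched.

```python
from typing import Dict, List, Optional, Tuple

BUCKET_SIZE = 100  # hypothetical width of each internal-ID range


def fetch_partition(
    assets: Dict[int, dict], cursor: Optional[int]
) -> Tuple[List[dict], Optional[int]]:
    """Return the assets whose internal ID falls in the cursor's range.

    No cursor selects IDs 1-100, cursor 1 selects 101-200, and so on. The range
    depends only on the cursor, not on how many assets currently exist.
    """
    index = cursor or 0
    low, high = index * BUCKET_SIZE + 1, (index + 1) * BUCKET_SIZE
    items = [assets[i] for i in sorted(assets) if low <= i <= high]
    # Offer a next cursor only while assets with higher IDs exist.
    next_cursor = index + 1 if any(i > high for i in assets) else None
    return items, next_cursor


assets = {i: {"id": i} for i in (5, 42, 150, 230)}
first, cursor = fetch_partition(assets, None)     # IDs 5 and 42, next cursor 1
del assets[42]                                    # removal shifts nothing else
second, cursor = fetch_partition(assets, cursor)  # ID 150, next cursor 2
third, cursor = fetch_partition(assets, cursor)   # ID 230, no next cursor
print([a["id"] for a in first + second + third], cursor)  # [5, 42, 150, 230] None
```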

Overall, again given the (imho) intent of this /operator/* API, it should not contain any parameters unrelated to partitioning and synchronization. Effectively, this means stationId, limit, etc. should be removed from available-assets, and /stations should not define a spatial search capability. These use cases can be addressed with separate, dedicated and, above all, optional endpoints, so it remains clear what a TO actually supports.

In general, TOMP should consider extensions that allow for additional, non-polling-based syncing mechanisms, which would also reduce the impact of these API requirements on TO and MP while adding (functional) value.

edwinvandenbelt commented 1 year ago

We're considering this setup right now: http://jsonapi.org/format/#fetching-pagination. What do you think?

"links": {
  "first": "https://...",
  "last": "https://...",
  "prev": "https://...",
  "next": "https://..."
}
edwinvandenbelt commented 1 year ago

Another approach could be http://docs.opengeospatial.org/is/17-069r3/17-069r3.html#_requirements_class_geojson. This is still a proposal for GeoJSON. It's closer to traditional Google-style paging.

{
  "type" : "FeatureCollection",
  "links" : [ {
    "href" : "http://data.example.com/collections/buildings/items?f=json",
    "rel" : "self",
    "type" : "application/geo+json",
    "title" : "this document"
  }, {
    "href" : "http://data.example.com/collections/buildings/items?f=html",
    "rel" : "alternate",
    "type" : "text/html",
    "title" : "this document as HTML"
  }, {
    "href" : "http://data.example.com/collections/buildings/items?f=json&offset=10&limit=10",
    "rel" : "next",
    "type" : "application/geo+json",
    "title" : "next page"
  } ],
  "timeStamp" : "2018-04-03T14:52:23Z",
  "numberMatched" : 123,
  "numberReturned" : 10,
  "features" : [ ... ]
}
edwinvandenbelt commented 1 year ago

We'll take this approach:

"links": {
  "prev": "https://...",
  "next": "https://..."
}

The 'first' and 'last' links will be removed; they don't add value in this case.
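On the client side, the agreed links object reduces to following next until it is absent. A minimal sketch, assuming each page carries its records under a data key and that fetch is a hypothetical function performing the GET and decoding the JSON (the example.test URLs are placeholders).

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for an HTTP GET that returns the decoded JSON body.
Fetch = Callable[[str], dict]


def follow_next_links(fetch: Fetch, first_url: str) -> List[dict]:
    """Collect items by following links.next until the server omits it."""
    items: List[dict] = []
    url: Optional[str] = first_url
    while url is not None:
        body = fetch(url)
        items.extend(body.get("data", []))  # the "data" key is an assumption
        url = body.get("links", {}).get("next")
    return items


pages: Dict[str, dict] = {
    "https://example.test/assets?page=1": {
        "data": [{"id": 1}, {"id": 2}],
        "links": {"next": "https://example.test/assets?page=2"},
    },
    "https://example.test/assets?page=2": {
        "data": [{"id": 3}],
        "links": {"prev": "https://example.test/assets?page=1"},
    },
}
result = follow_next_links(lambda url: pages[url], "https://example.test/assets?page=1")
print([item["id"] for item in result])  # [1, 2, 3]
```

The loop never reads prev, which matches the view that only the next link is strictly needed for synchronization.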

tisinno commented 1 year ago

Looks good. From my point of view, only the next link is important; even numberMatched or any other form of total is unnecessary and may cause additional complexity on large, partitioned datasets.

It would be good to include an explanation that the best way to implement this is to ensure that each queried link returns a statically determined subset.

edwinvandenbelt commented 7 months ago

MUST do. In v2, we can handle this with HAL (HATEOAS). https://en.wikipedia.org/wiki/Hypertext_Application_Language

edwinvandenbelt commented 7 months ago

This only applies to /bookings/{id}/legs/{lid}/available-assets and the journal-entries request. HAL is used (see https://github.com/TOMP-WG/TOMP-API/tree/transmodel-v1/TOMP-API.yaml).