CityofSantaMonica / mds-provider

Python tools for working with MDS Provider data
https://github.com/openmobilityfoundation/mobility-data-specification
MIT License

break paging when next url contains data outside of requested time window #57

Closed yoimnik closed 5 years ago

yoimnik commented 5 years ago

for provider APIs that use paging, the next_url will contain data outside of the requested time window

https://github.com/CityOfLosAngeles/mobility-data-specification/tree/0.2.x/provider#pagination

according to the spec, next_url is null for the last page of data. but a client may be requesting a time window in the past (e.g. 12/14 - 12/15 data), so the last page of that window is not necessarily the last page of data

yoimnik commented 5 years ago

@thekaveman requesting review

thekaveman commented 5 years ago

This seems to assume paging is always done this particular way. The whole point of using next_url and the like is to allow providers to decide how they want to page their data, so clients don't have to interpret URL patterns etc.

I would argue it is not in spec for a response to contain a next_url pointing to data outside of the requested window. E.g. from the trips query params:

  • start_time: filters for trips where start_time occurs at or after the given time
  • end_time: filters for trips where end_time occurs at or before the given time

This has been further refined and clarified in the pending 0.3.0 work on MDS.
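As a rough illustration of the two trips filters quoted above, here is a minimal Python sketch. The function name `in_window` and the plain-dict trip records with integer timestamps are assumptions made for the example; they are not taken from the spec or from mds-provider.

```python
from typing import Optional


def in_window(trip: dict,
              start_time: Optional[int] = None,
              end_time: Optional[int] = None) -> bool:
    """Return True when a trip satisfies the quoted query filters.

    start_time: the trip's start_time must be at or after this value.
    end_time:   the trip's end_time must be at or before this value.
    Either filter may be omitted (None), in which case it is not applied.
    """
    if start_time is not None and trip["start_time"] < start_time:
        return False
    if end_time is not None and trip["end_time"] > end_time:
        return False
    return True


# Hypothetical records; only the two timestamp fields matter here.
trips = [
    {"start_time": 10, "end_time": 20},
    {"start_time": 5, "end_time": 15},
    {"start_time": 12, "end_time": 40},
]
matching = [t for t in trips if in_window(t, start_time=10, end_time=30)]
```

Under this reading, every record on every page of a response should pass `in_window` for the window the client asked for.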

yoimnik commented 5 years ago

according to the spec, there is no query parameter for a pagination token or cursor token when requesting the status changes data.

therefore, in order to achieve this:

> I would argue it is not in spec for a response to contain a next_url pointing to data outside of the requested window.

the service would have to store state, and i think there are better ways to implement pagination APIs.

for now, i think it's reasonable for next_url to have the next page of data, and it's up to the client to decide whether or not they want to call the next_url. you mention:

> so clients don't have to interpret URL patterns etc.

but this is the whole purpose of the spec, and the purpose of having only start_time and end_time query parameters. this lets the client know what to expect the URLs will be, so they can stop going to the next url if it's outside of the time window they're requesting.

i think there are 2 options here:

  1. we keep the spec the way it is, and make the client look at the next url to see if it contains data outside of the requested time window
  2. we change the spec so that there's an added, required query parameter for a pagination token, so the server can know in a stateless manner what the end of the requested window is

(2) would require 10 providers to change their mds API implementation, and (1) would require this change to the mds-provider library, and any other consumers of the mds apis (which i don't think there are many of)
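Option (1) could look roughly like the following Python sketch. It assumes the provider happens to page by time window and puts a numeric `start_time` in the query string of its next URL. The helper name `should_stop_paging` and the example URLs are hypothetical. When the URL carries no such parameter (for example, a cursor token), the helper cannot tell where the next page falls and keeps paging.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def should_stop_paging(next_url: Optional[str], requested_end_time: int) -> bool:
    """Decide whether a client should stop following next URLs.

    Assumption: a time-windowed provider encodes the next page's window
    as an integer `start_time` query parameter. Nothing guarantees this.
    """
    if not next_url:
        # A null/empty next URL means there are no more pages at all.
        return True

    query = parse_qs(urlparse(next_url).query)
    values = query.get("start_time")
    if not values:
        # Cursor- or token-based paging: the URL is opaque, keep going.
        return False

    try:
        page_start = int(values[0])
    except ValueError:
        # Unrecognised format: do not guess, keep going.
        return False

    # The next page begins after the requested window ends.
    return page_start > requested_end_time
```

The two fall-through branches show the cost of this approach: the check only helps with providers whose URLs follow the assumed pattern.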

thekaveman commented 5 years ago

> so clients don't have to interpret URL patterns etc.

> but this is the whole purpose of the spec, and the purpose of having only start_time and end_time query parameters. this lets the client know what to expect the URLs will be, so they can stop going to the next url if it's outside of the time window they're requesting.

No. The purpose of the start_time and end_time query parameters, and of defining paging via the next, prev, last, etc. links, is to let a client request data for a given window of time across many different providers without having to worry about each individual implementation.

Remember, paging is a decision for each provider server; whether to do it at all, when to do it, and how to do it. MDS APIs allow a client to continually GET <next url>, but allow the provider servers to craft whatever URL patterns they wish for these URLs. Provider servers are free to choose how they wish to implement paging and all the rest - using whatever token, cursor, windowing scheme, etc. they like.
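A minimal Python sketch of a client that treats the next URL as opaque might look like this. It assumes each response is a JSON object carrying the next page's URL under `links.next` (null or absent on the last page). The names `iter_pages` and `get_json` are hypothetical, and the HTTP call is passed in as a function so the loop itself makes no assumption about transport or authentication.

```python
from typing import Callable, Iterator, Optional


def iter_pages(first_url: str,
               get_json: Callable[[str], dict]) -> Iterator[dict]:
    """Yield each page's payload, following next URLs until none is given.

    The client never parses or rewrites the URL; it simply issues a GET
    for whatever the provider returned.
    """
    url: Optional[str] = first_url
    while url:
        payload = get_json(url)
        yield payload
        # Assumed payload shape: {"data": {...}, "links": {"next": ...}}
        url = (payload.get("links") or {}).get("next")


# Stand-in for a provider: two pages keyed by made-up, opaque URLs.
fake_pages = {
    "page-1": {"data": {"trips": ["a", "b"]}, "links": {"next": "page-2?token=xyz"}},
    "page-2?token=xyz": {"data": {"trips": ["c"]}, "links": {"next": None}},
}
all_trips = [
    trip
    for page in iter_pages("page-1", fake_pages.__getitem__)
    for trip in page["data"]["trips"]
]
```

With a real provider, `get_json` could be something like `lambda url: requests.get(url, headers=auth_headers).json()`, where `auth_headers` is whatever that provider requires.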

The converse, where a client has to interpret a URL pattern, is what you are suggesting: that a client has to know that a given provider server is doing time-windowed paging, using these exact querystring parameters, etc., and based on that knowledge, apply some business logic ("oh the data I got is actually outside the window I requested"). And that is just one provider server; as you mention, there are approx 10, each free to do paging in their own way.

The spec is pretty clear: client requests some mix of start_time and end_time, provider server responds with data that fits that window, paged or not paged however they wish.