OneBusAway / onebusaway-application-modules

The core OneBusAway application suite.
https://github.com/OneBusAway/onebusaway-application-modules/wiki

Reduce references output of stops-for-route API #122

Open drabell opened 9 years ago

drabell commented 9 years ago

Enhanced functionality w/potential size optimization

Existing functionality: The following sample query for bus route M1, served by MTA NYCT, returns a compact stops list for both directions (0 and 1) in XML format: http://bustime.mta.info/api/where/stops-for-route/MTA%20NYCT_M1.xml?includePolylines=false&includeReferences=false&version=2&key={key}

Enhancement requested: Add a query parameter (e.g. includeStopsData=true) that makes the server include additional stop data, as shown in the following example:

<stop>
  <lat>40.73064</lat>
  <lon>-73.99044</lon>
  <direction>N</direction>
  <name>4 AV/E 9 ST</name>
  <code>400001</code>
  <locationType>0</locationType>
  <wheelchairBoarding>UNKNOWN</wheelchairBoarding>
</stop>

Also, the system should be able to process the request by Route ID alone (e.g. M1 or Q60), without an explicit Agency ID in the query, as in the sample web query below: http://bustime.mta.info/api/where/stops-for-route/M1.xml?includePolylines=false&includeReferences=false&version=2&key={key}

Size optimization can be achieved by using XML Attributes instead of Elements for the stop nodes, like the following:

<stop lat="40.73064" lon="-73.99044" direction="N" name="4 AV/E 9 ST" 
code="400001"  locationType="0" wheelchairBoarding="UNKNOWN">
</stop>
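To put a rough number on the saving, here is a minimal Python sketch that serializes the sample stop both ways and compares the lengths. The field values are copied from the sample above; the helper names (`stop_as_elements`, `stop_as_attributes`) are made up for illustration, and the comparison is on raw character counts before any HTTP compression.

```python
import xml.etree.ElementTree as ET

# Field values copied from the sample stop in this post.
STOP_FIELDS = {
    "lat": "40.73064",
    "lon": "-73.99044",
    "direction": "N",
    "name": "4 AV/E 9 ST",
    "code": "400001",
    "locationType": "0",
    "wheelchairBoarding": "UNKNOWN",
}


def stop_as_elements(fields):
    """Serialize a stop with one child element per field (current style)."""
    stop = ET.Element("stop")
    for key, value in fields.items():
        ET.SubElement(stop, key).text = value
    return ET.tostring(stop, encoding="unicode")


def stop_as_attributes(fields):
    """Serialize a stop with one XML attribute per field (proposed style)."""
    stop = ET.Element("stop", attrib=dict(fields))
    return ET.tostring(stop, encoding="unicode")


element_form = stop_as_elements(STOP_FIELDS)
attribute_form = stop_as_attributes(STOP_FIELDS)

# Print both sizes; the attribute form avoids repeating each name in a closing tag.
print(len(element_form), len(attribute_form))
```

Each field name appears twice in the element form (opening and closing tag) and once in the attribute form, which is where the difference comes from.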

Removing redundancy can reduce the size further. Currently the server response to the aforementioned query includes 3 essentially identical groups: all stops, stops for direction 0, and stops for direction 1. Instead, the response could contain just one list of all stops (for a particular route), with direction/position info encoded as in the following sample:

<stop lat="40.73064" lon="-73.99044" direction="N" name="4 AV/E 9 ST" 
code="400001"  locationType="0" wheelchairBoarding="UNKNOWN">
<position order="0">1</position>
<position order="1">62</position>
</stop>
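The consolidation could be sketched roughly as follows in Python. This is only an illustration: the function name `merge_direction_lists`, the input shape (direction id mapped to an ordered list of stop codes), the 1-based positions, and all stop codes other than 400001 are assumptions, not part of the existing API.

```python
def merge_direction_lists(stops_by_direction):
    """Collapse per-direction stop lists into one mapping.

    stops_by_direction: {direction_id: [stop_code, ...]} in travel order.
    Returns {stop_code: {direction_id: position}}, with positions starting at 1.
    If a stop occurs twice in one direction (e.g. a loop), the later position wins.
    """
    merged = {}
    for direction, stop_codes in stops_by_direction.items():
        for position, code in enumerate(stop_codes, start=1):
            merged.setdefault(code, {})[direction] = position
    return merged


# Hypothetical three-stop route; only code 400001 comes from the sample above.
example = {
    0: ["400001", "400002", "400003"],
    1: ["400003", "400001"],
}
merged = merge_direction_lists(example)
print(merged["400001"])
```

Each stop is then emitted once, with one position entry per direction that serves it, instead of being repeated in three separate groups.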

Thanks and regards,

barbeau commented 9 years ago

Try setting includeReferences=true. That should include the details of the stops in the references elements. On Feb 26, 2015 8:37 PM, "Dr. Alexander Bell" notifications@github.com wrote:

Enhanced functionality requested


drabell commented 9 years ago

Thanks for your comments.

1). I intentionally set 'includeReferences=false' to minimize the response dataset; otherwise it returns an XML structure with a huge overhead referencing all routes; apparently that reference section is out of context because the very idea of this request is to get all stops for just one particular route (e.g. just for Q60, or M1, etc.).

2). It would also be nice if the modified query worked without an explicit reference to the Agency (either MTA NYCT or MTABC): it's a One-to-One relationship between Routes and Agencies, so by having the Route ID passed as parameter the system should be able to implicitly pull the right Agency on server side.

Best regards,

barbeau commented 9 years ago

@drabell Thanks for comments and feedback!

> apparently that reference section is out of context because the very idea of this request is to get all stops for just one particular route (e.g. just for Q60, or M1, etc.).

I believe the list of route references returned should be the route elements for all routes serving the stops in the payload - and, since many stops on the M1 serve more than one route, you're seeing a long list of routes.

I think it's probably useful to define some kind of exclude filter, along with includeReferences=true, to prevent returning a large number of reference elements that you're not interested in. For example, to solve your particular use case (you want stop details, but not routes), the parameters could be includeReferences=true&excludeRouteReferences=true - or some type of bitwise field to combine different types of exclusion filters in the same parameter. Would this satisfy your use case?

> it's a One-to-One relationship between Routes and Agencies, so by having the Route ID passed as parameter the system should be able to implicitly pull the right Agency on server side.

This may be true in NYC (for now), but it isn't necessarily true in all OneBusAway deployments. For example, in Tampa we have multiple agencies in OBA, and two have a "Route 1".
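A small Python sketch of why a bare Route ID cannot always be resolved on the server side. The IDs follow the agency-prefixed form seen in the query above (`MTA NYCT_M1`); the function name `resolve_route` and the Tampa agency prefixes `HART` and `PSTA` are illustrative assumptions, not the real deployment's identifiers.

```python
def resolve_route(route_id, known_routes):
    """Resolve a bare or agency-qualified route ID against known qualified IDs.

    known_routes: set of IDs in 'agencyId_routeId' form (assumed all qualified).
    Raises KeyError if nothing matches and ValueError if the bare ID is ambiguous.
    """
    if route_id in known_routes:
        return route_id  # already fully qualified
    # Split on the first underscore only, so route parts may contain underscores.
    matches = sorted(
        full for full in known_routes if full.split("_", 1)[1] == route_id
    )
    if not matches:
        raise KeyError(f"unknown route: {route_id}")
    if len(matches) > 1:
        raise ValueError(f"ambiguous route {route_id!r}: {matches}")
    return matches[0]


NYC = {"MTA NYCT_M1", "MTABC_Q60"}
TAMPA = {"HART_1", "PSTA_1"}  # hypothetical prefixes for the two "Route 1"s

print(resolve_route("M1", NYC))
```

With the NYC set the bare `M1` resolves to a single route; with the Tampa set a bare `1` raises `ValueError`, so the caller still has to supply the agency.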

> Also, for size optimization I would recommend to use XML Attributes instead of Elements for the stop nodes

Or, you can just use the JSON response ;). That's what we use in the OBA iOS and Android mobile apps.

drabell commented 9 years ago

@barbeau , Many thanks for your prompt response and rather thoughtful comments/suggestion. I have amended the original post with some content from comments thread in order to keep all derived issues/suggestions in one place (for better case integrity). My additional comments follow:

a). I would agree that implementing query parameter includeReferences like bitwise field may provide the required functionality: for example, includeReferences =00 will mean neither Routes list, nor stops details; includeReferences =01 will mean no Routes list, but all stops w/details; includeReferences =11 will return a full set of ref data corresponding to the current includeReferences=true.

b). The system in general should be able to resolve the issue of multiple Agencies serving the same Route (e.g. Route 1), because the stops list MUST be identical (otherwise it's an utterly weird situation from a commuting perspective: the same Bus makes different stops depending on some other factors not clearly articulated to commuters). In other words, the Route ID must have a unique set of stops in chosen direction (0 or 1) regardless of Agency serving the Route (and, of course, the RouteID must be unique for the selected Metro area).

c). JSON vs. XML w/attributes. JSON is a valid alternative, but XML is still a big or biggest player (at least, most folks out there know what this abbreviation stands for, unlike the other piece of techno-jargon ;-).

Please let me know if these requirements/enhancements are feasible. Kind regards, Alexander

laidig commented 9 years ago

One of my annoyances with the OBA API is that, as @barbeau points out, the response contains plenty of references that are not relevant to the parameters of the API call itself. This is why I prompted this issue from another forum ;)

Dr Bell asked for information on MTA NYCT_M1, so why does it return information about other routes/agencies?

drabell commented 9 years ago

@laidig Many thanks for this clarification! This, indeed, is a core issue in the context of the entire discussion topic/thread. Kind regards,

barbeau commented 9 years ago

> In other words, the Route ID must have a unique set of stops in chosen direction (0 or 1) regardless of Agency serving the Route (and, of course, the RouteID must be unique for the selected Metro area).

In the case of OBA Tampa, the two different Route 1s actually serve completely different geographic areas. HART (which serves Hillsborough County) has a Route 1 in Tampa, and PSTA (which serves Pinellas County) has their Route 1 in Pinellas County. It makes sense to host both agencies in the same OBA instance, both to reduce infrastructure overhead and because there is connectivity between the two agencies. I believe Puget Sound has similar examples, and other regional OBA deployments as well. Bus Time is actually a special case of OBA re-branded for a single city.

> includeReferences like bitwise field may provide the required functionality: for example, includeReferences =00 will mean neither Routes list, nor stops details; includeReferences =01 will mean no Routes list, but all stops w/details; includeReferences =11 will return a full set of ref data corresponding to the current includeReferences=true

Yes, that's similar to what I was thinking. I think we would actually need 5 bits, though, one for each of the following references elements:

  1. agencies
  2. routes
  3. situations (i.e., alerts)
  4. stops
  5. trips

For example, see this arrivals-and-departures-for-stop response.

Not all of them are used for all API calls, but I think it makes sense to keep the input consistent across API endpoints. Unused bit input for a particular method would just be ignored. We could do it as (a) a true bitmask (which would mean a single integer input that doesn't include just 0s and 1s), or (b) more of a meta bitmask where the parameter input is only 0s and 1s. (a) is probably easier to machine read but more complex to manually test, while (b) is vice versa.
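The difference between options (a) and (b) can be sketched in Python as below. This is an assumption-laden illustration: the bit assignment (agencies as the lowest bit, trips as the highest), the `Refs` flag class, and both parser names are hypothetical, and nothing like this exists in the OBA API today.

```python
from enum import IntFlag


class Refs(IntFlag):
    """Hypothetical bit assignment for the five references elements."""
    AGENCIES = 1
    ROUTES = 2
    SITUATIONS = 4
    STOPS = 8
    TRIPS = 16


def parse_true_bitmask(value):
    """Option (a): the parameter is a single decimal integer, e.g. '9'."""
    number = int(value)  # raises ValueError on non-numeric input
    if not 0 <= number < 32:
        raise ValueError(f"bitmask out of range: {value}")
    return Refs(number)


def parse_meta_bitmask(value):
    """Option (b): the parameter is five characters of 0/1, e.g. '01001'.

    The leftmost character is the highest bit (trips), the rightmost is agencies.
    """
    if len(value) != 5 or set(value) - {"0", "1"}:
        raise ValueError(f"expected five 0/1 characters: {value!r}")
    return Refs(int(value, 2))


# Both spellings select the same set: stops and agencies.
print(parse_true_bitmask("9") == parse_meta_bitmask("01001"))
```

Form (b) can be read off by eye when testing in a browser, while form (a) is shorter and needs no string validation beyond a range check.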

@sheldonabrown @kurtraschke Any thoughts on this?

On a related note - I just realized that there is an error in the "References" documentation. It says:

> Right now, only a few types of objects will ever appear in the references section: agencies, routes, stops, trips, and situations...They will always appear in that order, since stops and trips reference routes and routes reference agencies. If you are processing the result stream in order, you should always be able to assume that an referenced entity would already have been included in the references section.

If you look at the JSON arrivals-and-departures-for-stop response above, it's not in this order - situations is not the last element (instead, it looks like alphabetical order). In the XML response, it looks like the situations element isn't included if no situations are active on the stop. In a stop that does have situations, the JSON order is the same (i.e., does not match the docs), but the XML order does follow the docs, with situations being the last element.

The easiest solution is just to remove the portion of the docs that defines an exact order. Or, we can try to force Jackson to output the JSON in a particular order. I recall another project (OTP, I think) saying it was difficult to get the JSON order to always match the XML order.

kurtraschke commented 9 years ago

As to the relevance of objects returned by the references mechanism, I believe it makes more sense if you look at it through the lens of graph traversal. In this case, for example, the response from stops-for-route refers to stops, so the relevant stop objects are included. But those stops refer to routes, so then the routes are included. In turn, the routes refer to agencies, so the agencies are included, and this is where the traversal stops.

The general intention behind the references mechanism was (is) to preload a mobile application's object cache with every object the application might reasonably need in response to user navigation from the results returned in a given response (for example, the user, while browsing stops on a particular route navigates to a certain stop, then selects a different route at one of those stops, then selects the agency operating that route for further details...).

Rather than filtering the returned references by object type, I would suggest filtering by depth, which I believe could be done simply by passing a counter around the various methods of BeanFactoryV2 and incrementing it on function calls. For example, tracing the request mentioned earlier, we begin at BeanFactoryV2.getStopsForRoute() (imagine depth=0). getStopsForRoute() then calls addToReferences() with the route that was queried (depth=1), which in turn calls getRoute(), which in turn calls addToReferences() with the route's agency (depth=2). Returning to getStopsForRoute(), addToReferences() is again called for each stop (depth=1); in each case it calls getStop(), which in turn calls addToReferences() for each route calling at the stop (depth=2), which in turn calls getRoute(), which, as previously mentioned, calls addToReferences() with the route's agency (depth=3).

From an API user's perspective, I believe this results in a more consistent experience--if you want all of the objects directly referenced in the response, ask for depth=1. If you want those objects and the objects to which they refer, ask for depth=2, and so on. If you want all referenced objects, omit the parameter?
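The depth idea can be modeled outside of BeanFactoryV2 with a short Python sketch. This is a standalone illustration, not the actual Java code: the object IDs, the `refers_to` mapping, and the function name `collect_references` are made up, and it uses a breadth-first walk (so each object is recorded at its smallest depth) rather than a counter threaded through recursive calls.

```python
from collections import deque


def collect_references(roots, refers_to, max_depth=None):
    """Return {object_id: depth} for every object reachable within max_depth.

    roots: objects referenced directly by the response (depth 1).
    refers_to: {object_id: [object_id, ...]} edges between objects.
    max_depth: None means no limit, i.e. all referenced objects.
    """
    depth_of = {}
    queue = deque((root, 1) for root in roots)
    while queue:
        node, depth = queue.popleft()
        if node in depth_of:
            continue  # already recorded at an equal or smaller depth
        if max_depth is not None and depth > max_depth:
            continue
        depth_of[node] = depth
        for target in refers_to.get(node, ()):
            queue.append((target, depth + 1))
    return depth_of


# Hypothetical miniature of the stops-for-route example.
refers_to = {
    "route:M1": ["agency:X"],
    "stop:A": ["route:M1", "route:M2"],
    "stop:B": ["route:M1"],
    "route:M2": ["agency:Y"],
}
roots = ["route:M1", "stop:A", "stop:B"]

print(sorted(collect_references(roots, refers_to, max_depth=1)))
```

With `max_depth=1` only the queried route and its stops come back; `max_depth=2` adds the other routes at those stops and the queried route's agency; omitting the limit also pulls in the agencies of the other routes.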

drabell commented 9 years ago

@barbeau

Many thanks for your kind attention and detailed comments.

Response document structure and granularity levels. Pertinent to the key topic, it's highly desirable from a developer's perspective to have a parametric 'stops-for-route' web query that produces a response data set with more customization options than currently exist for selecting the Elements (essentially, document sections) and the granularity level of each Element (i.e. detail attributes). For clarity, I suggest adhering to the following semantics for web query/response customization: "References" (document sections) corresponds to the high-level XML elements, and "Details" (semantically close to "granularity" or "depth") refers to the data items (XML attributes) included in particular sections. Therefore, there could be 2 web query params:

References parameter based on a bitmask as suggested; probably allocate an entire byte, with reserved bits for potential future needs:

  Bit 7 (MSB): Reserved
  Bit 6: Reserved
  Bit 5: situations (i.e., alerts)
  Bit 4: trips
  Bit 3: all routes served by relevant agencies
  Bit 2: agencies serving requested route
  Bit 1: route (as requested)
  Bit 0 (LSB): stops

Example: References=00000001 will return just stops; References=00000011 will return stops and the requested route.

Details parameter could be just 0 or 1 for each of the References listed above, where 0 means no details (just the entities list) and 1 means showing details for the respective References. It could be reasonable to consider more granularity in specifying the stops details list (probably allocating another byte instead of a bit). For example, stops data may include many additional attributes like <wifi> (yes or no, i.e. 0/1), meteorology data (<temperature>, <wind>, <precipitation>), even light level (<illuminance>) at the stop, etc.

On a separate note: the absence of a globally (i.e. nationwide) unique Route ID is a certain deficiency of the system, which may cause a future mess. Currently a Bus headsign de facto serves as route_id. To get a globally unique Route index, the additional field agency-id is required. This could cause a potentially ambiguous situation when two different agencies serve the same area and have identical route_id values. But this issue is only tangentially related to the key topic, and it would probably be better to discuss it in a separate post.

Thanks and regards, Alexander

laidig commented 9 years ago

@kurtraschke, @drabell , @barbeau many thanks for the discussion.

I think that the bitmask is a bit much to parse and overcomplicates the solution. For another view, I turn to the SIRI discovery spec, which includes what I think is a more elegant way of specifying multiple levels of detail.

Many SIRI requests have a specified detail level. The analogous call to the OBA stops-for-route in SIRI is StopPointsRequest, which takes in a route ('line' in SIRI nomenclature) and detail level. The detail levels for this particular call are:

  minimum: Return only the name and identifier of the stop.
  normal: Return name, identifier and coordinates of the stop.
  full: Return all available data for each stop.

While this doesn't split hairs as finely as the bitmask, or come out of a response to the transit graph's peculiarities, I think something along these lines does balance the complexity and usefulness of the API.

drabell commented 9 years ago

I would completely agree with the suggestion by @laidig, which so far provides a delicate balance between universality and practicality, i.e. complexity and usefulness. It would be rather helpful to extend the functionality of the stops-for-route OBA web query with the aforementioned SIRI feature, i.e. implementing a request parameter stopDetails with 3 options:

  stopDetails=min (only stop_name and stop_id)
  stopDetails=normal (stop_name, stop_id, lat, lon and direction (or bearing))
  stopDetails=max (all available data items for stops)

A similar approach to the References would be equally useful:

  refDetails=min (requested route_id, short_name and respective agency)
  refDetails=normal (requested route and respective agency full data set)
  refDetails=max (all available References with max details in regards to the requested route)
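A minimal Python sketch of how the proposed stopDetails levels could filter a stop record. The parameter does not exist in the API; the key names, the `filter_stop` helper, and the `id` value in the sample record are assumptions made for illustration (the remaining values come from the sample stop in the original post).

```python
# Fields kept at each reduced level; "max" keeps everything available.
STOP_DETAIL_FIELDS = {
    "min": ("id", "name"),
    "normal": ("id", "name", "lat", "lon", "direction"),
}


def filter_stop(stop, level="max"):
    """Return a copy of the stop dict reduced to the requested detail level."""
    if level == "max":
        return dict(stop)
    try:
        fields = STOP_DETAIL_FIELDS[level]
    except KeyError:
        raise ValueError(f"unknown stopDetails level: {level!r}") from None
    return {key: stop[key] for key in fields if key in stop}


sample_stop = {
    "id": "MTA_400001",  # hypothetical identifier
    "name": "4 AV/E 9 ST",
    "lat": 40.73064,
    "lon": -73.99044,
    "direction": "N",
    "code": "400001",
    "locationType": 0,
    "wheelchairBoarding": "UNKNOWN",
}

print(filter_stop(sample_stop, "min"))
```

The same table-driven shape would extend to a refDetails parameter by adding a second mapping for route and agency fields.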

Many thanks to @laidig, @barbeau and @kurtraschke for rather thoughtful comments and fruitful discussion. My best, Alex

barbeau commented 9 years ago

+1 for mirroring SIRI

@laidig any reason why MTA is using OBA REST API instead of SIRI for discovery? I didn't think SIRI extended to discovery-like functions. Maybe we should just table this in favor of a real SIRI StopPointsRequest API method.

laidig commented 9 years ago

@barbeau When we launched, SIRI 2.0 Discovery was not finalized. It was finalized last year.

There's a reason why this answer was at my fingertips ;)

barbeau commented 9 years ago

@laidig Ok, thanks, that makes sense. :)

drabell commented 9 years ago

Gentlemen, Would you please update on the status of this issue, namely: has it been resolved, and if so - what is the format of the respective data feed/web query and a response XML structure? Thanks and regards, Alexander

drabell commented 9 years ago

Gentlemen, Many thanks for this valuable discussion. Recently I have released a production pilot of the NY Bus Map web app (re: http://infosoft.biz/bus.aspx; for mobile: http://infosoft.biz/mbus.aspx ) and am now working on real-time API optimization. In this regard:

  1. Would you please advise me on the status of this issue, namely: is it possible to get the stops-for-route data feed referencing just the specified Bus Route?
  2. On a separate note: it seems that many bus timetable .pdf docs specified in the route reference section have been moved (e.g. Q60 - got a 404 error). How can I find these docs' new location (url)?

Thanks and regards, Alex Bell