codeforpdx / opentransit-metrics

Prototype of public transit data visualization system
https://opentransit-pdx.herokuapp.com/
MIT License

Storing and referencing historical GTFS feed #8

Open sidetrackedmind opened 2 years ago

sidetrackedmind commented 2 years ago

Current setup

The save_routes.py script has a pretty good description written into the comments (thanks whoever wrote this!).

The script downloads and parses the GTFS feed and saves the configuration for all routes to S3. The S3 object contains data merged from GTFS and the Nextbus API (for agencies using Nextbus). The frontend can then request this S3 URL directly without hitting the Python backend.

For each direction, the JSON object contains a coords array defining the shape of the route, where the values are objects containing lat/lon properties:

"coords":[
 {"lat":37.80707,"lon":-122.41727},
 {"lat":37.80727,"lon":-122.41562},
 {"lat":37.80748,"lon":-122.41398},
 {"lat":37.80768,"lon":-122.41234},
 ...
]

For each direction, the JSON object also contains a stop_geometry object where the keys are stop IDs and the values are objects with a distance property (cumulative distance in meters to that stop along the GTFS shape) and an after_index property (index into the coords array of the last coordinate before that stop).

"stop_geometry":{
   "5184":{"distance":8,"after_index":0},
   "3092":{"distance":279,"after_index":1},
   "3095":{"distance":573,"after_index":3},
   "4502":{"distance":1045,"after_index":8},
   ...
}
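
To make the relationship between coords and stop_geometry concrete, here is a minimal sketch of how a consumer might read the two structures together. The function names are hypothetical (they are not taken from save_routes.py), and the sample data contains only the entries shown in the excerpts above.

```python
def distance_between_stops(direction, from_stop_id, to_stop_id):
    """Distance in meters along the shape from one stop to another."""
    geometry = direction["stop_geometry"]
    return geometry[to_stop_id]["distance"] - geometry[from_stop_id]["distance"]


def coords_between_stops(direction, from_stop_id, to_stop_id):
    """Shape points that lie between from_stop and to_stop."""
    geometry = direction["stop_geometry"]
    # after_index is the last coordinate *before* a stop, so the first point
    # past from_stop is after_index + 1, and the last point before to_stop
    # is its own after_index (inclusive, hence the + 1 on the slice end).
    start = geometry[from_stop_id]["after_index"] + 1
    end = geometry[to_stop_id]["after_index"] + 1
    return direction["coords"][start:end]


# Only the entries that appear in the excerpts above; the real arrays are longer.
direction = {
    "coords": [
        {"lat": 37.80707, "lon": -122.41727},
        {"lat": 37.80727, "lon": -122.41562},
        {"lat": 37.80748, "lon": -122.41398},
        {"lat": 37.80768, "lon": -122.41234},
    ],
    "stop_geometry": {
        "5184": {"distance": 8, "after_index": 0},
        "3092": {"distance": 279, "after_index": 1},
        "3095": {"distance": 573, "after_index": 3},
    },
}

print(distance_between_stops(direction, "5184", "3095"))     # 573 - 8 = 565
print(len(coords_between_stops(direction, "5184", "3095")))  # 3 (indices 1..3)
```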

The terminal output of save_routes.py looks something like this:

route 98 MAX Shuttle
 default direction = 0
  shape_id: 503530 (220x) stops:5 from 8196 Gateway Transit Center to 13504 Portland International Airport - Arrivals 8196,10856,13206,13208,13504
  most common shape = 503530 (220 times)
  title = To Portland International Airport - Arrivals
  distance = 15649
 default direction = 1
  shape_id: 503532 (220x) stops:5 from 13504 Portland International Airport - Arrivals to 8196 Gateway Transit Center 13504,13207,13206,10856,8196
  most common shape = 503532 (220 times)
  title = To Gateway Transit Center
  distance = 15126

Issue

Currently the script overwrites a single S3 path on every run, so only the latest route configuration is kept. This process could be extended to store a separate path for each date, which would allow fetching historical route configurations.
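
One possible shape for this, as a rough sketch only: write each run to a date-stamped key, and when reading, pick the newest version on or before the date being viewed. The key layout and both function names below are assumptions made for illustration, not what save_routes.py does today, and the actual S3 upload and listing are left out.

```python
import bisect
from datetime import date


def versioned_routes_key(agency_id, version_date):
    """Build a date-stamped S3 key (hypothetical layout) for one routes snapshot."""
    return f"routes/{agency_id}/{version_date.isoformat()}/routes.json"


def latest_version_on_or_before(version_dates, target):
    """Return the newest stored version date that is <= target, or None."""
    ordered = sorted(version_dates)
    # bisect_right gives the insertion point after any entry equal to target,
    # so the element just before it is the newest date not later than target.
    i = bisect.bisect_right(ordered, target)
    return ordered[i - 1] if i else None
```

With snapshots stored for 2020-01-01 and 2020-03-01, a request for 2020-02-15 would resolve to the 2020-01-01 snapshot, and a request for any date before the first snapshot would return None so the caller can decide how to fall back.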

Additional information

If you're not familiar with GTFS, the reference material at https://developers.google.com/transit/gtfs/ is a good starting point.

sidetrackedmind commented 2 years ago

From SF repo - I think this is a similar issue:

Currently we only store the latest version of route-configs, but have versioning for timetables.

save_routes.py runs daily and updates these using the most recent GTFS feed that is available.

Using TransitFeeds (https://transitfeeds.com/api/swagger/) we can run save_routes.py on older versions of the GTFS based on the start/end dates of each GTFS file. Then we'll be able to show scheduled times and the proper routes for dates in the past.
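
A minimal sketch of the date-based selection described above, assuming the list of feed versions and their start/end dates has already been fetched. FeedVersion and feed_version_for are hypothetical names, and nothing here reflects the actual shape of the TransitFeeds API response.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence


@dataclass(frozen=True)
class FeedVersion:
    version_id: str  # hypothetical identifier for one archived GTFS file
    start: date      # first service date covered by this feed
    end: date        # last service date covered by this feed


def feed_version_for(versions: Sequence[FeedVersion],
                     service_date: date) -> Optional[FeedVersion]:
    """Pick the feed version whose date range covers service_date.

    Consecutive feeds can overlap, so when several match, prefer the one
    that starts latest (assumed here to be the most recently published).
    """
    candidates = [v for v in versions if v.start <= service_date <= v.end]
    return max(candidates, key=lambda v: v.start, default=None)
```

save_routes.py could then be run once per selected version, with the output stored under that version's date range rather than overwriting the latest config.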

For the most part, the agency config can be changed while remaining compatible with previous versions of the agency's network. However, in cases where a branch is completely changed or the routes are redesigned, a separate version of the agency config would be required too (see https://github.com/trynmaps/metrics-mvp/issues/127); that can be handled in another issue and PR.

sidetrackedmind commented 2 years ago

Another comment from the SF repo (below). We'll have the same problem if/when TriMet changes routes or stops.

Latest GTFS route configs are not in sync with historic data

For example, between May 24 and June 11, the F outbound direction id changed from F__O_E00 to F__O_F00, with a new terminal ("Castro" instead of "17th and Noe") and a new last stop id (33311). As a result, attempts to show anything about F outbound trips will not find any data. Even if we add logic to fall back on older direction ids, we still have the problem of the stop list also being out of sync.
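
If historical route configs were stored, a change like this could be detected by comparing two versions. The sketch below assumes each version has been reduced to a mapping from direction id to an ordered list of stop ids; that shape and the function name are assumptions for illustration, not the actual route config format.

```python
def diff_direction_stops(old_directions, new_directions):
    """Compare two {direction_id: [stop_id, ...]} mappings for one route.

    Returns direction ids that were removed, added, or kept with a
    different stop list.
    """
    old_ids, new_ids = set(old_directions), set(new_directions)
    changed = sorted(
        dir_id
        for dir_id in old_ids & new_ids
        if list(old_directions[dir_id]) != list(new_directions[dir_id])
    )
    return {
        "removed": sorted(old_ids - new_ids),
        "added": sorted(new_ids - old_ids),
        "changed": changed,
    }
```

For the F outbound case, F__O_E00 would show up under "removed" and F__O_F00 under "added", which is a signal that data before the change needs to be read with the older config.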

sidetrackedmind commented 2 years ago

Cross-linking with issue #39.