CivicTechTO / ttc_subway_times

A scraper to grab and publish TTC subway arrival times.
GNU General Public License v3.0

Replicate and Improve Ntas API: GTFS-RT format #18

Open radumas opened 7 years ago

radumas commented 7 years ago

Having the scraper generate GTFS-RT data would have two benefits:

  1. Anyone creating apps that require GTFS-RT (a standard) would have access to a feed
  2. We could use and improve upon tools built on GTFS-RT data

This requires generating GTFS-RT in real... time, and then also reprocessing the archive of data.

patcon commented 7 years ago

@bowenwen totally knows my bias here, but I'd recommend writing the API spec in YAML as you wish it existed, using the Swagger Editor: http://editor.swagger.io/#/

Then you can version-control that YAML and rig up an actual API proxy later :)

radumas commented 6 years ago

I think this might also solve #13, since there are a few tools for archiving, analysing, and visualizing GTFS-RT data.

Here's the GTFS-RT spec. The first post will be updated with a list of tasks.

radumas commented 6 years ago

There are a number of Java-based examples here

radumas commented 6 years ago

FeedMessage
+----FeedHeader
|     +----gtfs_realtime_version: '2.0'
|     +----incrementality: FULL_DATASET enum
|     +----timestamp: POSIX
+----FeedEntity
|      +----id (see EntitySelector)
|      +----TripUpdate
|      |       +----TripDescriptor
|      |       |      +---- trip_id (gtfs trip_id) if trip_id can be determined, the rest of the fields are 
|      |       |      +---- route_id
|      |       |      +---- direction_id
|      |       |      +---- start_time (ISO)
|      |       |      +---- start_date (YYYYMMDD)
|      |       |      +---- schedule_relationship:
|      |       |              +-SCHEDULED if we can match to a trip_id
|      |       |              +-ADDED if we can't. If ADDED then there should be no trip_id, and the other fields are necessary
|      |       +---- VehicleDescriptor
|      |       |        +---- id
|      |       +---- StopTimeUpdate
|      |       |        +---- stop_sequence
|      |       |        +---- arrival StopTimeEvent
|      |       |        |       +---- time (POSIX) 
|      |       |        |       +---- uncertainty (omit if prediction, 0 if arrived)
|      |       |        +---- schedule_relationship: [SCHEDULED, SKIPPED, NO_DATA]
|      |       +---- timestamp (POSIX) vehicle timestamp
|      +---- Alert (could be used to merge data from TTC website, I'll get back to this)
|      |       +----
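
To make the outline above concrete, here is a minimal Python sketch that assembles the same structure as plain dicts. It is only an illustration: it does not use the protobuf bindings, the helper names (`build_trip_descriptor`, `build_stop_time_update`, `build_feed_message`) are hypothetical, and the dict keys are assumed to follow the GTFS-RT field names. It also assumes that when a trip_id is known the other descriptor fields are optional.

```python
import time


def build_trip_descriptor(trip_id=None, route_id=None, direction_id=None,
                          start_time=None, start_date=None):
    """Return a TripDescriptor-shaped dict following the outline's rule."""
    others = {
        "route_id": route_id,
        "direction_id": direction_id,
        "start_time": start_time,
        "start_date": start_date,
    }
    if trip_id is not None:
        # Matched to a GTFS trip: SCHEDULED, other fields only if provided
        # (assumption: they are optional in this case).
        trip = {"trip_id": trip_id, "schedule_relationship": "SCHEDULED"}
        trip.update({k: v for k, v in others.items() if v is not None})
        return trip
    # No match: ADDED, no trip_id, and every other field is required.
    missing = [k for k, v in others.items() if v is None]
    if missing:
        raise ValueError("ADDED trips need: " + ", ".join(missing))
    return {**others, "schedule_relationship": "ADDED"}


def build_stop_time_update(stop_sequence, arrival_time, arrived=False):
    """Arrival event: omit uncertainty for a prediction, 0 once arrived."""
    arrival = {"time": int(arrival_time)}
    if arrived:
        arrival["uncertainty"] = 0
    return {
        "stop_sequence": stop_sequence,
        "arrival": arrival,
        "schedule_relationship": "SCHEDULED",
    }


def build_feed_message(entities, now=None):
    """Wrap FeedEntity dicts in a FULL_DATASET FeedMessage-shaped dict."""
    return {
        "header": {
            "gtfs_realtime_version": "2.0",
            "incrementality": "FULL_DATASET",
            "timestamp": int(time.time() if now is None else now),
        },
        "entity": list(entities),
    }
```

A feed with one unmatched trip could then be built with made-up values like this:

```python
entity = {
    "id": "train-101",
    "trip_update": {
        "trip": build_trip_descriptor(route_id="1", direction_id=0,
                                      start_time="08:15:00",
                                      start_date="20170714"),
        "vehicle": {"id": "101"},
        "stop_time_update": [build_stop_time_update(3, 1500000150)],
        "timestamp": 1500000000,
    },
}
feed = build_feed_message([entity], now=1500000000)
```
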

radumas commented 6 years ago

I forked this project, recommended on the GTFS Slack; it generates a GTFS-RT API from a database. I think our steady-state solution could be:

  1. Run the current scraper on Heroku and send the feed to a Heroku Postgres instance.
  2. The GTFS-RT API would then pull from that Heroku DB and publish to an API.
  3. We would then use gtfsrdb to archive the GTFS-RT API to a new, larger database for archiving and reporting. From there we could build metrics.

radumas commented 6 years ago

Got the package installed and built. One small problem: it's designed for surface vehicles, so by default it takes in GPS positions of vehicles. I'm not sure what is needed to transform this into trip updates (predicted arrival times for a headway-based service). We would need to modify at least two things:

  1. The SQL query
  2. The corresponding object definition
  3. Anything else processing the GPS coordinates?
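
As a rough idea of what the first two changes might look like, here is a hedged Python sketch that swaps a GPS-position query for one returning predicted arrivals and maps the rows onto a small object. Everything here is an assumption for illustration: the `predictions` table, its columns, the `ArrivalPrediction` class, and the use of an in-memory SQLite database as a stand-in for the Postgres instance. It is not the forked project's actual code or schema.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class ArrivalPrediction:
    """Hypothetical replacement for a GPS-position object."""
    train_id: str
    stop_sequence: int
    arrival_time: int  # POSIX seconds


# Hypothetical query: turn "minutes until arrival" into an absolute time.
QUERY = """
SELECT train_id,
       stop_sequence,
       CAST(request_time + minutes_to_arrival * 60 AS INTEGER) AS arrival_time
FROM predictions
ORDER BY train_id, stop_sequence
"""


def demo_connection():
    """In-memory stand-in database with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE predictions ("
        "train_id TEXT, stop_sequence INTEGER, "
        "request_time INTEGER, minutes_to_arrival REAL)"
    )
    conn.executemany(
        "INSERT INTO predictions VALUES (?, ?, ?, ?)",
        [("101", 1, 1500000000, 2.5), ("101", 2, 1500000000, 5.0)],
    )
    return conn


def load_predictions(conn):
    """Run the query and build one ArrivalPrediction per row."""
    return [ArrivalPrediction(*row) for row in conn.execute(QUERY)]


def group_by_train(predictions):
    """Group predictions into per-train lists of stop-time-update dicts."""
    trips = {}
    for p in predictions:
        trips.setdefault(p.train_id, []).append(
            {"stop_sequence": p.stop_sequence,
             "arrival": {"time": p.arrival_time}}
        )
    return trips
```

Calling `group_by_train(load_predictions(demo_connection()))` gives one list of stop time updates per train, which is roughly the shape a TripUpdate needs. Whether anything else in the package still expects GPS coordinates remains an open question.
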