codeforpdx / opentransit-metrics

Prototype of public transit data visualization system
https://opentransit-pdx.herokuapp.com/
MIT License

fetch vehicle position state directly from S3 without using opentransit-state-api; update documentation #6

youngj closed 2 years ago

youngj commented 2 years ago

This PR updates the vehicle_positions module (formerly trynapi) to fetch vehicle state files directly from S3 instead of going through opentransit-state-api. It uses the S3 API to loop over all keys in the time range in 1-hour chunks. For each chunk, it builds the CSV lines for each route in memory and then appends them to a cache file on disk. The CSV cache files have essentially the same format as before, except that the headers are changed to match the new vehicle property names stored by opentransit-collector.
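For reviewers who want a picture of the chunked loop described above, here is a minimal sketch, not the code in this PR. It assumes one cache file per route, and the names `hour_chunks`, `append_rows`, `build_cache`, `fetch_chunk`, and the `route_id` field are all hypothetical. The S3 access itself is left behind the `fetch_chunk` callable so the sketch stays independent of any particular S3 client.

```python
import csv
import os
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

HOUR = 3600  # chunk size in seconds


def hour_chunks(start_ts: int, end_ts: int) -> Iterator[Tuple[int, int]]:
    """Yield (chunk_start, chunk_end) pairs covering [start_ts, end_ts),
    each at most one hour long."""
    chunk_start = start_ts
    while chunk_start < end_ts:
        chunk_end = min(chunk_start + HOUR, end_ts)
        yield chunk_start, chunk_end
        chunk_start = chunk_end


def append_rows(cache_dir: str, route_id: str, header: List[str], rows: List[list]) -> None:
    """Append rows to the route's CSV cache file, writing the header only
    when the file is first created."""
    path = os.path.join(cache_dir, f"{route_id}.csv")
    is_new = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(header)
        writer.writerows(rows)


def build_cache(
    start_ts: int,
    end_ts: int,
    fetch_chunk: Callable[[int, int], Iterable[Dict]],  # hypothetical: returns vehicle dicts for a chunk
    cache_dir: str,
    header: List[str],
) -> None:
    """Process the time range one chunk at a time so that only one chunk's
    worth of rows is held in memory."""
    os.makedirs(cache_dir, exist_ok=True)
    for chunk_start, chunk_end in hour_chunks(start_ts, end_ts):
        rows_by_route = defaultdict(list)
        for vehicle in fetch_chunk(chunk_start, chunk_end):
            # "route_id" is an assumed property name, not taken from opentransit-collector.
            rows_by_route[vehicle["route_id"]].append([vehicle.get(col) for col in header])
        for route_id, rows in rows_by_route.items():
            append_rows(cache_dir, route_id, header, rows)
```

Appending after each chunk bounds memory use by the size of one hour of data rather than the whole time range.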

This PR fetches state files sequentially. Fetching multiple files from S3 in parallel (as opentransit-state-api does) would likely improve performance.
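One possible shape for a parallel version is sketched below, using a thread pool from the standard library. This is not part of this PR and is not how opentransit-state-api is known to do it; `fetch_all` and `fetch_one` are hypothetical names, and `fetch_one` stands in for whatever function downloads and parses a single S3 key.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def fetch_all(
    keys: Sequence[str],
    fetch_one: Callable[[str], T],  # hypothetical: downloads and parses one state file
    max_workers: int = 8,
) -> List[T]:
    """Fetch all keys concurrently and return the results in the same
    order as the input keys."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order, so the chronological order
        # of the keys carries over to the results.
        return list(pool.map(fetch_one, keys))
```

Threads suit this case because the work is dominated by network waits. Keeping the results in key order means the CSV-building step could stay unchanged.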

The previous data stored under the "state" prefix by compute_new.py is moved to "metrics-state" to avoid conflict with the data stored by opentransit-collector.

The documentation is updated to remove references to trynapi, update orion references to opentransit-collector, and update various links.

This PR is based on the youngj-rename-trynapi branch, which renamed trynapi.py to vehicle_positions.py (https://github.com/codeforpdx/opentransit-metrics/commit/1cd39f1056fef4c9391ea942f29abddde5190f43), so that this PR shows the changed lines of vehicle_positions.py rather than showing it as a completely new file.