CUTR-at-USF / transit-feed-quality-calculator

A tool that uses the gtfs-realtime-validator to calculate the quality of a large number of GTFS-realtime feeds

Support additional feeds that require API keys #8

Closed barbeau closed 6 years ago

barbeau commented 6 years ago

Summary:

Some feeds returned by the TransitFeeds.com API require API keys, so their public GTFS-realtime URL isn't retrievable from the API: for these feeds, feed.getUrls().getDownloadUrl() returns an empty string.

One example is TriMet in Portland, OR - here's their developer page: https://developer.trimet.org/GTFS.shtml

Data is at:

...but for the GTFS-rt feeds you need to append a &appID=0000000000000000000000000 where the 0s are your API key.
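As a rough illustration of appending the key, here is a minimal Java sketch. The class and method names (`ApiKeyUrls`, `withAppId`) are hypothetical, and the helper assumes standard query-string rules (`?` before the first parameter, `&` before later ones); the issue does not confirm which separator the TriMet endpoint expects. It uses `URLEncoder.encode(String, Charset)`, which requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class ApiKeyUrls {
    /**
     * Appends appID=<key> to a feed URL.
     * Assumption: '?' is used when the URL has no query string yet, '&' otherwise.
     */
    static String withAppId(String baseUrl, String apiKey) {
        String separator = baseUrl.contains("?") ? "&" : "?";
        return baseUrl + separator + "appID=" + URLEncoder.encode(apiKey, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The all-zero key is the placeholder from the issue text, not a real key.
        System.out.println(withAppId("http://developer.trimet.org/ws/V1/TripUpdate",
                "0000000000000000000000000"));
    }
}
```

Keeping the key out of the stored URL and appending it at request time would also make it easier to keep keys out of version control.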

Luckily, for these feeds TransitFeeds.com does give you an info URL (feed.getUrls().getInfoUrl()) which looks to be the page where you can sign up for an API key. So we can use this info to start tracking down API keys for these known feeds (see bottom of this issue for a list I pulled from the API).
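The check described above (empty download URL, but an info URL present) could be sketched like this. `FeedUrls` is a hypothetical stand-in for the TransitFeeds.com client model, holding only the two values that `getDownloadUrl()` and `getInfoUrl()` would return; the real model classes are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

final class KeyRequiredFeeds {
    /** Hypothetical stand-in for a TransitFeeds.com feed entry. */
    static final class FeedUrls {
        final String title;
        final String downloadUrl; // value feed.getUrls().getDownloadUrl() would return
        final String infoUrl;     // value feed.getUrls().getInfoUrl() would return

        FeedUrls(String title, String downloadUrl, String infoUrl) {
            this.title = title;
            this.downloadUrl = downloadUrl;
            this.infoUrl = infoUrl;
        }
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    /** Returns feeds with no download URL but with an info URL to follow up on. */
    static List<FeedUrls> needingApiKey(List<FeedUrls> feeds) {
        List<FeedUrls> result = new ArrayList<>();
        for (FeedUrls feed : feeds) {
            if (isBlank(feed.downloadUrl) && !isBlank(feed.infoUrl)) {
                result.add(feed);
            }
        }
        return result;
    }
}
```

Printing `title` and `infoUrl` for each returned entry would produce a list in the same shape as the one at the bottom of this issue.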

After we have API keys, we need a way to specify these feeds in a separate CSV file that we can use as input to this tool so the feeds that require API keys can be included in the analysis.

Here's a sample CSV format for feeds.csv:

feed_id, gtfs_url, gtfs_rt_url
"-1-Portland, OR, USA", https://developer.trimet.org/schedule/gtfs.zip, http://developer.trimet.org/ws/V1/TripUpdate&appID=0000000000000000000000000

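To show that the sample rows above are parseable as written (a quoted feed_id containing commas, and spaces after each delimiter), here is a minimal plain-JDK sketch. The names `FeedsCsvSketch`, `FeedEntry`, `splitCsvLine`, and `parse` are hypothetical, and trimming whitespace around every field is an assumption about how the format should be read; a CSV library would be the more robust choice in the actual tool.

```java
import java.util.ArrayList;
import java.util.List;

final class FeedsCsvSketch {
    /** One row of feeds.csv: feed_id, gtfs_url, gtfs_rt_url. */
    static final class FeedEntry {
        final String feedId, gtfsUrl, gtfsRtUrl;

        FeedEntry(String feedId, String gtfsUrl, String gtfsRtUrl) {
            this.feedId = feedId;
            this.gtfsUrl = gtfsUrl;
            this.gtfsRtUrl = gtfsRtUrl;
        }
    }

    /** Splits one CSV line, honoring double-quoted fields and "" escapes; fields are trimmed. */
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // escaped quote inside a quoted field
                    i++;
                } else {
                    inQuotes = !inQuotes;
                }
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString().trim());
        return fields;
    }

    /** Parses all lines, treating the first as the header row and skipping blank lines. */
    static List<FeedEntry> parse(List<String> lines) {
        List<FeedEntry> entries = new ArrayList<>();
        for (int i = 1; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.trim().isEmpty()) {
                continue;
            }
            List<String> f = splitCsvLine(line);
            if (f.size() != 3) {
                throw new IllegalArgumentException("Expected 3 columns on line " + (i + 1) + ": " + line);
            }
            entries.add(new FeedEntry(f.get(0), f.get(1), f.get(2)));
        }
        return entries;
    }
}
```

In practice the lines would come from something like `Files.readAllLines(Paths.get("feeds.csv"))`.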
Each feed_id should start with a negative number so it doesn't collide with IDs from TransitFeeds.com. The subfolder created for the feed should be named after the feed_id (for example, -1-Portland, OR, USA). The GTFS feed should be downloaded only once and saved as gtfs.zip in that folder; if gtfs.zip already exists, don't download it again.
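The download-once rule could look roughly like the sketch below. All names (`GtfsDownloadOnce`, `Opener`, `downloadIfMissing`) are hypothetical. Writing to a temporary `.part` file first is an added assumption, so that an interrupted download is not mistaken for a complete gtfs.zip on the next run. Feed IDs containing characters that are illegal in folder names are not handled.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class GtfsDownloadOnce {
    /** Opens a stream for a URL; injectable so the logic can run without network access. */
    interface Opener {
        InputStream open(String url) throws IOException;
    }

    /** Default opener that fetches the URL over the network. */
    static InputStream openHttp(String url) throws IOException {
        return URI.create(url).toURL().openStream();
    }

    /**
     * Saves the GTFS feed as <outputRoot>/<feedId>/gtfs.zip unless that file already exists.
     * Returns true if a download happened, false if it was skipped.
     */
    static boolean downloadIfMissing(Path outputRoot, String feedId, String gtfsUrl, Opener opener)
            throws IOException {
        Path feedDir = outputRoot.resolve(feedId);
        Files.createDirectories(feedDir);
        Path target = feedDir.resolve("gtfs.zip");
        if (Files.exists(target)) {
            return false; // already downloaded on an earlier run
        }
        Path partial = feedDir.resolve("gtfs.zip.part");
        try (InputStream in = opener.open(gtfsUrl)) {
            Files.copy(in, partial, StandardCopyOption.REPLACE_EXISTING);
        }
        Files.move(partial, target, StandardCopyOption.REPLACE_EXISTING);
        return true;
    }
}
```

A caller would pass `GtfsDownloadOnce::openHttp` as the opener for real downloads.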

The code to download these feeds should be implemented in a separate class CsvFeedsDownloader so it can be used independently of the TransitFeedsDownloader (we may have other downloaders too, e.g., for Transitland - see https://github.com/CUTR-at-USF/transit-feed-quality-calculator/issues/4).
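One way to keep the downloaders independent is a small shared interface, sketched below. The `FeedDownloader` interface, its `downloadAll()` signature, and the `runAll` helper are hypothetical and are not taken from the existing TransitFeedsDownloader; the CSV downloader body is a stub that only reports which feed folders it would produce.

```java
import java.util.ArrayList;
import java.util.List;

final class DownloaderSketch {
    /** Hypothetical common contract: download everything, return the feed folder names. */
    interface FeedDownloader {
        List<String> downloadAll();
    }

    /** Stub for the CSV-driven downloader; takes the feed_ids read from feeds.csv. */
    static final class CsvFeedsDownloader implements FeedDownloader {
        private final List<String> feedIds;

        CsvFeedsDownloader(List<String> feedIds) {
            this.feedIds = feedIds;
        }

        @Override
        public List<String> downloadAll() {
            // A real implementation would fetch gtfs_url and gtfs_rt_url for each row here.
            return new ArrayList<>(feedIds);
        }
    }

    /** Runs every configured downloader in order and collects the processed feed folders. */
    static List<String> runAll(List<FeedDownloader> downloaders) {
        List<String> processed = new ArrayList<>();
        for (FeedDownloader downloader : downloaders) {
            processed.addAll(downloader.downloadAll());
        }
        return processed;
    }
}
```

With this shape, a Transitland downloader could be added later as one more `FeedDownloader` without touching the others.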

@Suryakandukoori Could you please take a look at this? I'm open to alternate formats for the CSV, or it could be JSON too I suppose. Just something simple we can hand-edit to add feeds ourselves.

Steps to reproduce:

  1. Run the tool

Expected behavior:

Be able to download and validate feeds that require API keys

Observed behavior:

Currently, feeds that require API keys have blank download URLs in the TransitFeeds.com API, and are therefore simply skipped over when trying to download GTFS-realtime and GTFS feeds.

List of feeds from TransitFeeds.com we need API keys for, along with dev website URL

VIA Service Alerts - http://viaprimo.com/Opportunities/DevLicense.aspx
Metra Trip Updates - https://metrarail.com/developers
Metra Vehicle Positions - https://metrarail.com/developers
Metra Service Alerts - https://metrarail.com/developers
NJ Transit GTFS-realtime - http://www.njtransit.com/developers
Auckland Transport Trip Updates - https://api.at.govt.nz
Auckland Transport Vehicle Locations - https://api.at.govt.nz
AC Transit Vehicle Positions - http://api.actransit.org/transit/
AC Transit Service Alerts - http://api.actransit.org/transit/
AC Transit Trip Updates - http://api.actransit.org/transit/
Capital Metro Service Alerts - https://www.capmetro.org/metrolabs/
Capital Metro Vehicle Positions - https://www.capmetro.org/metrolabs/
Sound Transit Vehicle Positions - http://www.soundtransit.org/Developer-resources/Data-downloads
Sound Transit Trip Updates - http://www.soundtransit.org/Developer-resources/Data-downloads
Intercity Transit Vehicle Positions - http://www.soundtransit.org/Developer-resources/Data-downloads
Intercity Transit Trip Updates - http://www.soundtransit.org/Developer-resources/Data-downloads
Pierce Transit Vehicle Positions - http://www.soundtransit.org/Developer-resources/Data-downloads
Pierce Transit Trip Updates - http://www.soundtransit.org/Developer-resources/Data-downloads
KCM Vehicle Positions - http://www.soundtransit.org/Developer-resources/Data-downloads
KCM Trip Updates - http://www.soundtransit.org/Developer-resources/Data-downloads
Long Island Rail Road Trip Updates - http://datamine.mta.info
Metro-North Railroad Trip Updates - http://datamine.mta.info
NYC Subway Real-Time Estimates (Staten Island Railway) - http://datamine.mta.info/
NYC Subway Real-Time Estimates (L Train) - http://datamine.mta.info/
NYC Subway Real-Time Estimates - http://datamine.mta.info/
TriMet Trip Updates - http://developer.trimet.org/GTFS.shtml
TriMet Alerts - http://developer.trimet.org/GTFS.shtml
AMT Vehicle Positions - http://amt.qc.ca/developers/
AMT Alerts - http://amt.qc.ca/developers/
AMT Trip Updates - http://amt.qc.ca/developers/
TransLink SEQ Vehicle Positions & Trip Updates - https://gtfsrt.api.translink.com.au/

We may need to include Denver RTD as well. Here's the error message we're currently getting for those URLs:

Error reading GTFS-realtime feed 'http://www.rtd-denver.com/google_sync/VehiclePosition.pb' - java.io.IOException: Server returned HTTP response code: 401 for URL: http://www.rtd-denver.com/google_sync/VehiclePosition.pb
Error reading GTFS-realtime feed 'http://www.rtd-denver.com/google_sync/TripUpdate.pb' - java.io.IOException: Server returned HTTP response code: 401 for URL: http://www.rtd-denver.com/google_sync/TripUpdate.pb
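Error lines in the format above could be scanned to find other feeds that respond with 401 and may need credentials. This is a minimal sketch: the class and method names are hypothetical, and the regular expression assumes the log message keeps exactly the wording shown in the two lines above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnauthorizedFeedScan {
    // Group 1: the feed URL in single quotes; group 2: the three-digit HTTP status code.
    private static final Pattern ERROR_LINE = Pattern.compile(
            "Error reading GTFS-realtime feed '([^']+)' - .*HTTP response code: (\\d{3})");

    /** Returns the feed URLs whose error line reports the given HTTP status. */
    static List<String> urlsWithStatus(List<String> logLines, int status) {
        List<String> urls = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = ERROR_LINE.matcher(line);
            if (m.find() && Integer.parseInt(m.group(2)) == status) {
                urls.add(m.group(1));
            }
        }
        return urls;
    }
}
```

Calling `urlsWithStatus(lines, 401)` on the tool's log output would give a candidate list of feeds to add to feeds.csv once credentials are available.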

barbeau commented 6 years ago

Good outline of CSV parsing methods at https://github.com/arnaudroger/SimpleFlatMapper/wiki/How-to-parse-a-csv-file-in-java.

I'd prefer to use Jackson as we're using it elsewhere and it provides a structured parsing method based on a POJO.

barbeau commented 6 years ago

Here's the info for Jackson CSV databinding with POJOs: https://github.com/FasterXML/jackson-dataformats-text/tree/master/csv#data-binding-with-schema