BlinkTagInc / node-gtfs

Import GTFS transit data into SQLite and query routes, stops, times, fares and more.
MIT License

Multiple Agencies #40

Closed: dfairaizl closed this issue 8 years ago

dfairaizl commented 8 years ago

Hey there,

I was attempting to use this npm module to import the transit data for the various lines here in NYC. Everything was working great until I came to the Metro-North lines, whose feed contains multiple agencies; unfortunately, they were all imported with the same agency_key.

There are a few options to solve this, but I wanted to get your opinion first. I'm happy to submit a PR, but I'm not sure of the best way to support multiple agencies, since agency_key is used so heavily throughout the queries.

brendannee commented 8 years ago

Good catch. Most GTFS files have only one agency so this hasn't come up until now.

Currently, node-gtfs lets you specify an agency_key per file. One solution would be to combine this with the agency_id from agency.txt, so you might end up with agency_keys like metronorth_2 and metronorth_3.
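A minimal sketch of that naming scheme, assuming a hypothetical helper `compositeAgencyKey` (not part of node-gtfs) and placeholder agency rows shaped like parsed agency.txt records:

```javascript
// Hypothetical helper, not part of node-gtfs: derive one key per agency by
// combining the per-file agency_key with the agency_id from agency.txt.
function compositeAgencyKey(agencyKey, agencyId) {
  // Single-agency feeds often leave agency_id blank; keep the plain file key then.
  if (agencyId === undefined || agencyId === null || agencyId === '') {
    return agencyKey;
  }
  return `${agencyKey}_${agencyId}`;
}

// Placeholder rows as they might be parsed from a multi-agency agency.txt.
const agencies = [
  { agency_id: '2', agency_name: 'Agency 2' },
  { agency_id: '3', agency_name: 'Agency 3' },
];

const keys = agencies.map((a) => compositeAgencyKey('metronorth', a.agency_id));
console.log(keys); // [ 'metronorth_2', 'metronorth_3' ]
```

Falling back to the bare file key when agency_id is empty keeps existing single-agency keys unchanged, so only multi-agency feeds get the suffixed form.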

I'd be happy to take a pull request, or else I can get around to making an update to support this. Let me know if this is something you'd like to work on.

dfairaizl commented 8 years ago

Hey @brendannee,

Great, I'd be happy to take a stab at it. I'll start looking at making the change tonight!

matthiaskern commented 8 years ago

Any updates on this? Stumbled upon this problem too and would like to get it to work.

leanne63 commented 8 years ago

It seems agency_id should be added to the applicable functions. agency_key is an arbitrary value, while agency_id is a real field in the Agency model and is used to relate an Agency to its Routes.

However, agency_key is still needed to differentiate transit "sets" (as represented by individual GTFS zip files).

For example, getAgency:

```javascript
/*
 * Returns an agency
 */
getAgency: function(agency_key, agency_id, cb) {
  Agency.findOne({
    agency_key: agency_key,
    agency_id: agency_id
  }, cb);
},
```

and getRoutesByAgency:

```javascript
/*
 * Returns an array of routes for the agency_key and agency_id specified
 */
getRoutesByAgency: function(agency_key, agency_id, cb) {
  Route.find({
    agency_key: agency_key,
    agency_id: agency_id
  }, cb);
},
```

Only functions related to the actual Agency would need to be changed, so they represent the true Agency rather than the arbitrary key. Other than those above, I see these two: getFeedInfo and getTimetablesByAgency.

Of course, agency_id would be an empty string in most cases, so you might want to make it optional, something like what getAgenciesByDistance() does:

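```javascript
/*
 * Returns an agency
 */
getAgency: function(agency_key, agency_id, cb) {
  // _ is underscore/lodash; both provide _.isFunction.
  // If called as getAgency(agency_key, cb), shift the callback over.
  if(_.isFunction(agency_id)) {
    cb = agency_id;
    agency_id = ''; // default is empty string
  }

  Agency.findOne({
    agency_key: agency_key,
    agency_id: agency_id
  }, cb);
},
```
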
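To see the optional-argument pattern run on its own, here is a minimal sketch that swaps `_.isFunction` for a plain `typeof` check and replaces the database-backed model with a hypothetical in-memory `Agency` stub that mimics a `findOne(query, cb)` call. The records are placeholders.

```javascript
// Hypothetical in-memory stand-in for the Agency model. findOne(query, cb)
// calls back with the first record whose fields all match the query, or null.
const Agency = {
  records: [
    { agency_key: 'local-bus', agency_id: '', agency_name: 'Single Agency' },
    { agency_key: 'metronorth', agency_id: '2', agency_name: 'Agency 2' },
    { agency_key: 'metronorth', agency_id: '3', agency_name: 'Agency 3' },
  ],
  findOne(query, cb) {
    const match = this.records.find((r) =>
      Object.keys(query).every((k) => r[k] === query[k]));
    cb(null, match || null);
  },
};

// Same optional-argument pattern, using typeof instead of _.isFunction.
function getAgency(agency_key, agency_id, cb) {
  if (typeof agency_id === 'function') {
    cb = agency_id;
    agency_id = ''; // default: many feeds leave agency_id blank
  }
  Agency.findOne({ agency_key, agency_id }, cb);
}

getAgency('local-bus', (err, agency) => console.log(agency.agency_name));
// prints "Single Agency"
getAgency('metronorth', '3', (err, agency) => console.log(agency.agency_name));
// prints "Agency 3"
```

Note that calling `getAgency('metronorth', cb)` here finds nothing, because no Metro-North record has a blank agency_id; that is the case where callers would have to pass an explicit ID.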
dfairaizl commented 8 years ago

@brendannee Can you provide some context on why agency_key exists, rather than just using the agency_id provided by the feed? I like @leanne63's solution to this problem.

leanne63 commented 8 years ago

@dfairaizl The agency_id field, per the GTFS specification, only exists in a feed's agency.txt and routes.txt files.

The agency_id also may not be unique between multiple feeds (illustrated by the fact that most feeds' agency_id values are empty).

The agency_key associated with the feed via node-gtfs' config.js file provides a simple mechanism to maintain a separation between data for multiple feeds.
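A small sketch of why that separation matters, using hypothetical route rows from two feeds that both leave agency_id blank (the feed names and a helper `routesFor` are placeholders, not node-gtfs code):

```javascript
// Hypothetical route rows from two different feeds. Both feeds leave
// agency_id blank, so agency_id alone cannot tell them apart.
const routes = [
  { agency_key: 'feed-a', agency_id: '', route_id: '1' },
  { agency_key: 'feed-a', agency_id: '', route_id: '2' },
  { agency_key: 'feed-b', agency_id: '', route_id: '1' },
];

// Filter by the feed (agency_key) first, then by agency_id within it.
function routesFor(agencyKey, agencyId = '') {
  return routes.filter(
    (r) => r.agency_key === agencyKey && r.agency_id === agencyId);
}

console.log(routes.filter((r) => r.agency_id === '').length); // 3: both feeds mixed
console.log(routesFor('feed-a').map((r) => r.route_id));       // [ '1', '2' ]
console.log(routesFor('feed-b').map((r) => r.route_id));       // [ '1' ]
```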

dfairaizl commented 8 years ago

maintain a separation between data for multiple feeds

Ahhh yes that makes perfect sense now. @leanne63 Did you already implement this functionality?

leanne63 commented 8 years ago

I had begun working on it, @dfairaizl, when you started this thread, but given your question I was waiting to see whether @brendannee had a preference about how to approach it. If not, I can have a PR ready by tomorrow.

dfairaizl commented 8 years ago

If there is no preference @leanne63 you can go ahead and submit your PR for this bug.

leanne63 commented 8 years ago

I'm running the tests provided with node-gtfs. They're failing because the test feed has only one agency, but that agency DOES have an agency_id.

Noting @brendannee quote:

Most GTFS files have only one agency so this hasn't come up until now.

which makes sense, as the GTFS specification lets a feed with a single agency leave the agency_id field empty.

So, question: are we safe assuming a default empty string for agency_id? (If so, we probably want to change the test data to match that assumption.)

Or, shall I create new functions that require an agency_id to be passed?

leanne63 commented 8 years ago

Also, just for reference, these are the functions I'm toying with:

- getAgency: modify to accept agency_id, or default to empty string
- getAgenciesByKey: add this function to allow search by key, regardless of ID
- getFeedInfo: modify to accept agency_id, or default to empty string
- getFeedInfoByKey: add this function to allow search by key, regardless of ID
- getRoutesByAgency: modify to accept agency_id, or default to empty string; or, for specificity, add:
  - getRoutesByAgencyId
  - getRoutesByAgencyKey
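A rough sketch of the proposed "ByKey" versus "ById" split, using the function names from the list above but with synchronous in-memory arrays in place of database queries. The data, and the exact signatures, are assumptions for illustration; they are not what the merged PR necessarily implemented.

```javascript
// Hypothetical in-memory sample data; node-gtfs would query a database instead.
const agencies = [
  { agency_key: 'metronorth', agency_id: '2' },
  { agency_key: 'metronorth', agency_id: '3' },
];
const routes = [
  { agency_key: 'metronorth', agency_id: '2', route_id: 'a' },
  { agency_key: 'metronorth', agency_id: '3', route_id: 'b' },
  { agency_key: 'metronorth', agency_id: '3', route_id: 'c' },
];

// "ByKey" variants: everything in one feed, regardless of agency_id.
function getAgenciesByKey(agency_key) {
  return agencies.filter((a) => a.agency_key === agency_key);
}
function getRoutesByAgencyKey(agency_key) {
  return routes.filter((r) => r.agency_key === agency_key);
}

// "ById" variant: one agency within a feed; agency_id defaults to ''.
function getRoutesByAgencyId(agency_key, agency_id = '') {
  return routes.filter(
    (r) => r.agency_key === agency_key && r.agency_id === agency_id);
}

console.log(getAgenciesByKey('metronorth').length);     // 2
console.log(getRoutesByAgencyKey('metronorth').length); // 3
console.log(getRoutesByAgencyId('metronorth', '3').map((r) => r.route_id)); // [ 'b', 'c' ]
```

Keeping the two variants separate avoids overloading one function's arguments: feed-wide queries never need an agency_id, and per-agency queries always scope by feed first.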

Technically, anything that says "agency" should be referencing an agency_id along with an agency_key.

Anything regarding the agency_key is really related to the "feed" (the set of files in a given GTFS zip).

For example, the feed/agency_key equivalent for TransitFeeds' Metro-North Railroad feed (http://transitfeeds.com/p/mta/87) is "mta/87", whereas the GTFS Data Exchange's equivalent (http://www.gtfs-data-exchange.com/agency/metro-north-railroad) is 'metro-north-railroad'.

(Someone was trying to get a standard feed_id moving forward for GTFS providers a couple of years ago. As far as I can tell, it has not been officially implemented.)

brendannee commented 8 years ago

I merged a pull request that should solve this issue.

Check it out and let me know if there is anything else that should be updated.

https://github.com/brendannee/node-gtfs/releases/tag/0.4.0

Pull requests welcome!