ODOT-PTS / GTFS-ride

GTFS-ride is an open standard for storing and sharing fixed-route transit ridership data.
https://gtfsride.org
Apache License 2.0

Privacy issues with rider_trip.txt, recommend removing #34

Open westontrillium opened 2 years ago

westontrillium commented 2 years ago

Problem statements

  1. In our opinion, the rider_trip.txt file goes against the Mobility Data Privacy Principles, of which Trillium is an endorsing organization. Specifically, it violates principles #1, #5, and #6, and in its current state is at risk of violating principles #3 and #7.
  2. The use cases for this file are not apparent, and the ones I can think of do not justify the undue surveillance of riders’ travel patterns. In general, we would like to hear a case for this file’s inclusion that outweighs its privacy issues.
  3. How feasible is it to implement this feature of the spec? How would alights be recorded? How would information about a rider (e.g. rider_type) be generated?
  4. MDS has similar components that deal with the collection of rider trip data. These components have caused some very public controversy resulting in a blow to the spec’s reputation. There are valuable lessons to be learned from that history. For a discussion on rider trip data generated by GBFS and MDS and the surrounding privacy concerns, see this article.

Solutions considered

Looking forward to discussing further!

lrosenfield-uta commented 11 months ago

I do think that the use cases for the information in this dataset do not justify the privacy implications if such data were made publicly available. However, part of the benefit I would hope to get out of GTFS-ride, should my agency adopt it, is the ability to use and develop tools for non-public datasets that, because of the common format, can be shared with other agencies. I am more hesitant to say rider_trip is of no use whatsoever for intra-agency purposes. If it were eliminated, I would hope to see some kind of standardized point-to-point trip propensity dataset.
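To make the idea of a point-to-point trip propensity dataset a bit more concrete, here is a minimal sketch of how individual rider_trip records could be collapsed into stop-pair counts, with small cells withheld. It assumes rider_trip.txt carries `boarding_stop_id` and `alighting_stop_id` columns; the `od_propensity` function, the `MIN_CELL_SIZE` threshold of 5, and the toy stop IDs are all hypothetical and not part of the spec.

```python
import csv
import io
from collections import Counter

# Hypothetical threshold: stop pairs with fewer trips than this are withheld.
MIN_CELL_SIZE = 5


def od_propensity(rider_trip_rows, min_cell_size=MIN_CELL_SIZE):
    """Count trips per (boarding stop, alighting stop) pair, dropping small cells.

    Rider-level fields such as rider_id are never copied into the output.
    """
    counts = Counter(
        (row["boarding_stop_id"], row["alighting_stop_id"])
        for row in rider_trip_rows
        # Skip records where the boarding or alighting stop is unknown.
        if row.get("boarding_stop_id") and row.get("alighting_stop_id")
    )
    return {pair: n for pair, n in counts.items() if n >= min_cell_size}


# Toy stand-in for rider_trip.txt; the stop IDs A, B, C are made up.
sample = io.StringIO(
    "rider_id,boarding_stop_id,alighting_stop_id\n"
    + "".join(f"r{i},A,C\n" for i in range(6))
    + "r6,A,B\n"
    + "r7,B,C\n"
)

table = od_propensity(csv.DictReader(sample))
print(table)
```

In this toy input only the A to C pair reaches the threshold, so the single-rider pairs are dropped from the result. Whether a count threshold like this is enough protection is a separate question; it is only meant to show the shape such an aggregate could take.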

A use case example: we are planning to split up a longer, regional route within our service area into two routes so that we can increase the frequency of the more heavily used northern portion of the route and split up our blocking to reduce operator travel time. The planned split point is at a commuter rail station about halfway along the route.

I mock up the new service and run it through a publicly available comparison tool. The comparison tool uses the EFC and Point-in-Time survey data to identify that a substantial number of riders I'd assumed were transferring from commuter rail were actually traveling through the planned terminus, getting off at a transfer point three stops later, transferring to a cross-town route, and all arriving at a single employment center.

Instead of simply deviating the crosstown route to the commuter rail station, I propose extending the southern route to terminate at the employment center and reducing the frequency of the crosstown route.

The way I see it, there are two ways the information in this scenario would be usable:

I'd love to hear your thoughts.