MobilityData / mobility-database-catalogs

The Catalogs of Sources of the Mobility Database.
Apache License 2.0

Is identifying errors in GTFS data within the scope of this project? #157

bgo-eiu closed this issue 2 years ago

bgo-eiu commented 2 years ago

Sorry if this is the wrong place to ask, but I was wondering: if an agency provides GTFS data that has known errors in it, would the approach be to host a manually corrected version, or is the intent to mirror the agency's data, errors included? I'm curious because some consumers of GTFS data assume that transit agencies will fix GTFS errors once they're pointed out, but that assumption doesn't always hold.

Edit: To clarify, by erroneous data I had in mind things like incorrect stop names, missing stops, and stops that don't exist. I suppose data that is invalid because it doesn't conform to the schema would be a different problem.

emmambd commented 2 years ago

Thanks for your question, @bgo-eiu! Asking in an issue is perfect.

In the short term, our intention is to integrate the canonical GTFS validator into the catalogs and share the validation report for each source, as well as the number of errors and warnings. (If there is other pertinent info you'd be interested in, let us know!) The goal here would be to better assess the quality of GTFS data across the community and encourage corrections.

Whether we'd ever host manually corrected dataset versions is a longer-term question, but we'd welcome feedback from both transit providers and data consumers to help assess its value and the governance challenges involved.
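A minimal sketch of the "number of errors and warnings" part of this plan, assuming the validator's JSON report lists notices as objects carrying a `severity` and a `totalNotices` field. That field layout is an assumption and should be checked against the report produced by the validator version actually in use; the function name and the sample notice codes and counts are hypothetical.

```python
import json
from collections import Counter


def count_notices_by_severity(report: dict) -> Counter:
    """Sum notice totals per severity level (e.g. ERROR, WARNING, INFO).

    Assumes (hypothetically) that report["notices"] is a list of entries,
    each grouping every occurrence of one notice code under one severity.
    """
    counts = Counter()
    for notice in report.get("notices", []):
        counts[notice["severity"]] += notice.get("totalNotices", 0)
    return counts


if __name__ == "__main__":
    # Invented sample report; codes and counts are placeholders.
    sample = json.loads(
        '{"notices": ['
        '{"code": "hypothetical_error", "severity": "ERROR", "totalNotices": 3},'
        '{"code": "hypothetical_warning_a", "severity": "WARNING", "totalNotices": 5},'
        '{"code": "hypothetical_warning_b", "severity": "WARNING", "totalNotices": 2}'
        "]}"
    )
    summary = count_notices_by_severity(sample)
    print(f"errors={summary['ERROR']} warnings={summary['WARNING']}")  # errors=3 warnings=7
```

A `Counter` is used so that a severity with no notices reads as 0 rather than raising a `KeyError`.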

bgo-eiu commented 2 years ago

Thanks for the answer, that makes sense. I did try validating my local agency's data, and for the most part it passes the checks a GTFS validator performs; the exception is the way they specify school-only stops.

I couldn't tell you why transit agencies don't always fix these errors even when they're pointed out, but every so often I'll "discover" a stop that exists and is in use on a route but isn't in the GTFS data, a stop that's named wrong in the GTFS data, or a stop that's in the GTFS data but doesn't exist in real life. The only ways to find these cases are surveying on the ground or working them out from context. For one I found recently, I realized there had to be a stop at that location based on the route configuration at nearby stops, the turn restrictions, and the stop design guidelines. I'm working on documenting these in any case, so I could contribute that information if it comes up down the line. I haven't worked out what I'm going to do with it yet, but whenever I've gone to a GTFS consumer to request a correction, the response was to ask the transit agency to correct it. When the transit agency doesn't, it's a dead end.
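The three error types above (missing, misnamed, and non-existent stops) can't be detected from the feed alone, but once survey observations exist they can be compared against `stops.txt`. The sketch below is one hypothetical way to do that: it assumes survey records use the GTFS `stop_name`, `stop_lat`, and `stop_lon` column names, matches each surveyed stop to the nearest GTFS stop within an assumed 30 m tolerance, and compares names case-insensitively. The function names, survey format, and sample data are all invented for illustration and are not part of the validator or the catalogs.

```python
import csv
import io
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6_371_000  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def compare_stops(gtfs_stops, surveyed_stops, max_dist_m=30.0):
    """Flag surveyed stops missing from GTFS, name mismatches, and unmatched GTFS stops.

    gtfs_stops: dicts with stop_id, stop_name, stop_lat, stop_lon (stops.txt rows).
    surveyed_stops: dicts with stop_name, stop_lat, stop_lon (hypothetical survey format).
    """
    missing, misnamed = [], []
    matched_ids = set()
    for s in surveyed_stops:
        lat, lon = float(s["stop_lat"]), float(s["stop_lon"])
        best, best_d = None, max_dist_m
        for g in gtfs_stops:
            d = haversine_m(lat, lon, float(g["stop_lat"]), float(g["stop_lon"]))
            if d <= best_d:
                best, best_d = g, d
        if best is None:
            missing.append(s["stop_name"])  # seen on the ground, absent from the feed
            continue
        matched_ids.add(best["stop_id"])
        if best["stop_name"].strip().lower() != s["stop_name"].strip().lower():
            misnamed.append((best["stop_id"], best["stop_name"], s["stop_name"]))
    # Only meaningful as "may not exist" if the survey covered these stops' area.
    unmatched = [g["stop_id"] for g in gtfs_stops if g["stop_id"] not in matched_ids]
    return missing, misnamed, unmatched


if __name__ == "__main__":
    # Invented sample data for illustration only.
    stops_txt = io.StringIO(
        "stop_id,stop_name,stop_lat,stop_lon\n"
        "S1,Main St & 1st Ave,45.5000,-73.5700\n"
        "S2,Main St & 3rd Ave,45.5020,-73.5700\n"
        "S3,Old Depot,45.5100,-73.5800\n"
    )
    gtfs = list(csv.DictReader(stops_txt))
    survey = [
        {"stop_name": "Main St & 1st Ave", "stop_lat": "45.50001", "stop_lon": "-73.57002"},
        {"stop_name": "Main St & 4th Ave", "stop_lat": "45.50201", "stop_lon": "-73.5700"},
        {"stop_name": "Main St & 2nd Ave", "stop_lat": "45.5010", "stop_lon": "-73.5700"},
    ]
    missing, misnamed, unmatched = compare_stops(gtfs, survey)
    print("missing from GTFS:", missing)
    print("name mismatches:", misnamed)
    print("GTFS stops not seen in survey:", unmatched)
```

The nearest-point search is a plain O(n·m) loop, which is fine for one agency's stops; a spatial index would be the obvious next step for larger feeds. Name matching is deliberately naive, so abbreviations like "St" vs "Street" would still be flagged for manual review.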

emmambd commented 2 years ago

@bgo-eiu Thank you for sharing! I'm sure @isabelle-dr would be interested in your thoughts on the validator side of things, in case there are ways we can make these kinds of non-obvious problems easier for agencies to identify and fix.

If you're interested, I'll reach out again when we start thinking about the minimum specs for integrating the validator with the database.

bgo-eiu commented 2 years ago

I am interested!

emmambd commented 2 years ago

@bgo-eiu Great! Please send me an email at emma@mobilitydata.org or join the MobilityData Slack so I can contact you.

isabelle-dr commented 2 years ago

Really interesting insight, @bgo-eiu! You might also want to look at the Grading Scheme, which could capture the type of problems you're describing here. However, the question "okay, now where do I put this information?" doesn't yet have a standard, systematic answer. It is on the roadmap, and you can also share your insights on the card dedicated to this issue here.