pietercolpaert closed this 9 years ago
I've posted this call on the GTFS mailing list: https://groups.google.com/forum/#!topic/gtfs-changes/Z8Mf31MaZms
We used UUIDs for the GTFS feed of Matera. Personally I prefer URIs, of course, but UUIDs are a smaller step for people who are used to local numeric IDs.
If their local IDs are persistent, we can easily create globally unique identifiers by prepending http://gtfs.org/{feed name}
I'm now going to suggest persistent URIs per feed version:
http://data.gtfs.org/{feed_name}/{feed_version}
would become the base URI.
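A minimal sketch of how identifiers could be minted under such a versioned base. The `mint_uri` helper and the extra entity-type path segment (e.g. `stops`) are assumptions made for illustration; only the `http://data.gtfs.org/{feed_name}/{feed_version}` pattern comes from the suggestion itself.

```python
from urllib.parse import quote

# Base taken from the suggestion; everything appended to it is an assumption.
BASE = "http://data.gtfs.org"


def mint_uri(feed_name, feed_version, entity_type, local_id):
    """Build a URI for one GTFS entity under a per-version feed base.

    Hypothetical helper: each part is percent-encoded so that local IDs
    containing spaces or slashes still yield a single path segment.
    """
    parts = [feed_name, feed_version, entity_type, local_id]
    return BASE + "/" + "/".join(quote(str(p), safe="") for p in parts)


stop_uri = mint_uri("matera", "2015-06", "stops", "1")
odd_uri = mint_uri("matera", "2015-06", "stops", "a b/c")
```

With the version in the base, a URI only promises stable meaning within one feed version, so a local ID that is reused in a later version gets a different URI.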
Opened a suggestion on the GTFS mailing list: https://groups.google.com/forum/#!topic/gtfs-changes/ZPxhYMoNr0U
URIs with eternal semantics
It's not difficult to transform a GTFS CSV archive into GTFS RDF once. It is, however, very difficult to do it twice:
How are we going to make sure that the identifiers we use for trips, routes, stops, stop_times, and so forth keep the same semantics after a second mapping? For example, a stop with id 1 in the first dataset may have id 2 in the next version. When the next version is mapped, the URI generated from id 1 will then point to a different stop.
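The problem can be reproduced with a small sketch. The feed base, the stop names, and the `map_stops` helper are made up for illustration; the point is only that a URI derived from the local ID silently changes meaning when the IDs are reassigned.

```python
# Hypothetical base URI for one feed; not an agreed namespace.
BASE = "http://gtfs.org/example/demo-feed/stops/"


def map_stops(rows):
    """Map each stop row to a URI built only from its local stop_id."""
    return {BASE + row["stop_id"]: row["stop_name"] for row in rows}


# First version of the (made-up) feed.
v1 = [
    {"stop_id": "1", "stop_name": "Central Station"},
    {"stop_id": "2", "stop_name": "Market Square"},
]
# Next version: the exporter renumbered the same two stops.
v2 = [
    {"stop_id": "1", "stop_name": "Market Square"},
    {"stop_id": "2", "stop_name": "Central Station"},
]

first = map_stops(v1)
second = map_stops(v2)

# URIs that exist in both mappings but now describe a different stop.
changed = sorted(
    uri for uri in first if uri in second and first[uri] != second[uri]
)
```

Here both URIs end up in `changed`: anyone who linked to the first one expecting Central Station now gets Market Square.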
We can imagine that for stops this may be solved by reconciling the dataset instead of relying on the "id" column, e.g., by using a combination of the name and the location to find the URI to use when mapping that data. For trips, routes, and stop_times, however, it's a more difficult story.
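A rough sketch of what such a reconciliation for stops could look like. It assumes a list of already-mapped stops that carry their URI, and it treats two stops as the same when the normalized names match and the locations are within 50 metres. The `reconcile_stop` helper, the `uri` key, and the threshold are assumptions; `stop_name`, `stop_lat`, and `stop_lon` are the regular stops.txt columns.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    radius = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


def reconcile_stop(stop, known_stops, max_distance_m=50.0):
    """Return the URI of the closest known stop with the same name, or None.

    The local stop_id is ignored on purpose: only name and location decide.
    """
    name = stop["stop_name"].strip().casefold()
    best_uri, best_dist = None, max_distance_m
    for known in known_stops:
        if known["stop_name"].strip().casefold() != name:
            continue
        dist = haversine_m(
            float(stop["stop_lat"]), float(stop["stop_lon"]),
            float(known["stop_lat"]), float(known["stop_lon"]),
        )
        if dist <= best_dist:
            best_uri, best_dist = known["uri"], dist
    return best_uri


# Made-up data: one stop mapped earlier, and the same stop under a new local ID.
known = [{"uri": "http://gtfs.org/example/demo-feed/stops/1",
          "stop_name": "Central Station",
          "stop_lat": "40.6660", "stop_lon": "16.6040"}]
renumbered = {"stop_id": "7", "stop_name": " central station",
              "stop_lat": "40.6661", "stop_lon": "16.6041"}
matched_uri = reconcile_stop(renumbered, known)
```

This kind of matching has no obvious equivalent for trips or stop_times, which have no name or location of their own to match on.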
Suggestion to the GTFS community
I don't have a real solution for this problem in this mapper. It is a data problem with GTFS CSV: it is impossible to unlock this internal model into an open world where we need GUIDs that remain the same. To that end, I would like to change the specification of GTFS itself to use GUIDs instead of local IDs within its CSV files. This, however, adds a big new responsibility for the data maintainer, yet making this investment worldwide may be worth it.
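To give an idea of the extra responsibility this puts on a data maintainer, here is a minimal sketch of a registry that assigns a GUID to an internal record once and returns the same one for every later export. The `IdRegistry` class and the internal keys are hypothetical, and a real maintainer would have to persist the mapping between exports.

```python
import uuid


class IdRegistry:
    """Hypothetical registry: one GUID per internal record, assigned once."""

    def __init__(self, mapping=None):
        # mapping: previously saved {internal_key: guid} pairs, if any.
        self._ids = dict(mapping or {})

    def guid_for(self, internal_key):
        """Return the existing GUID for a record, creating one on first use."""
        if internal_key not in self._ids:
            self._ids[internal_key] = str(uuid.uuid4())
        return self._ids[internal_key]

    def to_dict(self):
        """Export the mapping so it can be stored and reloaded later."""
        return dict(self._ids)


# First export: a GUID is created for the stop.
registry = IdRegistry()
first_export_id = registry.guid_for("stop:central-station")

# Later export: the saved mapping is reloaded, so the GUID stays the same.
reloaded = IdRegistry(registry.to_dict())
second_export_id = reloaded.guid_for("stop:central-station")
```

The GUID is stable only as long as the maintainer keeps the mapping and keeps the internal keys stable, which is exactly the investment mentioned above.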
So what needs to change? The base of the URI that we use in this mapper is http://gtfs.org/{something}/{feedname}/. This introduces a globally unique identifier for all local IDs in the GTFS file. Yet we now need to introduce persistence. This is only possible if we require the same stops to have the same IDs over and over again in different versions of the file, which is not a requirement at this moment.