Closed MrKrisKrisu closed 9 months ago
The transport-apis project lists many transit APIs. It intends to be the "source of truth" for basic information about these APIs (their endpoints, authentication mechanisms, licensing schemes, etc.), so that projects don't each need to keep track of these changes individually. If anything is missing over there, please create an Issue or submit a PR!
Regarding the actual idea being discussed here: I think that many tricky technical and UX questions arise once one starts having >1 underlying data source:
I have brainstormed more about some of the technical aspects in Why linked open transit data? and stable-public-transport-ids, and experimented with fusing >1 (HAFAS-like) data source in pan-european-public-transport.
TL;DR: Adding another data source is technically feasible, but how do we create a usable UX from that?
I'm currently working on a really hacky POC to inject GTFS data into the DB-Rest response so that we might be able to combine multiple data sources without having to drastically change our project's internal structure. The repo will be made publicly available around the start of the GPN next week.
Currently, it's forwarding the departure request directly to db-rest v5 while simultaneously searching the GTFS data for departures on that IBNR. The departures provided via GTFS are then injected into the JSON. To determine which endpoint to call when we're getting a journey request, I simply took inspiration from the current HAFAS trip IDs and added a "GTFS|{gtfs-id}" prefix to the trip IDs. This might be extended to combine multiple APIs from multiple (overlapping) data sources, but the first step might be to add ÖBB, SNCF, SBB, etc., and restrict them to regular public transport like buses and trams, which are not covered by DB's HAFAS system.
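To make the prefix idea a bit more concrete, here is a minimal Python sketch (not the actual POC code, which isn't public yet). Only the "GTFS|" prefix comes from the description above; the function names, the "gtfs"/"hafas" labels, and the `tripId`/`plannedWhen` field names are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime

GTFS_PREFIX = "GTFS|"  # prefix format as described in the comment above


@dataclass(frozen=True)
class TripRef:
    source: str     # "gtfs" or "hafas" (labels are assumptions)
    native_id: str  # the ID as the underlying data source knows it


def parse_trip_id(trip_id: str) -> TripRef:
    """Decide which backend a trip ID belongs to, based on its prefix."""
    if trip_id.startswith(GTFS_PREFIX):
        return TripRef("gtfs", trip_id[len(GTFS_PREFIX):])
    # Anything without the prefix is treated as a HAFAS trip ID
    # and would be passed through to db-rest unchanged.
    return TripRef("hafas", trip_id)


def merge_departures(hafas_deps: list[dict], gtfs_deps: list[dict]) -> list[dict]:
    """Inject GTFS departures into the HAFAS list, ordered by planned time.

    Assumes every departure dict has a "tripId" and an ISO 8601
    "plannedWhen" string (hypothetical field names for this sketch).
    """
    tagged = [{**dep, "tripId": GTFS_PREFIX + dep["tripId"]} for dep in gtfs_deps]
    return sorted(
        hafas_deps + tagged,
        key=lambda dep: datetime.fromisoformat(dep["plannedWhen"]),
    )
```

Parsing the timestamps instead of comparing strings keeps the order correct even if the two sources report different UTC offsets.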
I might have a few ideas to combat your above-mentioned problems:
In our case: (mostly) yes. We want to use the "official" data endpoint for each vehicle, e.g. Karlsruhe public transport uses its open data endpoint, ICEs use DB HAFAS, TGVs use SNCF's, and so on. (This raises a bigger question: What do we do with trains crossing borders? Is the TGV data provided by SNCF more or less accurate than DB's? Just guessing from DB's polylines, everything outside the national borders is "bad data".)
This will be done with a project-specific combination of our own prefixes and the API's original ID.
This is the biggest question in my opinion because it just opens up even more questions. My current ideas are the following:
We need to keep track of which APIs should be used for which station. This could be done by using a modified version of the GTFS stops table. A general primary identifier could be the IFOPT ID as the parent station, with each API's internal station ID and a reference to the station as children. Maybe even additional information such as "only long-distance trains" could be added.
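As a rough sketch of what such a parent/child station mapping could look like (in Python, purely illustrative): all class names, source labels, and IDs below are made-up placeholders, and the `modes` field is just one possible way to express restrictions like "only long-distance trains".

```python
from dataclasses import dataclass, field


@dataclass
class SourceStop:
    source: str     # which API/dataset this child belongs to (hypothetical label)
    native_id: str  # the station ID used inside that API
    # Modes this source should be asked for; an empty set means "no restriction".
    modes: frozenset[str] = frozenset()


@dataclass
class Station:
    ifopt: str  # parent identifier
    name: str
    children: list[SourceStop] = field(default_factory=list)

    def sources_for(self, mode: str) -> list[SourceStop]:
        """Return the child stops whose API should be queried for this mode."""
        return [c for c in self.children if not c.modes or mode in c.modes]


# Placeholder values, not real identifiers.
example = Station(
    ifopt="de:00000:1",
    name="Example Hbf",
    children=[
        SourceStop("db-hafas", "8000000", frozenset({"long-distance"})),
        SourceStop("local-gtfs", "stop-1", frozenset({"tram", "bus"})),
    ],
)
```

With this shape, `example.sources_for("tram")` yields only the local GTFS child, so a departure board request could fan out to exactly the APIs that are responsible for the requested modes.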
In my opinion, the "correct way" of displaying the line name, etc. is to use what the "correct" API provides. However, this could be extended by providing additional information in some sort of translation schema, since it will indeed be confusing to end users in some situations. I'm not completely happy with this approach, but it's the best I've come up with so far.
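One very simple form such a translation schema could take is an alias table next to the name the "correct" API provides. The Python sketch below is only an illustration: the name pairs are the examples mentioned in this thread, which side counts as canonical is an assumption, and `LINE_ALIASES` and `matches_line` are hypothetical names.

```python
# Canonical name (as delivered by the "correct" API) -> names users may search for.
# Which side is canonical is an assumption made for this sketch.
LINE_ALIASES: dict[str, list[str]] = {
    "RE 73793": ["RE 1"],
    "TGV INOUI 123": ["TGV 123"],
}


def normalize(name: str) -> str:
    """Uppercase and collapse whitespace so "re  1" and "RE 1" compare equal."""
    return " ".join(name.upper().split())


def matches_line(query: str, canonical_name: str) -> bool:
    """Check whether a user's query refers to the given canonical line name."""
    wanted = normalize(query)
    candidates = [canonical_name, *LINE_ALIASES.get(canonical_name, [])]
    return any(normalize(candidate) == wanted for candidate in candidates)
```

The displayed name stays whatever the responsible API delivers; the aliases are only used to help users find the trip.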
This is all in its infancy at the moment but already describes the rough direction I'd like to go.
P.S.: speaking of GPN - will we see you there? 👀
It's unfortunately limited to trains within Finland, but the Fintraffic API is awesome: https://www.digitraffic.fi/en/railway-traffic/
I'm currently working on a really hacky POC to inject GTFS data into the DB-Rest response so that we might be able to combine multiple data sources without having to drastically change our project's internal structure. The repo will be made publicly available around the start of the GPN next week.
Currently, it's forwarding the departure request directly to db-rest v5 while simultaneously searching the GTFS data for departures on that IBNR. The departures provided via GTFS are then injected into the JSON. To determine which endpoint to call when we're getting a journey request, I simply took inspiration from the current HAFAS trip IDs and added a "GTFS|{gtfs-id}" prefix to the trip IDs. This might be extended to combine multiple APIs from multiple (overlapping) data sources, but the first step might be to add ÖBB, SNCF, SBB, etc., and restrict them to regular public transport like buses and trams, which are not covered by DB's HAFAS system.
This is very similar to what I've been doing with match-gtfs-rt-to-gtfs: It tries to match data from a HAFAS API (e.g. the DB one) to a GTFS dataset by matching their stop/trip/route IDs/names/locations.
Over time, I've invested quite a lot of effort to make the matching logic fast and flexible enough. For example, it can match a HAFAS stop with a GTFS stop even when they don't share an ID (IBNR), have slightly different names, and slightly different geolocations.
Unfortunately, the code has many indirections and isn't well-documented. Also, it's been a while since I've tested it with the DB HAFAS endpoint. But if you're interested, take a look!
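For readers who haven't seen this kind of matching before, here is a simplified Python sketch of the general idea (ID first, then name plus location). It is not the logic of match-gtfs-rt-to-gtfs; the thresholds, the dict field names, and the "Hbf" expansion are assumptions chosen for illustration.

```python
import math
import re
from difflib import SequenceMatcher


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 coordinates."""
    radius_m = 6_371_000
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def normalize_name(name: str) -> str:
    """Lowercase, expand a common abbreviation, and drop punctuation."""
    name = name.lower()
    name = re.sub(r"\bhbf\b", "hauptbahnhof", name)
    return re.sub(r"[^a-z0-9äöüß]+", " ", name).strip()


def stops_match(a: dict, b: dict,
                max_distance_m: float = 200.0,
                min_name_ratio: float = 0.8) -> bool:
    """Match two stops given as dicts with "id", "name", "lat", "lon"."""
    # A shared ID (e.g. an IBNR) is the strongest signal.
    if a.get("id") and a.get("id") == b.get("id"):
        return True
    # Otherwise require both a nearby location and a similar name.
    distance = haversine_m(a["lat"], a["lon"], b["lat"], b["lon"])
    ratio = SequenceMatcher(
        None, normalize_name(a["name"]), normalize_name(b["name"])
    ).ratio()
    return distance <= max_distance_m and ratio >= min_name_ratio
```

Requiring both conditions avoids matching two different stops that merely share a name, or two neighbouring stops with unrelated names.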
Do we form a new "proprietary" ID that "masks" the underlying DB/SNCF IDs? This will be done with a project-specific combination of our own prefixes and the API's original ID.
You might also want to look into Multiformats as a generalized and future-proof mechanism for "combining IDs".
[…] how do we make sure the UX is not confusing. […] How do we make sure users can find the train/trip they're looking for if they're used to a very specific naming scheme (e.g. "RE 1" vs "RE 73793", "TGV INOUI 123" vs "TGV 123")? We need to keep track of which APIs should be used for which station. […] A general primary identifier could be IFOPT as the parent station with the APIs internal station ID and a reference to the station as children. […]
The Trainline stations database might be very helpful with this.
Is your feature request related to a problem? Please describe. Currently, Träwelling relies solely on DB-Rest, a wrapper for Deutsche Bahn's HAFAS, as its data source. This limits our data availability to Germany and a few major connections in neighboring countries. Trams, buses, and other public transportation modes in foreign countries are often not included. To provide a more comprehensive service, we need to expand our data sources to include information from multiple providers.
Describe the solution you'd like We aim to enhance Träwelling by integrating multiple data sources to gather public transportation data. This would allow users to check in for rides across borders. We are seeking data sources that cover a wide range of locations, both within and outside of Germany, to provide users with more extensive coverage.
Describe alternatives you've considered /
Additional context /
Expanding our data sources would greatly improve the usability and inclusivity of our service, allowing users to benefit from a wider range of public transportation options both domestically and internationally. Any suggestions or recommendations for reliable data sources with comprehensive coverage would be greatly appreciated.
Please feel free to contribute any relevant information, ideas, or suggestions for potential data sources to this issue. Thank you for your support!