Traewelling / traewelling

Free check-in service to log your public transit journeys
https://traewelling.de
GNU Affero General Public License v3.0

Support for multiple data sources #1635

Closed MrKrisKrisu closed 9 months ago

MrKrisKrisu commented 1 year ago

Is your feature request related to a problem? Please describe. Currently, Traewelling relies on DB-Rest as its only data source, which is a wrapper for the HAFAS of Deutsche Bahn. This limits our data availability to Germany and a few major connections in neighboring countries. Trams, buses, and other public transportation modes in foreign countries are often not included. To provide a more comprehensive service, we need to expand our data sources to include information from multiple providers.

Describe the solution you'd like We aim to enhance Träwelling by integrating multiple data sources to gather public transportation data. This would allow users to check in for rides across borders. We are seeking data sources that cover a wide range of locations, both within and outside of Germany, to provide users with more extensive coverage.

Describe alternatives you've considered /

Additional context /

Expanding our data sources would greatly improve the usability and inclusivity of our service, allowing users to benefit from a wider range of public transportation options both domestically and internationally. Any suggestions or recommendations for reliable data sources with comprehensive coverage would be greatly appreciated.

Please feel free to contribute any relevant information, ideas, or suggestions for potential data sources to this issue. Thank you for your support!

| Data Source | Area | Link / Context |
|---|---|---|
| DB Rest / HAFAS DB | Germany + some foreign trips | https://github.com/derhuerst/db-rest |
| HAFAS | for other areas ... | https://gist.github.com/derhuerst/2b7ed83bfa5f115125a5 (Thanks @derhuerst) |
| EFA | for some German areas | https://www.kvv.de/fahrplan/fahrplaene/open-data.html, https://www.vbn.de/service/entwicklerinfos/opendata-und-openservice, https://www.connect-fahrplanauskunft.de/index.php?id=opendata |
| SBB | Switzerland | https://data.sbb.ch/explore/?sort=modified&refine.keyword=Verkehr |
| ÖBB | Austria | https://data.oebb.at/#default/home |

derhuerst commented 1 year ago

The transport-apis project has many transit APIs listed; it intends to be the "source of truth" for basic information about these APIs (their endpoints, authentication mechanisms, licensing scheme, etc.), so that projects don't each need to keep track of these changes individually. If there is anything missing over there, please create an Issue or submit a PR!

derhuerst commented 1 year ago

Regarding the actual idea being discussed here: I think that many tricky technical and UX questions arise once one starts having >1 underlying data source:

I have brainstormed more about some technical aspects of this topic in Why linked open transit data?, stable-public-transport-ids, and experimented with fusing >1 (HAFAS-like) data source in pan-european-public-transport.

TLDR: Adding another data source is technically feasible, but how do we create a usable UX from that?

HerrLevin commented 1 year ago

I'm currently working on a really hacky POC to inject GTFS data into the DB-Rest response so that we might be able to combine multiple data sources without having to drastically change our project's internal structure. The repo will be made publicly available around the start of the GPN next week.

Currently, it's forwarding the departure request directly to db-rest v5 while simultaneously searching for departures on that IBNR. The departures provided via GTFS are then injected into the JSON. To determine what endpoint to call when we're getting a journey request, I simply took inspiration from the current HAFAS-Trip-IDs and added a "GTFS|{gtfs-id}" prefix to the trip IDs. This might be extended to combine multiple APIs from multiple (overlapping) data sources, but the first step might be to add ÖBB, SNCF, SBB, etc., and restrict them to regular public transport like buses and trams, which are not covered by DB's HAFAS system.
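
The injection and prefix-based routing described above could look roughly like the following. This is only a minimal sketch, not the code of the POC: the field names `tripId` and `when` are modeled on db-rest-style departure objects, and the function names and the source labels `"gtfs"` and `"db-rest"` are hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Tuple

GTFS_PREFIX = "GTFS|"  # the prefix mentioned in the comment above


def tag_gtfs_trip_id(gtfs_trip_id: str) -> str:
    """Mark a trip ID as coming from the GTFS data source."""
    return GTFS_PREFIX + gtfs_trip_id


def resolve_source(trip_id: str) -> Tuple[str, str]:
    """Decide which backend (hypothetical labels) should answer a trip request.

    Returns (source, original_id). Anything without the GTFS prefix is
    assumed to be a regular HAFAS trip ID and goes to db-rest unchanged.
    """
    if trip_id.startswith(GTFS_PREFIX):
        return "gtfs", trip_id[len(GTFS_PREFIX):]
    return "db-rest", trip_id


def inject_departures(hafas: List[Dict], gtfs: List[Dict]) -> List[Dict]:
    """Merge GTFS departures into a HAFAS departure list, sorted by time.

    Assumes every departure has an ISO 8601 `when` with a UTC offset.
    """
    tagged = [{**dep, "tripId": tag_gtfs_trip_id(dep["tripId"])} for dep in gtfs]
    merged = list(hafas) + tagged
    merged.sort(key=lambda dep: datetime.fromisoformat(dep["when"]))
    return merged
```

Because HAFAS trip IDs themselves contain `|` characters, the sketch only checks for the literal `GTFS|` prefix instead of splitting on the separator.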

I might have a few ideas to combat your above-mentioned problems:

This is all in its infancy at the moment but already describes the rough direction I'd like to go.


P.S.: speaking of GPN - will we see you there? 👀

vainamov commented 1 year ago

It's unfortunately limited to trains within Finland, but the Fintraffic API is awesome: https://www.digitraffic.fi/en/railway-traffic/

derhuerst commented 1 year ago

> I'm currently working on a really hacky POC to inject GTFS data into the DB-Rest response so that we might be able to combine multiple data sources without having to drastically change our project's internal structure. The repo will be made publicly available around the start of the GPN next week.
>
> Currently, it's forwarding the departure request directly to db-rest v5 while simultaneously searching for departures on that IBNR. The departures provided via GTFS are then injected into the JSON. To determine what endpoint to call when we're getting a journey request, I simply took inspiration from the current HAFAS-Trip-IDs and added a "GTFS|{gtfs-id}" prefix to the trip IDs. This might be extended to combine multiple APIs from multiple (overlapping) data sources, but the first step might be to add ÖBB, SNCF, SBB, etc., and restrict them to regular public transport like buses and trams, which are not covered by DB's HAFAS system.

This is very similar to what I've been doing with match-gtfs-rt-to-gtfs: It tries to match data from a HAFAS API (e.g. the DB one) to a GTFS dataset by matching their stop/trip/route IDs/names/locations.

Over time, I've invested quite a lot of effort to make the matching logic fast and flexible enough. For example, it can match a HAFAS stop with a GTFS stop even when they don't share an ID (IBNR), have slightly different names, and slightly different geolocations.
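
To illustrate the kind of tolerant stop matching described above, here is a minimal sketch. It is not the logic of match-gtfs-rt-to-gtfs; the thresholds, the abbreviation table, and the dict keys `ibnr`, `name`, `lat`, `lon` are all assumptions made for this example.

```python
import math
import re
import unicodedata
from difflib import SequenceMatcher

# Hypothetical abbreviation table; a real one would be much longer.
ABBREVIATIONS = {"hbf": "hauptbahnhof", "bf": "bahnhof", "str": "strasse"}


def normalize_name(name: str) -> str:
    """Lowercase, strip diacritics and punctuation, expand abbreviations."""
    text = name.lower().replace("ß", "ss")
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    tokens = re.sub(r"[^a-z0-9]+", " ", text).split()
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two WGS84 coordinates."""
    r = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def stops_match(a: dict, b: dict, max_distance_m: float = 200.0,
                min_name_similarity: float = 0.8) -> bool:
    """Match two stops by shared ID, or by nearby location plus similar name."""
    if a.get("ibnr") and a.get("ibnr") == b.get("ibnr"):
        return True
    if haversine_m(a["lat"], a["lon"], b["lat"], b["lon"]) > max_distance_m:
        return False
    similarity = SequenceMatcher(
        None, normalize_name(a["name"]), normalize_name(b["name"])
    ).ratio()
    return similarity >= min_name_similarity
```

Checking the distance first keeps the more expensive string comparison limited to stops that are plausible candidates at all.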

Unfortunately, the code has many indirections and isn't well-documented. Also, it's been a while since I've tested it with the DB HAFAS endpoint. But if you're interested, take a look!

> do we form a new "proprietary" ID that "masks" the underlying DB/SNCF IDs? This will be done w/ a proprietary combination of some proprietary prefixes and the API's original ID.

You might also want to look into Multiformats as a generalized and future-proof mechanism for "combining IDs".

> […] how do we make sure the UX is not confusing. […] How do we make sure users can find the train/trip they're looking for if they're used to a very specific naming scheme (e.g. "RE 1" vs "RE 73793", "TGV INOUI 123" vs "TGV 123")? We need to keep track of which APIs should be used for which station. […] A general primary identifier could be IFOPT as the parent station with the API's internal station ID and a reference to the station as children. […]

The Trainline stations database might be very helpful with this.
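
The "IFOPT as parent, per-API station IDs as children" idea quoted above could be modeled roughly like this. It is a minimal sketch under assumptions: the class names, the source labels, and the example IDs in the usage below are hypothetical and not taken from Träwelling or the Trainline dataset.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SourceStation:
    source: str      # hypothetical label of the API, e.g. "db-rest"
    station_id: str  # that API's internal station ID


@dataclass
class Station:
    ifopt: str  # primary identifier of the parent station
    name: str
    children: List[SourceStation] = field(default_factory=list)


class StationRegistry:
    """Tracks which data sources know a station, and under which ID."""

    def __init__(self) -> None:
        self._by_ifopt: Dict[str, Station] = {}
        self._by_source_id: Dict[Tuple[str, str], Station] = {}

    def add(self, station: Station) -> None:
        self._by_ifopt[station.ifopt] = station
        for child in station.children:
            self._by_source_id[(child.source, child.station_id)] = station

    def sources_for(self, ifopt: str) -> List[SourceStation]:
        """Return the APIs (and their IDs) to query for this station."""
        station = self._by_ifopt.get(ifopt)
        return list(station.children) if station else []

    def find_by_source_id(self, source: str, station_id: str) -> Optional[Station]:
        """Map an API-specific station ID back to the parent station."""
        return self._by_source_id.get((source, station_id))
```

A departure request for a station would then fan out to every entry returned by `sources_for`, for example:

```python
registry = StationRegistry()
registry.add(Station(
    ifopt="de:08212:90",  # illustrative value, not verified
    name="Karlsruhe Hbf",
    children=[
        SourceStation("db-rest", "8000191"),       # illustrative IBNR
        SourceStation("gtfs-kvv", "de:08212:90"),  # hypothetical GTFS feed
    ],
))
for child in registry.sources_for("de:08212:90"):
    print(child.source, child.station_id)
```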