Closed polettif closed 3 years ago
I'll check out the main issue right now, but not in the meantime in regard to "wierd transfers" #61, which reflects some huge re-engineering of new gtfs_traveltimes()
functions. That also returns trips with fewer transfers than gtfs_routes()
, and I would trust that new algorithm over the present gtfs_routes()
. The whole package will be updated once gtfs_traveltimes()
has reached a stable state.
That commit implements a grep_fixed
parameter, giving the following results, where the warnings at the top of each are the key.
library (gtfsrouter)
packageVersion("gtfsrouter")
#> [1] '0.0.4.150'
gtfs = extract_gtfs("routing.zip") # downloaded from your URL above
#> ▶ Unzipping GTFS archive
#> ✔ Unzipped GTFS archive
#> ▶ Extracting GTFS feed✔ Extracted GTFS feed
#> ▶ Converting stop times to seconds✔ Converted stop times to seconds
#> ▶ Converting transfer times to seconds✔ Converted transfer times to seconds
gtfs <- gtfs_timetable(gtfs, date = 20200514)
gtfs_route(gtfs, "Zürich HB", "Bern", start_time = 11*3600, grep_fixed = TRUE)
#> Warning in station_name_to_ids(i, gtfs, from_to_are_ids, grep_fixed): The name [Bern] matches multiple stops spread up to 333.7km apart.
#> Considering refining matching via `grep` with `grep_fixed = FALSE`.
#> route_name trip_name stop_name arrival_time
#> 1 14 Zürich, Seebach Zürich, Bahnhofquai/HB 10:59:00
#> 2 14 Zürich, Seebach Zürich, Stampfenbachplatz 11:01:00
#> 3 14 Zürich, Seebach Zürich, Beckenhof 11:02:00
#> 4 14 Zürich, Seebach Zürich, Kronenstrasse 11:03:00
#> 5 14 Zürich, Seebach Zürich, Schaffhauserplatz 11:05:00
#> 6 14 Zürich, Seebach Zürich, Guggachstrasse 11:06:00
#> 7 10 Zürich Flughafen, Fracht Zürich, Hirschwiesenstrasse 11:08:00
#> 8 14 Zürich, Seebach Zürich, Milchbuck 11:07:00
#> 9 10 Zürich Flughafen, Fracht Zürich, Berninaplatz 11:09:00
#> departure_time
#> 1 11:00:00
#> 2 11:01:00
#> 3 11:02:00
#> 4 11:03:00
#> 5 11:05:00
#> 6 11:06:00
#> 7 11:08:00
#> 8 11:08:00
#> 9 11:09:00
gtfs_route(gtfs, "Zürich HB", "^Bern", start_time = 11*3600, grep_fixed = FALSE)
#> Warning in station_name_to_ids(i, gtfs, from_to_are_ids, grep_fixed): The name [^Bern] matches multiple stops spread up to 306.1km apart.
#> Considering refining matching via `grep` with `grep_fixed = FALSE`.
#> route_name trip_name stop_name arrival_time departure_time
#> 1 8 Brig Zürich HB 10:55:00 11:02:00
#> 2 8 Brig Bern 11:58:00 12:06:00
gtfs_route(gtfs, "Zürich HB", "^Bern$", start_time = 11*3600, grep_fixed = FALSE)
#> route_name trip_name stop_name arrival_time departure_time
#> 1 8 Brig Zürich HB 10:55:00 11:02:00
#> 2 8 Brig Bern 11:58:00 12:06:00
Created on 2020-12-14 by the reprex package (v0.3.0)
Let me know if you think there's a better way to do this, and please re-open with suggestions if so. Also, do you think it would be worth adding an auxilliary function to check station name matching? gtfs_station_name()
, which would return a filtered version of gtfs$stops
containing matching rows, to enable easy checking of grep
patterns when using grep_fixed = FALSE
.
Thank you for the inspiration for these improvements!
Thank you for the inspiration for these improvements!
No worries, happy to help and discuss this :)
IMO with R being used for analysis in scripts I'd like to reduce "black box input handling" as much as possible. If an input does not work, throw a meaningful error. A function shouldn't try to fix wrong inputs and, more importantly, users should not have to anticipate a function's error-fixing (like "^Bern$", depending on the default). It should be up to the user to decide how to change the input. That's why tidytransit's raptor only uses stop_ids and the slightly higher level travel_times only accepts exact stop_names. That might be verbose and less convenient but I prefer it that way.
Now, with that philosophical preamble out of the way, I would handle stop names like this:
If 3 happens I'd say the best way is to use gtfs_station_name()
(or find_stop_name
or find_stop_id
) before running the routing functions. That way the user needs to decide which stop to use because "Bern" and "Zürich, Berninaplatz" are both equally likely to be the right stop. This has the added benefit (IMO) that analysis steps are in more separated functions. I guess my main point is: Finding the right stop based on a string is a separate task from routing, so it should be handled separately.
Running the following snippet gives me a route to Zürich, Berninaplatz instead of Bern. Does this work as intended? If
to
matches multiple stop names I'd expect it to use the exact match.On a side note, the resulting route does not seem correct, there are some weird transfers happening on Hirschwiesenstrasse/Milchbuck, the actual best route has no transfers.