polettif closed this issue 4 years ago
Thanks @polettif - i can confirm reproducibility of that system meltdown. Shall fix asap
Thanks @polettif - that just happened because the only services listed in the trips table are for service_id = "WEEK", but that date filters to service_id == "EMPT", giving an empty trip table. This was not anticipated, but was passed through to C++ code which expected some kind of non-NULL object, and so crashed.
library (gtfsrouter)
if (!file.exists ("routing.zip"))
    download.file("https://github.com/polettif/gtfs-test-feeds/raw/master/zip/routing.zip",
                  "routing.zip")
gtfs = extract_gtfs("routing.zip")
#> ▶ Unzipping GTFS archive
#> ✔ Unzipped GTFS archive
#> ▶ Extracting GTFS feed✔ Extracted GTFS feed
#> ▶ Converting stop times to seconds✔ Converted stop times to seconds
#> ▶ Converting transfer times to seconds✔ Converted transfer times to seconds
gtfs$calendar_dates
#> service_id date exception_type
#> 1: WEEK 20181006 2
#> 2: WEEK 20181007 2
#> 3: EXPR 20181005 1
#> 4: EMPT 20181002 1
#> 5: EMPT 20181001 2 ### <=== This is the date entered in the call below
gtfs$trips # only has trips for 'service_id == "WEEK"'
#> route_id service_id trip_id
#> 1: lineA WEEK routeA1
#> 2: lineA WEEK routeA2
#> 3: lineB WEEK routeB
#> 4: lineC WEEK routeC
#> 5: lineD WEEK routeD1
#> 6: lineD WEEK routeD2
timetbl <- gtfs_timetable(gtfs, date = 20181001)
#> Error in filter_by_date(gtfs_cp, date): The date restricts service_ids to [EMPT] yet there are not trips for those service_ids
timetbl <- gtfs_timetable(gtfs, date = 20181006) # but that works
Created on 2020-08-13 by the reprex package (v0.3.0)
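The crash described above comes down to an empty trip table reaching code that did not expect one. A minimal base-R sketch of a guard of that kind follows. The helper name `check_trips_for_services` is hypothetical and is not claimed to be how gtfsrouter implements its check; the message is only modelled on the error shown in the reprex.

```r
# Hypothetical guard (not gtfsrouter's actual code): stop early in R when
# none of the trips belong to the service_ids selected for a date, so that
# an empty table is never handed on to lower-level routines.
check_trips_for_services <- function(trips, service_ids) {
    if (!any(trips$service_id %in% service_ids)) {
        stop("The date restricts service_ids to [",
             paste(service_ids, collapse = ", "),
             "] yet there are no trips for those service_ids",
             call. = FALSE)
    }
    invisible(TRUE)
}

# Toy trips table mirroring the feed above: every trip uses "WEEK"
trips <- data.frame(trip_id = c("routeA1", "routeB"),
                    service_id = c("WEEK", "WEEK"),
                    stringsAsFactors = FALSE)

check_trips_for_services(trips, "WEEK")       # passes silently
# check_trips_for_services(trips, "EMPT")     # would raise the error
```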
Well, EMPT doesn't run on Monday 2018-10-01 since it's removed with exception type 2 in calendar_dates.txt. However, WEEK runs on said date (defined in calendar.txt) and it includes all trips in the feed:
library(tidytransit)
g = read_gtfs("https://github.com/polettif/gtfs-test-feeds/raw/master/zip/routing.zip")
g$calendar
#> # A tibble: 4 x 10
#> service_id monday tuesday wednesday thursday friday saturday sunday start_date
#> <chr> <int> <int> <int> <int> <int> <int> <int> <date>
#> 1 WEEK 1 1 1 1 1 0 0 2018-10-01
#> 2 EXPR 0 0 0 0 0 0 0 2018-10-01
#> 3 WEND 0 0 0 0 0 1 1 2018-10-01
#> 4 EMPT 1 0 1 1 0 1 0 2018-10-01
#> # … with 1 more variable: end_date <date>
g$trips
#> # A tibble: 6 x 3
#> route_id service_id trip_id
#> <chr> <chr> <chr>
#> 1 lineA WEEK routeA1
#> 2 lineA WEEK routeA2
#> 3 lineB WEEK routeB
#> 4 lineC WEEK routeC
#> 5 lineD WEEK routeD1
#> 6 lineD WEEK routeD2
I don't know how services and dates are handled in gtfsrouter but IMO there's no way around creating a table from calendar and calendar_dates that links dates and service_ids. set_date_service_table does this for tidytransit and is used in filter_stop_times:
library(tidytransit)
g = read_gtfs("https://github.com/polettif/gtfs-test-feeds/raw/master/zip/routing.zip")
g <- set_date_service_table(g)
stop_times = filter_stop_times(g, "2018-10-01", 0, 24*3600)
head(stop_times[,1:5])
#> trip_id arrival_time departure_time stop_id stop_sequence
#> 1: routeA1 07:00:00 07:00:00 stop1a 1
#> 2: routeA1 07:04:00 07:05:00 stop2 2
#> 3: routeA1 07:11:00 07:12:00 stop3a 3
#> 4: routeA1 07:40:00 07:40:00 stop4 4
#> 5: routeA2 07:05:00 07:05:00 stop1a 1
#> 6: routeA2 07:09:00 07:10:00 stop2 2
Created on 2020-08-13 by the reprex package (v0.3.0)
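For readers who want the date-to-service logic spelled out, here is a minimal base-R sketch of combining calendar and calendar_dates for a single date, following the GTFS rule that exception_type 1 adds a service and exception_type 2 removes it. `services_on_date` is a hypothetical helper, not a function from gtfsrouter or tidytransit. The toy tables copy the values printed above, except `end_date`, which is not shown in the output and is assumed to be 20181031.

```r
# Hypothetical helper: which service_ids run on a given date?
# Dates are integers in YYYYMMDD form; column names follow GTFS.
services_on_date <- function(calendar, calendar_dates, date) {
    # weekday column name for the date, independent of locale
    days <- c("sunday", "monday", "tuesday", "wednesday",
              "thursday", "friday", "saturday")
    d <- as.Date(as.character(date), format = "%Y%m%d")
    wday <- days[as.POSIXlt(d)$wday + 1]

    # regular services from calendar (the table may be absent or empty)
    regular <- character(0)
    if (!is.null(calendar) && nrow(calendar) > 0) {
        in_range <- calendar$start_date <= date & calendar$end_date >= date
        regular <- calendar$service_id[in_range & calendar[[wday]] == 1]
    }

    # exceptions from calendar_dates: 1 = service added, 2 = service removed
    on_date <- calendar_dates[calendar_dates$date == date, ]
    added   <- on_date$service_id[on_date$exception_type == 1]
    removed <- on_date$service_id[on_date$exception_type == 2]

    sort(unique(setdiff(c(regular, added), removed)))
}

calendar <- data.frame(
    service_id = c("WEEK", "EXPR", "WEND", "EMPT"),
    monday    = c(1, 0, 0, 1), tuesday  = c(1, 0, 0, 0),
    wednesday = c(1, 0, 0, 1), thursday = c(1, 0, 0, 1),
    friday    = c(1, 0, 0, 0), saturday = c(0, 0, 1, 1),
    sunday    = c(0, 0, 1, 0),
    start_date = 20181001,
    end_date   = 20181031,   # assumed; not shown in the printed tibble
    stringsAsFactors = FALSE)

calendar_dates <- data.frame(
    service_id     = c("WEEK", "WEEK", "EXPR", "EMPT", "EMPT"),
    date           = c(20181006, 20181007, 20181005, 20181002, 20181001),
    exception_type = c(2, 2, 1, 1, 2),
    stringsAsFactors = FALSE)

services_on_date(calendar, calendar_dates, 20181001)
```

With these tables, Monday 2018-10-01 resolves to WEEK only: EMPT is scheduled on Mondays in calendar but is removed for that date by its exception_type 2 row.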
Oh, that's easy - just required processing the 2 different exception_type values. Above commit now does that, with the following result:
library(gtfsrouter)
if (!file.exists ("routing.zip"))
    download.file("https://github.com/polettif/gtfs-test-feeds/raw/master/zip/routing.zip", "routing.zip")
gtfs = extract_gtfs("routing.zip")
#> ▶ Unzipping GTFS archive
#> ✔ Unzipped GTFS archive
#> ▶ Extracting GTFS feed✔ Extracted GTFS feed
#> ▶ Converting stop times to seconds✔ Converted stop times to seconds
#> ▶ Converting transfer times to seconds✔ Converted transfer times to seconds
timetbl <- gtfs_timetable(gtfs, date = 20181001)
head (timetbl$timetable)
#> departure_station arrival_station departure_time arrival_time trip_id
#> 1: 2 4 25200 25440 1
#> 2: 4 6 25500 25860 1
#> 3: 2 4 25500 25740 2
#> 4: 2 9 25800 26100 3
#> 5: 4 6 25800 26160 2
#> 6: 6 8 25920 27600 1
Created on 2020-08-13 by the reprex package (v0.3.0)
Do you know of any feeds which use exception_type = 1 in calendar_dates? I'm not sure that would be appropriately handled by current code, but it's hard to know without an actual example of how that could be used - my guess is that that flag can only be meaningfully used to remove a bunch of services via exception_type = 2, and then to add back some specific ones via exception_type = 1. (Your example code is just a toy, and does not have entries in the trips table for the service_id values you've got in calendar and calendar_dates - real feeds which use those must have corresponding trips entries.)
> Do you know of any feeds which use exception_type = 1 in calendar_dates?
This is an example: https://transitfeeds.com/p/reseau-de-transport-de-la-capitale/40
There are some feeds that only have calendar_dates with all the dates specified and no calendar. I haven't worked with one personally but issues came up in another project ([1], [2]). These feeds normally use exception_type=1.
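To illustrate what such a calendar_dates-only feed implies for trip selection, here is a small base-R sketch. All service_ids, trip_ids and dates are made up for illustration, and `trips_on_date` is a hypothetical helper rather than a gtfsrouter or tidytransit function. With no calendar table, the services running on a date are exactly those listed for it with exception_type = 1.

```r
# Made-up calendar_dates-only feed: every operating day is listed
# explicitly with exception_type = 1 and there is no calendar table.
calendar_dates <- data.frame(
    service_id     = c("S1", "S1", "S2"),
    date           = c(20200201, 20200202, 20200201),
    exception_type = c(1, 1, 1),
    stringsAsFactors = FALSE)

trips <- data.frame(
    trip_id    = c("t1", "t2", "t3"),
    service_id = c("S1", "S2", "S3"),
    stringsAsFactors = FALSE)

# Hypothetical helper: keep only trips whose service is added on `date`
trips_on_date <- function(trips, calendar_dates, date) {
    active <- calendar_dates$service_id[
        calendar_dates$date == date & calendar_dates$exception_type == 1]
    trips[trips$service_id %in% active, , drop = FALSE]
}

trips_on_date(trips, calendar_dates, 20200201)   # trips of S1 and S2
trips_on_date(trips, calendar_dates, 20200202)   # trips of S1 only
```

A date that does not appear in calendar_dates at all yields an empty result here, which is the case a caller would need to handle before building a timetable.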
> (Your example code is just a toy, and does not have entries in the trips table for the service_id values you've got in calendar and calendar_dates - real feeds which use those must have corresponding trips entries.)
You're absolutely right, I missed that having no trips for EXPR, WEND and EMPT leads to an invalid feed. However, I'd prefer to call it "test" instead of "toy" ;) I don't want to tell you how to implement date handling (might have sounded that way, sorry); I just want to highlight possible pitfalls.
all good - i really appreciate your help, and shall check out that example feed asap. Thanks for suggesting it! (And yeah, "test" is better than "toy" - sorry about my sloppy terminology there.) I'll re-open this issue to ensure the code appropriately handles all possible calendar <-> calendar_dates combinations.
Thanks @polettif, that example seems to all work as expected with calendar_dates:
library(gtfsrouter)
gtfs <- extract_gtfs ("./rtc-gtfs.zip") # Quebec
#> ▶ Unzipping GTFS archive
#> ✔ Unzipped GTFS archive
#> ▶ Extracting GTFS feed
#> Warning in data.table::fread(flist[f], integer64 = "character", showProgress =
#> FALSE): Found and resolved improper quoting in first 100 rows. If the fields are
#> not quoted (e.g. field separator does not appear within any field), try quote=""
#> to avoid this warning.
#> Warning in data.table::fread(flist[f], integer64 = "character", showProgress =
#> FALSE): Detected 1 column names but the data has 2 columns (i.e. invalid file).
#> Added 1 extra default column name for the first column which is guessed to be
#> row names or an index. Use setnames() afterwards if this guess is not correct,
#> or fix the file write command that created the file to create a valid file.
#> ✔ Extracted GTFS feed
#> ▶ Converting stop times to seconds✔ Converted stop times to seconds
#> ▶ Converting transfer times to seconds✔ Converted transfer times to seconds
gtfs$calendar_dates [gtfs$calendar_dates$date == 20200201, ]
#> service_id date exception_type
#> 1: 20200511multiint-0000010 20200201 1
# that gives the service_id for that calendar date
gtfs <- gtfs_timetable (gtfs) # errors as expected
#> Error: This appears to be a GTFS feed which uses a 'calendar_dates' table instead of 'calendar'.
#> Please first construct timetable for a particular date using 'gtfs_timetable(gtfs, date = <date>)'
#> See https://developers.google.com/transit/gtfs/reference/#calendar_datestxt for details.
gtfs <- gtfs_timetable (gtfs, date = 20200201)
gtfs$timetable
#> departure_station arrival_station departure_time arrival_time trip_id
#> 1: 194 1910 18420 18480 2483
#> 2: 1910 217 18480 18480 2483
#> 3: 217 218 18480 18540 2483
#> 4: 218 219 18540 18600 2483
#> 5: 219 220 18600 18600 2483
#> ---
#> 124909: 4327 4329 101340 101460 1255
#> 124910: 4329 4331 101460 101520 1255
#> 124911: 4331 4104 101520 101580 1255
#> 124912: 4104 4105 101580 101640 1255
#> 124913: 4105 4189 101640 101700 1255
# timetable works
gtfs$trip_ids
#> trip_ids
#> 1: 66951554-20200511multiint-0000010
#> 2: 66950989-20200511multiint-0000010
#> 3: 66951027-20200511multiint-0000010
#> 4: 66951047-20200511multiint-0000010
#> 5: 66951011-20200511multiint-0000010
#> ---
#> 2527: 66950141-20200511multiint-0000010
#> 2528: 66951006-20200511multiint-0000010
#> 2529: 67140259-20200511multiint-0000010
#> 2530: 66952176-20200511multiint-0000010
#> 2531: 66951501-20200511multiint-0000010
# all trip_ids are of the specified service given above
Created on 2020-08-17 by the reprex package (v0.3.0)
I think that suffices to close this issue for now.
Running the following code with a fairly simple feed crashes RStudio:
I can't really tell where the issue is, transfers.txt looks like this:
Or maybe it's an issue with the date. I don't know how dates are extracted in gtfsrouter. In my understanding of gtfs_timetable's doc, the date parameter is only applied to calendar_dates.txt since
Compare to tidytransit's approach, where a date_service_table is calculated to see which services (and thus trips) run on which date.
Created on 2020-06-25 by the reprex package (v0.3.0)