UrbanAnalyst / gtfsrouter

Routing and analysis engine for GTFS (General Transit Feed Specification) data
https://urbananalyst.github.io/gtfsrouter/
81 stars 17 forks source link

frequencies to stop_times added #25

Closed stmarcin closed 4 years ago

stmarcin commented 4 years ago

Related to the issue #13

The pull request with new function frequencies_to_stop_times():

The function works on a copy of gtfs object. First it checks whether gtfs contains no-empty frequencies.txt.

Second it convert start_time and end_time from frequencies into numeric variable (seconds from the beginning of the day – as in case of stop_times.txt (I used your rcpp_time_to_seconds() function).

Third, it extract stop_times which have their equivalent in frequencies. Those records that do not have their counterparts in frequencies, are just kept, other are transformed. At the end of the script both parts are combined. As an individual trip_id is required for each trip, there is a new column trip_id_f which is either equal to trip_id (copied rows) or adjusted (equal to: paste(trip_id, n, sep = "_"), where n is subsequent trip within the same trip_id (so the stop_sequence is doubled for the same trip_id_f). Not sure if it is necessary and in case it is – whether it should be trip_id or the new column which contains individual id (original is required to link row to other files of a feed, e.g. transfers.txt.

Fourth – it multiply each row of stop_time in order to obtain separate trip based on a frequency in a given period of time. It contains the same info as original row, except the updated arrival_time and departure_time.

Addiotional details: Even though there exists frequencies.txt in some feeds the first arrival time starts at different time than 0 (i.e. 00:00:00). In case of this script the first arrival has to start at 0, so the following code reset arrival_time and departure_time so they starts from 00:00:00.

if (stop_times_trip [1] [["arrival_time"]] > 0)  
{
    stop_times_trip [, c ("arrival_time", "departure_time") := list(
        arrival_time - stop_times_trip [1] [["arrival_time"]],
        departure_time - stop_times_trip [1] [["arrival_time"]]     )]
}

The following part ensure a “smooth” transition between different periods of frequencies. If there is a break in trip, then it sets first departure at the new start_time. If it is continuous, the first departure time is set applying the new headway but starting from the last departure during the previous frequency period.

ifelse (
    headway_old - start_t + frequencies_trip [i] [["start_time"]] < headway,
    start_t <- start_t - headway_old + headway,
    start_t <- frequencies_trip [i] [["start_time"]]
)

Tested on the following feeds:

It probably requires some improvements, but I hope it helps.

mpadge commented 4 years ago

Thanks a load @stmarcin for the great contribution. I'll have a go through the code asap and get back to you. One thing we'll need to address will be testing, which could be done by manually changing the contents of berlin_gtfs_to_zip(), which dumps a tiny GTFS to tempdir(). That could be modified by removing more or all of timetable.txt, and replacing with a made-up frequencies.txt, or something like that. There'll be a couple of other minor things needed prior to merging too, importantly including adding your name to the DESCRIPTION file, but that can come later.

(And don't worry about the failing CI build for this PR - that is unrelated to your code.)

stmarcin commented 4 years ago

Thanks @mpadge! I think the easiest solution would be to replace stop_times by frequencies of one or more lines (routes) in your example GTFS. I will try to modify this feed accordingly and update the data-script.Rmd file and add them to this PR.

Do you want to have one file for all tests or a separate one just for frequencies?

mpadge commented 4 years ago

If you modify data-script.Rmd, you run the risk of breaking lots of existing tests and having to fix them again. I suspect it would be easier just to write a single new tests that does the explicit modification of timetable.txt and insertion of frequencies.txt within the test file itself. That ways it'll be self-contained, and should be less work for you. You should ideally just need to insert a new test_frequencies.R file, with everything self-contained there, and all other tests guaranteed to work regardless of what you implement there. That would at least be my suggestion for the road-of-least-pain :smile:

... That said, feel free to modify data-script.Rmd if you really want, but please just ensure that all current tests either continue to work, or get modified to work regardless. (I guess ideally by inserting a single frequencies_to_stop_times() call before doing any actual analyses.)

stmarcin commented 4 years ago

Thanks for your advice. I have added a couple of tests to the PR (in a new test file) and adjust function in order to capture more potential errors. To be honest: they were the first tests I wrote for any R package. I have learned a lot from your tests, and I hope mine are prepared accordingly. There are some comments # in the test file, which can be removed later.

I used berlin_gtfs_to_zip () function to create an example gtfs and then adjust it accordingly:

Tests:

One more point: when running tests I got the following error:

test-extract.R:12: failure: extract non gtfs g <- extract_gtfs(fz) threw an error with unexpected message.

I haven't change nor extract-gtfs.R file nor extract test. As far as I can see there is a difference between both messages. I tried to correct expected message in extract non gtfs test, but did not succeed.

stmarcin commented 4 years ago

Done. I hope I haven't skipped any comments. Thanks for reviewing.