multi-modal algorithm - Githubissues

mpadge commented 2 years ago

Those commits implement most of the algorithm. It accepts a "from" parameter, because it can not be presumed that the full matrix from and to all network nodes will fit in memory. That then returns a matrix of [nfrom, nnetwork] fastest travel times combining one defined mode plus GTFS times.

TODO:

[ ] Cache the intermediate matrices for re-use: GTFS-to-GTFS and GTFS-to-network

Then the only matrix that will need re-calculating for new calls with be from-to-GTFS, the other two loaded from cache, and all 3 passed straight to the C++ routines.

mpadge commented 2 years ago

For each from vertex, the current algorithm ends up looping over all GTFS stations times all other network vertices to find the shortest time to each terminal vertex. This is extremely inefficient, and needs to be improved by pre-calcualting a reduced set of vertices for each GTFS station. One viable way to approach this would be to calculate the N nearest stations to each network vertex, and then map back from each station only to those end vertices which include that station in its list.

mpadge commented 2 years ago

The above commits effectively implement two approaches to the final stage of the multi-modal algorithm, of calculating travel times from each GTFS stop out to all network points. These times need to be reduced to a subset of network points for each GTFS stop, so that times traced from each terminal stop then only trace a much smaller subset of all network points to find fastest times.

Calculate full times from each GTFS stop out to all network points, and reduce that to a subset for each GTFS stop of points which include that GTFS stop as one of its closest N stops.
Calculate times from each network point to the closest N GTFS stops, to directly generate the required lists.

The first approach is faster, but uses a huge amount of memory, likely to be too much for big cities or on systems with restricted memory. The second approach is slower, but uses far less memory, and should scale to any system and city size.

Both approaches will now be implemented, and selected by a parameter called something like, fast = FALSE, where TRUE will select the large-memory first version, and FALSE the second version.

mpadge commented 2 years ago

The above commits address a previous bug that when numbers of stations at same coordinates exceeds specified value of n_closest. That causes a fixed number of zero-distance stations to be identified as the closest, yet in a random and unpredicable order. The solution is to expand n_closest to be one larger than the largest number of spatially co-incident GTFS stops. Then all zero-distance stops are included, plus one extra at some d > 0. That now all works, plus conversion of final routines to using maps and sets has sped things up quite a bit, but it's still quite slow with the resultant, considerably larger values for n_closest.

TODO

[ ] For the "fast" algorithm, use a stop-specific value of n_closest, so submit that as a vector to the parallel routines. Each value can then be set as the maximum of (num duplicated + 1, n_closest).
[ ] Implement equivalent routines for the "slow" algorithm.

Edit: Nope, that approach is not viable. The routines in current form are fast enough anyway, so leaving as is for now.

mpadge commented 2 years ago

Those commits finish the first cut of this function. Still have to document and test properly.

mpadge commented 2 years ago

That is almost it. Now just have to work out times to all nodes with "initial_mode", and select those where they are fastest.

UrbanAnalyst / m4ra

multi-modal algorithm #8

TODO:

TODO