Open spencerrecneps opened 7 years ago
@flibbertigibbet Any thoughts on switching to pgr_johnson()? I'm finding that running pgr_dijkstra on each block pair is no more efficient (and possibly much less efficient) than the previous approach.
I believe pgr_johnson() could be used to populate the reachable_roads* tables much more efficiently than either of our other approaches. The drawback is that it appears to be a massive resource hog (which makes sense since we're effectively storing the whole graph in memory for the operation). Might also present problems for parallelization.
For reference I ran pgr_johnson() on roads within the test neighborhood boundary in Cambridge and returned 19 million results in 14 minutes. Not bad. Run time and number of rows would be higher if I expanded to our 11,000 ft max distance but still seems manageable. Trying to run it on the entire region crashed our PostGIS server.
The intent with trying pgr_dijkstra
I thought was to filter the origin nodes to only those in bounds, which the driving distance function would not allow. That should reduce run time by about 3/4, as that's how many nodes (for Boulder at least) are in the graph but outside the neighborhood bounds.
I believe the current implementation already filters origin nodes to those within the neighborhood boundary. This is being accomplished by:
EXISTS (
SELECT 1
FROM neighborhood_boundary AS b
WHERE ST_Intersects(b.geom,r1.geom)
)
I think the main source of the performance issues is that we're recreating the entire road graph every time we make a call to either pgr_drivingdistance or pgr_dijkstra. If we go with pgr_johnson we create the graph only once and then store the results in the neighborhood_reachable_roads table.
Come to think of it, that's probably a good way to describe the whole setup: we're effectively saving the graph to a table for quick retrieval later. So pgr_johnson basically lets us dump the graph all at once whereas with pgr_drivingdistance and pgr_dijkstra we're dumping the graph piece by piece.
I've created a branch on my fork to implement pgr_johnson. I'll submit it to you for testing when I'm done and we'll see how it stacks up.
The filter above is in the outer query and not in the nodes specified to the pgr function. I expect it is calculating the travelshed for every graph node, then discarding the results for those outside the bounds when it gets to filtering in the outer query.
I'm rubbish with EXPLAIN ANALYZE but in theory wouldn't the planner be incorporating this filter by throwing out ways (and by extension the origin vertices) that don't pass the test before calling pgr_drivingdistance?
In any case, I tested an implementation with pgr_dijkstra yesterday and stopped it after almost 2 hours. I can't remember what my times were for pgr_drivingdistance but I'm pretty sure pgr_johnson will be orders of magnitude faster and deliver the same results.
Refer to https://github.com/azavea/pfb/pull/18
Test performance improvements from connecting census blocks via pgr_dijkstra instead of pgr_drivingdistance.