azavea / pfb-network-connectivity

PFB Bicycle Network Connectivity
Other
40 stars 11 forks source link

Investigate switching network algorithms to pgr_djikstra #74

Open spencerrecneps opened 7 years ago

spencerrecneps commented 7 years ago

Refer to https://github.com/azavea/pfb/pull/18

Test performance improvements from connecting census blocks via pgr_dijkstra instead of pgr_drivingdistance.

spencerrecneps commented 7 years ago

@flibbertigibbet Any thoughts on switching to pgr_johnson()? I'm finding that running pgr_dijkstra on each block pair is no more efficient (and possibly much less efficient) than the previous approach.

I believe pgr_johnson() could be used to populate the reachable_roads* tables much more efficiently than either of our other approaches. The drawback is that it appears to be a massive resource hog (which makes sense since we're effectively storing the whole graph in memory for the operation). Might also present problems for parallelization.

spencerrecneps commented 7 years ago

For reference I ran pgr_johnson() on roads within the test neighborhood boundary in Cambridge and returned 19 million results in 14 minutes. Not bad. Run time and number of rows would be higher if I expanded to our 11,000 ft max distance but still seems manageable. Trying to run it on the entire region crashed our PostGIS server.

flibbertigibbet commented 7 years ago

The intent with trying pgr_dijkstra I thought was to filter the origin nodes to only those in bounds, which the driving distance function would not allow. That should reduce run time by about 3/4, as that's how many nodes (for Boulder at least) are in the graph but outside the neighborhood bounds.

spencerrecneps commented 7 years ago

I believe the current implementation already filters origin nodes to those within the neighborhood boundary. This is being accomplished by:

EXISTS (
            SELECT  1
            FROM    neighborhood_boundary AS b
            WHERE   ST_Intersects(b.geom,r1.geom)
)

I think the main source of the performance issues is that we're recreating the entire road graph every time we make a call to either pgr_drivingdistance or pgr_dijkstra. If we go with pgr_johnson we create the graph only once and then store the results in the neighborhood_reachable_roads table.

Come to think of it, that's probably a good way to describe the whole setup: we're effectively saving the graph to a table for quick retrieval later. So pgr_johnson basically lets us dump the graph all at once whereas with pgr_drivingdistance and pgr_dijkstra we're dumping the graph piece by piece.

I've created a branch on my fork to implement pgr_johnson. I'll submit it to you for testing when I'm done and we'll see how it stacks up.

flibbertigibbet commented 7 years ago

The filter above is in the outer query and not in the nodes specified to the pgr function. I expect it is calculating the travelshed for every graph node, then discarding the results for those outside the bounds when it gets to filtering in the outer query.

spencerrecneps commented 7 years ago

I'm rubbish with EXPLAIN ANALYZE but in theory wouldn't the planner be incorporating this filter by throwing out ways (and by extension the origin vertices) that don't pass the test before calling pgr_drivingdistance?

In any case, I tested an implementation with pgr_dijkstra yesterday and stopped it after almost 2 hours. I can't remember what my times were for pgr_drivingdistance but I'm pretty sure pgr_johnson will be orders of magnitude faster and deliver the same results.