degauss-org / roads

DeGAUSS container that calculates distance to nearest roadway and length of roadways within buffer for S1100 and S1200 roads
http://degauss.org/roads/
GNU General Public License v3.0

st_intersection is slow #3

Closed: erikarasnick closed this issue 2 years ago

erikarasnick commented 2 years ago
[Screenshot from 2022-02-16 showing the five test points]

These are 5 points I used to test the aadt container when buffers overlap state boundaries.

  1. If I run st_intersection(buffers, roads1200), I get a single set of all S1200 roads that intersect any of the input buffers, rather than the lines that intersect each buffer, so we couldn't use it to determine road length for each input point. That is why I was using map.

  2. Running purrr::map(1:nrow(buffers), ~st_intersection(buffers[.x,], roads1200)) with S1200 roads for the whole country, for 5 points with 400 m buffers, gives the following timing:

  user  system elapsed 
 16.748   0.766  17.538 
  3. I tried splitting the roads by state before the intersection.

Using only Ohio roads, and buffers for only the 4 points that are at least partially in Ohio: purrr::map(1:nrow(d_buffers[["ohio"]]), ~st_intersection(d_buffers[["ohio"]][.x,], roads_oh))

   user  system elapsed 
  0.787   0.015   0.805 

Using only Indiana roads, and buffers for only the 2 points that are at least partially in Indiana: purrr::map(1:nrow(d_buffers[["indiana"]]), ~st_intersection(d_buffers[["indiana"]][.x,], roads_in))

   user  system elapsed 
  0.228   0.003   0.231 

Note that one input point overlaps both states, so it is included in both the Ohio and the Indiana calculations. I'm not sure how this would affect timing if many points overlapped multiple states, since each of them would be processed once per state. However, with a 400 m buffer I would expect most points to fall entirely within one state.

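Combined, that is about 1 second elapsed (0.805 + 0.231) compared with about 17.5 seconds using the whole-country roads, so it seems like this could be a lot faster.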
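The state-split approach above can be sketched on toy data as follows. Everything here is hypothetical: two rectangular "states" sharing the line x = 0, three road segments already cut at the state line and tagged with a state column, and EPSG:5072 assumed as a projected CRS so the 400 m buffer is in metres. Clipping roads at the border before splitting is an assumption of this sketch; it is what lets a buffer that straddles the border pick up each state's share of road exactly once instead of double counting it.

```r
library(sf)

crs <- 5072  # assumed projected CRS, units = metres

# Hypothetical toy "states": two rectangles sharing the line x = 0.
rect <- function(x0, x1) {
  st_polygon(list(rbind(c(x0, -1e4), c(x1, -1e4), c(x1, 1e4), c(x0, 1e4), c(x0, -1e4))))
}
states <- st_sf(state = c("indiana", "ohio"),
                geometry = st_sfc(rect(-1e4, 0), rect(0, 1e4), crs = crs))

# Hypothetical roads, already cut at the state line and tagged with their state.
roads <- st_sf(
  state = c("indiana", "ohio", "ohio"),
  geometry = st_sfc(
    st_linestring(rbind(c(-1000, 0), c(0, 0))),
    st_linestring(rbind(c(0, 0), c(1000, 0))),
    st_linestring(rbind(c(4000, 300), c(6000, 300))),
    crs = crs
  )
)
roads_by_state <- split(roads, roads$state)

# Point 1 straddles the border; point 2 lies entirely in "ohio".
pts <- st_sf(id = 1:2,
             geometry = st_sfc(st_point(c(-200, 0)), st_point(c(5000, 0)), crs = crs))
buffers <- st_buffer(pts, dist = 400)

# Logical matrix: touches[i, s] is TRUE when buffer i intersects state s.
touches <- st_intersects(buffers, states, sparse = FALSE)

road_length <- numeric(nrow(buffers))
for (s in seq_len(nrow(states))) {
  # Only this state's buffers are clipped, and only against this state's roads;
  # a border buffer is visited once per state it touches.
  state_roads <- st_geometry(roads_by_state[[states$state[s]]])
  for (i in which(touches[, s])) {
    clipped <- st_intersection(st_geometry(buffers[i, ]), state_roads)
    road_length[i] <- road_length[i] + sum(as.numeric(st_length(clipped)))
  }
}

road_length
```

With these toy inputs the border buffer gets about 600 m of road from "indiana" plus about 200 m from "ohio", and the second buffer gets a single chord of roughly 529 m (slightly under the exact circle chord, because st_buffer() approximates the circle with a polygon).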

cole-brokamp commented 2 years ago

Would we have to remake the data into smaller chunks, like we did with aadt, or would this still work with the roads object as we have it now?

Also, I was reading that st_intersects is much faster than st_intersection. I'm wondering if we could use st_intersects to get the indices of the road lines that intersect each buffer, and then map over the buffers to calculate the length. This would still require mapping over the buffers, but each iteration would compute the geometric intersection and length for only a small subset of the roads file, so it might be faster.
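A minimal sketch of the st_intersects-then-st_intersection idea, on hypothetical toy data (three points, three road segments, EPSG:5072 assumed so distances are in metres). st_intersects() returns, for each buffer, the row numbers of the roads it touches; the exact clipping and length are then computed only against those candidate rows. This illustrates the suggestion and is not the code that was eventually merged.

```r
library(sf)
library(purrr)

# Hypothetical toy data (EPSG:5072, metres); point 3 has no roads nearby.
pts <- st_sf(
  id = 1:3,
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(5000, 0)),
                    st_point(c(9000, 9000)), crs = 5072)
)
roads <- st_sf(
  road_id = 1:3,
  geometry = st_sfc(
    st_linestring(rbind(c(-1000, 0), c(1000, 0))),     # crosses buffer 1
    st_linestring(rbind(c(0, -1000), c(0, 1000))),     # crosses buffer 1
    st_linestring(rbind(c(4000, 300), c(6000, 300))),  # clips the edge of buffer 2
    crs = 5072
  )
)
buffers <- st_buffer(pts, dist = 400)

# Step 1: cheap predicate. hits[[i]] holds the row numbers of roads touching buffer i.
hits <- st_intersects(buffers, roads)

# Step 2: exact clipping and length, only against each buffer's candidate roads.
road_length <- map_dbl(seq_len(nrow(buffers)), function(i) {
  idx <- hits[[i]]
  if (length(idx) == 0) return(0)  # no candidates: skip the expensive call
  clipped <- st_intersection(st_geometry(buffers[i, ]), st_geometry(roads)[idx])
  sum(as.numeric(st_length(clipped)))  # metres of road inside buffer i
})

road_length
```

Here buffer 1 holds two 800 m chords (about 1600 m total), buffer 2 about 529 m, and buffer 3 returns 0 without calling st_intersection() at all. On real data the expected gain comes from step 1: sf's binary predicates such as st_intersects() build a spatial index on the fly, so each per-buffer st_intersection() call sees only a handful of roads rather than the whole national layer.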

erikarasnick commented 2 years ago

Yes, it would require re-making the data. Using st_intersects before st_intersection seems promising. I will try it!

erikarasnick commented 2 years ago

closed in #4