UrbanAnalyst / dodgr

Distances on Directed Graphs in R
https://urbananalyst.github.io/dodgr/
127 stars 16 forks source link

dodgr_dists unexpected output with 0's, NA's, and repeating values #94

Closed lga455 closed 5 years ago

lga455 commented 5 years ago

I am trying to use dodgr_dists() to find the walking distance to the nearest transit stop from the centroid of all census block groups (bg) in Harris County, Texas. However, when I used dodgr_dists(), the output between origin and destination returns incorrect results such that the minimum walk distance for nearly all block groups ends up being 0 or NA. Technically, even if the block group centroid is adjacent to a bus stop, it should incur some walking distance (i.e. not 0). I've tried to snap the to/from points to the network using snapPointsToLines() and st_snap() all while trying different coordinate systems and projections, but to no avail...the odd output is always the same. Any insights as to what I might be doing wrong? Or is there a bug?

library(dodgr)
library(sf)
harriscty <- dodgr_streetnet("harris county texas")
graphtest <- weight_streetnet(harriscty, wt_profile = "foot", type_col = "highway",
                          id_col = "osm_id", keep_cols = NULL)
saveRDS(graphtest, file = "harrisctytx.rds")
##origin points are the centroid of each census block group from tidycensus get_acs() 
pts_origin <- st_coordinates(st_centroid(harris_bg_wide))
##destination points are based on transit stop coordinates using tidytransit read_gtfs()
pts_dest_pre <- st_coordinates(pre_merged)
dists_pre <- dodgr_dists(graphtest, from = pts_origin, to = pts_dest_pre)
mpadge commented 5 years ago

Your code is not exactly reproducible, but seems fine to me. Here's some repro code, with data downloaded from https://www.census.gov/programs-surveys/geography.html and from http://gtfs.s3.amazonaws.com/metropolitan-transit-authority-of-harris-county_20150228_0202.zip.

ibrary (dodgr)
harriscty <- dodgr_streetnet("harris county texas", quiet = FALSE)
graph <- weight_streetnet (harriscty, wt_profile = "foot")

library (sf)
library (tidytransit)

f <- "~/harris/PVS_18_v2_bg_48201.shp"
bg <- st_read (f)
#> Reading layer `PVS_18_v2_bg_48201' from data source 
#> `~harris/PVS_18_v2_bg_48201.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 2144 features and 11 fields
#> geometry type:  POLYGON
#> dimension:      XY
#> bbox:           xmin: -95.96073 ymin: 29.49734 xmax: -94.90849 ymax: 30.17061
#> epsg (SRID):    4269
#> proj4string:    +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs
pts_origin <- st_coordinates (st_centroid (bg))

f <- "~/harris/metropolitan-transit-authority-of-harris-county_20150228_0202.zip"
g <- read_gtfs (f, local = TRUE)
pts_destination <- st_coordinates (get_stop_geometry (g$stops))
# easier: unzip gtfs, then just:
#f <- "~/harris/gtfs/stops.txt"
#pts_destination <- read.csv (f, header = TRUE) [, c ("stop_lon", "stop_lat")]
head (pts_origin); head (pts_destination)
#>           X        Y
#> 1 -95.35717 29.75783
#> 2 -95.36776 29.75184
#> 3 -95.36522 29.76210
#> 4 -95.36373 29.77801
#> 5 -95.27677 29.76820
#> 6 -95.36902 29.81666
#>           X        Y
#> 1 -95.42817 29.84687
#> 2 -95.42614 29.84689
#> 3 -95.42298 29.85125
#> 4 -95.42303 29.85395
#> 5 -95.41256 29.85001
#> 6 -95.41804 29.85868

from <- pts_origin [sample (nrow (pts_origin), size = 10), ]
to <- pts_destination [sample (nrow (pts_destination), size = 10), ]
d <- dodgr_dists (graph, from = from, to = to)
knitr::kable (d)
2261 9129 3477 521 7486 8918 252 385 5390 9302
757 40.40612 19.012758 14.396011 23.931537 25.242325 13.78829 26.700196 30.227444 25.54704 26.778192
1192 40.99003 18.783463 19.709623 28.464066 17.231350 13.42298 26.720711 23.745451 21.78452 31.276992
846 41.73812 27.514597 19.162934 12.470781 44.886363 27.85822 27.790193 40.918917 35.17555 13.962029
755 41.97875 19.772177 15.258531 24.794057 24.113942 14.41170 27.709426 29.099060 24.41866 27.640711
806 27.83260 13.489634 7.473364 4.140731 31.808631 13.94024 13.884671 26.992435 22.17299 7.131049
1787 24.35624 22.604066 34.439812 36.576999 17.785065 24.46813 26.805372 10.035370 15.23999 37.976608
520 18.90202 4.815252 13.303007 12.992672 20.970645 8.90283 5.271957 14.860938 10.52866 14.520345
1273 36.73316 14.211295 22.213769 28.736013 8.520340 13.09857 24.042623 15.720554 14.51991 31.309797
1673 27.40877 18.402957 29.604144 32.792218 4.707843 20.04537 23.254500 5.949476 8.01213 34.191827
1708 33.78121 23.028658 34.229844 37.073302 4.911296 24.67107 28.264853 13.056411 14.78517 39.647085

Created on 2019-05-07 by the reprex package (v0.2.1)

No zeros; no NAs.


A few notes:

  1. As always, please make sure you've got the latest version of dodgr - you can also install the current development version with remotes::install_github("ATFutures/dodgr").
  2. Maybe you're just asking too much of your local memory? I get 2,144 origin points, 9,840 destination points, and a graph with 3,116,721 edges - that's truly a very, very huge job! You should at least try contracting the graph first with dodgr_contract_graph() - that cuts it down to 1/3 size (1,123,227 edges). Make sure in that case that you put all pts_<...> in the verts argument of dodgr_contract_graph(), so those will be kept in.
  3. You can "snap" points to the network with dodgr::match_points_to_graph(). That'll make the final distance calculation a bit quicker.
lga455 commented 5 years ago

@mpadge thank you very much for your help on this! I have one question about your output: what is the unit of the values in the matrix? Since the input data is always just lat longs, does this mean that dodgr_dists() always spits out matrix results in decimal degrees? Or meters? Based on the context of this example, I'm thinking that it's probably km... Here is the full code that I used based on your suggestions (however, I wasn't able to get the dodgr_contract_graph() to work without RStudio returning a fatal error and shutting down).

library(dodgr)
library(sf)
library(tidycensus)
library(tidytransit)
library(tidyr)
library(viridis)
library(maptools)
library(matrixStats)
library(dplyr)

harriscty <- dodgr_streetnet("harris county texas", quiet = FALSE)
graph <- weight_streetnet (harriscty, wt_profile = "foot")
nrow(graph)
###running the following line causes a fatal error that shuts the system down
#graph <- dodgr_contract_graph(graph)
#nrow(graph$graph)
saveRDS(graph, file = "harrisctytx.rds")

##pull down demographic data from American Communities Survey (ACS)
hlstatusvars <- c("B03002_003", # white alone
                  "B03002_004", # black alone
                  "B03002_006", # Asian
                  "B03002_012") # Hispanic or Latino
harris_bg <- get_acs(geography = "block group", variables = hlstatusvars, 
                     state = "TX", year = 2015,
                     county = "Harris County",
                     geometry = TRUE,
                     summary_var = "B03002_001")

##reshape long dataset to wide dataset for future calculations
harris_bg_wide <- spread(harris_bg, variable, estimate, fill = NA, convert = FALSE)
harris_bg_drop <- subset(harris_bg, select = c(GEOID, variable, estimate, summary_est))
harris_bg_wide <- spread(harris_bg_drop, variable, estimate, convert = FALSE)
pts_origin <- st_coordinates (st_centroid (harris_bg_wide))

## GTFS feeds obtained from transitfeeds.com
## The “before” dataset represents service from May 23, 2015 to August 15, 2015 
## and the “after” dataset represents service from August 16, 2015 to January 23, 2016. 
pre_sr_gtfs <- read_gtfs("data/20150517_htx.zip", 
                         local = TRUE,
                         geometry = TRUE,
                         frequency = TRUE)
pts_destination_preSR <- st_coordinates (get_stop_geometry (pre_sr_gtfs$stops))
post_sr_gtfs <- read_gtfs("data/20150818_htx.zip", 
                          local = TRUE,
                          geometry = TRUE,
                          frequency = TRUE)
pts_destination_postSR <- st_coordinates (get_stop_geometry (post_sr_gtfs$stops))

head (pts_origin); head (pts_destination_preSR); head (pts_destination_postSR)

from <- pts_origin
to_preSR <- pts_destination_preSR
to_postSR <- pts_destination_postSR
d_preSR <- dodgr_dists (graph, from = from, to = to_preSR)
d_postSR <- dodgr_dists (graph, from = from, to = to_postSR)
mpadge commented 5 years ago

That fatal error indicates that you're using an older version of dodgr - if you update to latest dev version it should be fine. The package is also undergoing substantial changes at present, so yes, those distances pasted above were kilometres, but actually the latest version now uses SI standard metres. Please try installing remotes::install_github and re-run. All should hopefully work.

lga455 commented 5 years ago

Thanks @mpadge . I've updated to the latest version as recommended and now the first line of my code does not work: harriscty <- dodgr_streetnet("harris county texas", expand = 0, quiet = FALSE)

It returns:

#> Error in process_bbox(bbox) : 
#> argument "expand" is missing, with no default

Is this a known bug?

mpadge commented 5 years ago

Yep, and fixed too, a few days ago - please update again. Sorry about that!


Edit: Just discovered I hadn't pushed that fix - now rectified.