UrbanAnalyst / dodgr

Distances on Directed Graphs in R
https://urbananalyst.github.io/dodgr/
128 stars 16 forks source link

weight_streenet error: "Error in `$<-.data.frame..." #132

Closed darinchristensen closed 4 years ago

darinchristensen commented 4 years ago

Background

We've constructed a shapefile of major roads across West and Central Africa. This is based largely on OSM data, but is supplemented with other regional sources (e.g., OCHA). (An image is included at the bottom of this issue.)

select(rds_shp, id, fclass)
Simple feature collection with 217853 features and 2 fields
geometry type:  LINESTRING
dimension:      XY
bbox:           xmin: -17.51197 ymin: -4.997724 xmax: 37.73438 ymax: 37.30094
geographic CRS: WGS 84
First 10 features:
       id  fclass                       geometry
1   BEN-1   trunk LINESTRING (2.351141 6.3883...
2   BEN-2   trunk LINESTRING (2.351141 6.3883...
3   BEN-3   trunk LINESTRING (2.438702 6.3713...
4   BEN-4   trunk LINESTRING (2.45154 6.37201...
5   BEN-5 primary LINESTRING (2.610743 7.3667...
# truncated

I've confirmed that all geometries are LINESTRING.

I've also checked that all fclass (i.e., road type) categories have an entry in the weighting profile for motorcars:

t1 <- rds_shp %>% tbl_df() %>% count(fclass)
t2 <- weighting_profiles$weighting_profiles %>% filter(name == "motorcar")
left_join(t1, t2, by = c("fclass" = "way"))
# A tibble: 12 x 5
   fclass             n name     value max_speed
   <chr>          <int> <chr>    <dbl>     <dbl>
 1 motorway        8531 motorcar   1          90
 2 motorway_link   7534 motorcar   1          45
 3 primary        59379 motorcar   0.9        65
 4 primary_link    9868 motorcar   0.9        30
 5 secondary      71569 motorcar   0.8        55
 6 secondary_link  7494 motorcar   0.8        25
 7 service            3 motorcar   0.4        15
 8 tertiary        4681 motorcar   0.7        40
 9 tertiary_link    156 motorcar   0.7        20
10 trunk          35163 motorcar   1          85
11 trunk_link     12714 motorcar   1          40
12 unclassified     761 motorcar   0.6        25

This also shows the distribution of road types.

Error

When I try to use weight_streetnet I get the following error:

rd_graph <- weight_streetnet(
   x = select(rds_shp, id, fclass), 
   type_col = "fclass", 
   id_col = "id",
   wt_profile = "motorcar"
 )
Error in `$<-.data.frame`(`*tmp*`, "from_id", value = c("1", "2", "2",  : 
  replacement has 17507248 rows, data has 17506772

Note that the function actually runs fine on subsets of the data:

test_rds <- select(rds_shp, id, fclass) %>% sample_n(10000)
subsamp_test <- weight_streetnet(
  x = test_rds, 
  type_col = "fclass", 
  id_col = "id",
  wt_profile = "motorcar"
)
tbl_df(subsamp_test)

# A tibble: 737,404 x 15
   geom_num edge_id from_id from_lon from_lat to_id to_lon to_lat     d d_weighted highway way_id component
      <dbl>   <int> <chr>      <dbl>    <dbl> <chr>  <dbl>  <dbl> <dbl>      <dbl> <chr>   <chr>      <int>
 1        1       1 1           6.91     36.9 2       6.91   36.9  1.80       2.00 primary DZA-5…      4741
 2        1       2 2           6.91     36.9 1       6.91   36.9  1.80       2.00 primary DZA-5…      4741
 3        1       3 2           6.91     36.9 3       6.91   36.9  1.94       2.15 primary DZA-5…      4741
 4        1       4 3           6.91     36.9 2       6.91   36.9  1.94       2.15 primary DZA-5…      4741
 5        1       5 3           6.91     36.9 4       6.91   36.9  1.93       2.14 primary DZA-5…      4741
# truncated
# … with 737,394 more rows, and 2 more variables: time <dbl>, time_weighted <dbl>

Plot of sf object (rds_shp)

rds_shp

Session Info

R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] cppRouting_2.0    dodgr_0.2.6.001   data.table_1.12.8 sf_0.9-2          forcats_0.5.0    
 [6] stringr_1.4.0     dplyr_0.8.5       purrr_0.3.4       readr_1.3.1       tidyr_1.0.2      
[11] tibble_3.0.1      ggplot2_3.3.0     tidyverse_1.3.0  

loaded via a namespace (and not attached):
 [1] tidyselect_1.0.0   haven_2.2.0        lattice_0.20-41    colorspace_1.4-1   vctrs_0.2.4       
 [6] generics_0.0.2     utf8_1.1.4         rlang_0.4.5        e1071_1.7-3        pillar_1.4.3      
[11] glue_1.4.0         withr_2.1.2        DBI_1.1.0          sp_1.4-1           dbplyr_1.4.2      
[16] modelr_0.1.6       readxl_1.3.1       lifecycle_0.2.0    munsell_0.5.0      gtable_0.3.0      
[21] cellranger_1.1.0   rvest_0.3.5        callr_3.4.3        ps_1.3.2           curl_4.3          
[26] class_7.3-17       fansi_0.4.1        broom_0.5.5        Rcpp_1.0.4.6       KernSmooth_2.23-17
[31] scales_1.1.0       backports_1.1.5    classInt_0.4-3     RcppParallel_5.0.0 jsonlite_1.6.1    
[36] farver_2.0.3       fs_1.3.2           digest_0.6.25      hms_0.5.3          stringi_1.4.6     
[41] processx_3.4.2     grid_3.6.3         cli_2.0.2          tools_3.6.3        magrittr_1.5      
[46] crayon_1.3.4       pkgconfig_2.0.3    ellipsis_0.3.0     osmdata_0.1.3      xml2_1.3.2        
[51] reprex_0.3.0       lubridate_1.7.8    assertthat_0.2.1   httr_1.4.1         rstudioapi_0.11   
[56] R6_2.4.1           igraph_1.2.5       units_0.6-6        nlme_3.1-147       compiler_3.6.3   
mpadge commented 4 years ago

Hmmm, I fear that's going to be a little tricky. I'm sure your error is genuine, and i'd wager that it indeed reflects an internal dodgr error. The problem is in feeding irregular structures through to weight_streetnet() - it really has been fairly rigidly coded for all OSM possibilities, but hardly tested for possibilities beyond that. And so my first request back to you would be whether you might be able to contrive a smaller example that prompts this error, and to provide a reprex of the whole thing, including downloading or otherwise accessing the data set which caused it? (It may even somehow be OSM itself, which also has no guarantee of being free of errors, being user-generated as it is.)

In the meantime, one guess at a workaround might be to first call dodgr_cache_off() prior to weight_streetnet()? Even if that is so, some reproducible code will still be necessary to uncover what's going on here. Thanks for the input!

darinchristensen commented 4 years ago

Thanks @mpadge for the quick response!

First, I tried calling dodgr_cache_off() before weight_streetnet() but no luck.

Second, despite many attempts, I can't reproduce the error in a sub-sample. You'll see below that the function successfully runs on two exhaustive subsets of the data: one using only roads from OSM; another using those roads from non-OSM sources.

I've included a reprex below. The read_rds call uses a Dropbox link to the data.

rm(list = ls()) 

library(tidyverse)
library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 7.0.0
library(dodgr)
library(reprex)

rds_shp <- read_rds(url("<removed>"))
(rds_shp)
#> Simple feature collection with 217853 features and 7 fields
#> geometry type:  LINESTRING
#> dimension:      XY
#> bbox:           xmin: -17.51197 ymin: -4.997724 xmax: 37.73438 ymax: 37.30094
#> geographic CRS: WGS 84
#> First 10 features:
#>     fclass     length NAME_0     NAME_1        NAME_2 source     id
#> 1    trunk 27.8370885  Benin Atlantique Abomey-Calavi    osm  BEN-1
#> 2    trunk 27.8370885  Benin Atlantique        Ouidah    osm  BEN-2
#> 3    trunk  0.3558411  Benin   Littoral       Cotonou    osm  BEN-3
#> 4    trunk  1.4257407  Benin   Littoral       Cotonou    osm  BEN-4
#> 5  primary 12.2490001  Benin    Plateau         Kétou    osm  BEN-5
#> 6  primary  6.5046034  Benin    Plateau         Kétou    osm  BEN-6
#> 7    trunk  0.2505886  Benin       Mono    Grand-Popo    osm  BEN-7
#> 8  primary  0.2061700  Benin       Mono          Comè    osm  BEN-8
#> 9    trunk  1.3588512  Benin       Mono          Comè    osm  BEN-9
#> 10   trunk  0.0452230  Benin       Mono          Comè    osm BEN-10
#>                          geometry
#> 1  LINESTRING (2.351141 6.3883...
#> 2  LINESTRING (2.351141 6.3883...
#> 3  LINESTRING (2.438702 6.3713...
#> 4  LINESTRING (2.45154 6.37201...
#> 5  LINESTRING (2.610743 7.3667...
#> 6  LINESTRING (2.689136 7.4068...
#> 7  LINESTRING (1.810371 6.2867...
#> 8  LINESTRING (1.882023 6.4033...
#> 9  LINESTRING (1.943153 6.3886...
#> 10 LINESTRING (1.939842 6.3890...

# _______________________________________________________________
# confirm that all road types are reflected in the weighting profile:

t1 <- rds_shp %>% tbl_df() %>% count(fclass)
t2 <- weighting_profiles$weighting_profiles %>% filter(name == "motorcar")
wt_df <- left_join(t1, t2, by = c("fclass" = "way"))
#> Warning: Column `fclass`/`way` joining factor and character vector, coercing
#> into character vector
rm(t1, t2)

(wt_df)
#> # A tibble: 12 x 5
#>    fclass             n name     value max_speed
#>    <chr>          <int> <chr>    <dbl>     <dbl>
#>  1 motorway        8531 motorcar   1          90
#>  2 motorway_link   7534 motorcar   1          45
#>  3 primary        59379 motorcar   0.9        65
#>  4 primary_link    9868 motorcar   0.9        30
#>  5 secondary      71569 motorcar   0.8        55
#>  6 secondary_link  7494 motorcar   0.8        25
#>  7 service            3 motorcar   0.4        15
#>  8 tertiary        4681 motorcar   0.7        40
#>  9 tertiary_link    156 motorcar   0.7        20
#> 10 trunk          35163 motorcar   1          85
#> 11 trunk_link     12714 motorcar   1          40
#> 12 unclassified     761 motorcar   0.6        25

# _______________________________________________________________
# running on two exhaustive sub-sets of rds_shp:

dodgr_cache_off()

# restricing to roads from OSM:
(osm_rds <- rds_shp %>% filter(source == "osm") %>% select(id, fclass))
#> Simple feature collection with 215558 features and 2 fields
#> geometry type:  LINESTRING
#> dimension:      XY
#> bbox:           xmin: -17.51197 ymin: -4.997724 xmax: 37.73438 ymax: 37.30094
#> geographic CRS: WGS 84
#> First 10 features:
#>        id  fclass                       geometry
#> 1   BEN-1   trunk LINESTRING (2.351141 6.3883...
#> 2   BEN-2   trunk LINESTRING (2.351141 6.3883...
#> 3   BEN-3   trunk LINESTRING (2.438702 6.3713...
#> 4   BEN-4   trunk LINESTRING (2.45154 6.37201...
#> 5   BEN-5 primary LINESTRING (2.610743 7.3667...
#> 6   BEN-6 primary LINESTRING (2.689136 7.4068...
#> 7   BEN-7   trunk LINESTRING (1.810371 6.2867...
#> 8   BEN-8 primary LINESTRING (1.882023 6.4033...
#> 9   BEN-9   trunk LINESTRING (1.943153 6.3886...
#> 10 BEN-10   trunk LINESTRING (1.939842 6.3890...

osm_graph <- weight_streetnet(
  x = osm_rds, 
  type_col = "fclass", 
  id_col = "id",
  wt_profile = "motorcar"
)

tbl_df(osm_graph)
#> # A tibble: 16,388,530 x 15
#>    geom_num edge_id from_id from_lon from_lat to_id to_lon to_lat     d
#>       <dbl>   <int> <chr>      <dbl>    <dbl> <chr>  <dbl>  <dbl> <dbl>
#>  1        1       1 1           2.35     6.39 2       2.35   6.39  96.0
#>  2        1       2 2           2.35     6.39 1       2.35   6.39  96.0
#>  3        1       3 2           2.35     6.39 3       2.35   6.39  21.0
#>  4        1       4 3           2.35     6.39 2       2.35   6.39  21.0
#>  5        1       5 3           2.35     6.39 4       2.35   6.39  32.8
#>  6        1       6 4           2.35     6.39 3       2.35   6.39  32.8
#>  7        1       7 4           2.35     6.39 5       2.35   6.39  36.5
#>  8        1       8 5           2.35     6.39 4       2.35   6.39  36.5
#>  9        1       9 5           2.35     6.39 6       2.35   6.39  17.7
#> 10        1      10 6           2.35     6.39 5       2.35   6.39  17.7
#> # … with 16,388,520 more rows, and 6 more variables: d_weighted <dbl>,
#> #   highway <chr>, way_id <chr>, component <int>, time <dbl>,
#> #   time_weighted <dbl>

# restricing to roads NOT from OSM:
(nonosm_rds <- rds_shp %>% filter(source != "osm") %>% select(id, fclass))
#> Simple feature collection with 2295 features and 2 fields
#> geometry type:  LINESTRING
#> dimension:      XY
#> bbox:           xmin: 0.2206525 ymin: 7.456352 xmax: 22.9212 ymax: 23.606
#> geographic CRS: WGS 84
#> First 10 features:
#>          id  fclass                       geometry
#> 1  NER-3104 primary LINESTRING (7.377661 17.789...
#> 2  NER-3105 primary LINESTRING (7.377661 17.789...
#> 3  NER-3106 primary LINESTRING (7.377661 17.789...
#> 4  NER-3107 primary LINESTRING (7.429513 17.458...
#> 5  NER-3108 primary LINESTRING (7.925001 16.927...
#> 6  NER-3109 primary LINESTRING (7.925001 16.927...
#> 7  NER-3110 primary LINESTRING (7.988597 16.989...
#> 8  NER-3111 primary LINESTRING (6.504773 15.724...
#> 9  NER-3112 primary LINESTRING (6.504773 15.724...
#> 10 NER-3113 primary LINESTRING (6.504773 15.724...

nonosm_graph <- weight_streetnet(
  x = nonosm_rds, 
  type_col = "fclass", 
  id_col = "id",
  wt_profile = "motorcar"
)

tbl_df(nonosm_graph)
#> # A tibble: 1,118,242 x 15
#>    geom_num edge_id from_id from_lon from_lat to_id to_lon to_lat      d
#>       <dbl>   <int> <chr>      <dbl>    <dbl> <chr>  <dbl>  <dbl>  <dbl>
#>  1        1       1 1           7.38     17.8 2       7.37   17.9 12955.
#>  2        1       2 2           7.37     17.9 1       7.38   17.8 12955.
#>  3        1       3 2           7.37     17.9 3       7.34   18.1 19173.
#>  4        1       4 3           7.34     18.1 2       7.37   17.9 19173.
#>  5        1       5 3           7.34     18.1 4       7.33   18.2 13379.
#>  6        1       6 4           7.33     18.2 3       7.34   18.1 13379.
#>  7        1       7 4           7.33     18.2 5       7.32   18.3 10446.
#>  8        1       8 5           7.32     18.3 4       7.33   18.2 10446.
#>  9        1       9 5           7.32     18.3 6       7.37   18.3  8204.
#> 10        1      10 6           7.37     18.3 5       7.32   18.3  8204.
#> # … with 1,118,232 more rows, and 6 more variables: d_weighted <dbl>,
#> #   highway <chr>, way_id <chr>, component <int>, time <dbl>,
#> #   time_weighted <dbl>

# _______________________________________________________________
# runningo on the entire data:
full_graph <- weight_streetnet(
  x = rds_shp, 
  type_col = "fclass", 
  id_col = "id",
  wt_profile = "motorcar"
)
#> Error in `$<-.data.frame`(`*tmp*`, "from_id", value = c("1", "2", "2", : replacement has 17507248 rows, data has 17506772

Created on 2020-04-30 by the reprex package (v0.3.0)

mpadge commented 4 years ago

great, thanks for the link - i'll have a look into it asap, and let you know when i'm done so you can delete the link.

darinchristensen commented 4 years ago

Thanks so much for looking into this! Curious if you've turned up any clues or workarounds.

mpadge commented 4 years ago

Okay, I can confirm that i can reproduce the error. Now to dig down to the cause ... might take a couple of days. In the meantime, I note that

length (which (duplicated (rds_shp$geometry)))
#> 16479

I'm not sure what happens with duplicated geometries like that, to be honest. OSM data when passed through the osmdata package have all official OSM ID values attached to all ways as well as all nodes, so identical geometries are ascertained by comparing ID values. In their absence (like in your data), "identical" is simply numerically identical, but I'm not sure what happens with those. That could be part of what's going on here. I'll dig some more and report back.

mpadge commented 4 years ago

That commit should fix things - please give a try, and re-open if it does not work. The problem was indeed related to using numeric (coordinate) values to determine unique vertices. This relied on the base::merge() function but using paste-ed numeric values to generate character names for coordinate pairs, which can generate unpredictable results. The above push implements a new C++ version of the same, but converting coordinates to strings via a fixed precision. This should be fail-safe.


Unrelated other thoughts: Many roads in the area you're looking at are unpaved. Standard routing using distances will not account for effects of surfaces, but dodgr also allows time-based routing which does take such things into account. To use that, you have to either re-generate the initial data set via osmdata::osmdata_sc() -> dodgr::weight_streetnet() (which will automatically generate appropriate surface-dependent speeds), or look at weighting_profiles$surface_speeds to get the values and either manually construct time and time_weighted columns, or adjust the d_weighted column to reflect maximal speeds on unpaved surfaces. In case that helps.

darinchristensen commented 4 years ago

It works! Thanks so much for your help with this!

Thanks also for the note about unpaved roads and your suggestions on that front.

mpadge commented 4 years ago

Thanks for uncovering a pretty important bug. All of my personal dodgr use is always via osmdata, so i don't personally get to test cases like this. Input like yours is particularly helpful to ensure the package works for other cases too.