UrbanAnalyst / dodgr

Distances on Directed Graphs in R
https://urbananalyst.github.io/dodgr/
127 stars 16 forks source link

how to export dodgr aggregated flow and underlying network to sf/shapefile #194

Closed sriramab closed 1 year ago

sriramab commented 1 year ago

Hi, I have been exploring the dodgr package and I found it interesting for my work. I so far could not find good plotting options of the dodgr street network to my requirements, so I was trying to convert the dodgr street to sf dodgr::dodgr_to_sf . But when I do that, i am losing the flow column. Do I miss something? Is there any way to convert the dodgr street network "with" flow column to sf or a shapefile.

thank you

mpadge commented 1 year ago

Thanks @sriramab, that was indeed a bug that is fixed by the above commit. Proof:

library (dodgr)
packageVersion ("dodgr")
#> [1] '0.2.16.20'
graph <- weight_streetnet (hampi)
v <- dodgr_vertices (graph)
set.seed (1)
from <- sample (v$id, size = 10L)
to <- sample (v$id, size = 10L)
flows <- array (1, dim = c (length (from), length (to)))

graph <- dodgr_flows_aggregate (graph, from = from, to = to, flows = flows)
dodgr_to_sf (graph)
#> Simple feature collection with 760 features and 16 fields
#> Geometry type: LINESTRING
#> Dimension:     XY
#> Bounding box:  xmin: 76.37261 ymin: 15.30104 xmax: 76.49232 ymax: 15.36033
#> Geodetic CRS:  WGS 84
#> First 10 features:
#>    geom_num edge_id    from_id from_lon from_lat      to_id   to_lon   to_lat
#> 1        70 a129831 4474520379 76.48903 15.32825 1376769110 76.46860 15.32883
#> 2        39 a129820  977473118 76.46099 15.34150  977473148 76.46104 15.34023
#> 3        39 a129819  977473148 76.46104 15.34023  977473118 76.46099 15.34150
#> 4        35 a129818 1388482060 76.45873 15.35057 1388481878 76.45377 15.34816
#> 5        35 a129817 1388481878 76.45377 15.34816 1388482060 76.45873 15.35057
#> 6        82 a129814 2195424937 76.45879 15.35079 2195424962 76.45181 15.35161
#> 7        82 a129813 2195424962 76.45181 15.35161 2195424937 76.45879 15.35079
#> 8       157 a129810 1204772895 76.44296 15.34517 3921522971 76.43542 15.34657
#> 9       157 a129809 3921522971 76.43542 15.34657 1204772895 76.44296 15.34517
#> 10      170 a129806 5351820916 76.44245 15.32016 5974426282 76.44877 15.31723
#>            d d_weighted      highway    way_id       time time_weighted
#> 1  3560.4352  3956.0391         path 123463598 1068.13056     1186.8117
#> 2   140.9472   156.6080  residential  84012653   84.56832       93.9648
#> 3   140.9472   156.6080  residential  84012653   84.56832       93.9648
#> 4   621.7853   888.2647      primary  53658844  149.22847      213.1835
#> 5   621.7853   888.2647      primary  53658844  149.22847      213.1835
#> 6   945.6985  1050.7761 unclassified 209318354  226.96764      252.1863
#> 7   945.6985  1050.7761 unclassified 209318354  226.96764      252.1863
#> 8  1095.6466  1217.3851        track 388983220  328.69397      365.2155
#> 9  1095.6466  1217.3851        track 388983220  328.69397      365.2155
#> 10 1767.7898  1964.2109        track 554572318  465.17317      516.8591
#>         flow component                       geometry
#> 1  0.3933224         1 LINESTRING (76.48903 15.328...
#> 2  0.0000000         2 LINESTRING (76.46099 15.341...
#> 3  0.0000000         2 LINESTRING (76.46104 15.340...
#> 4  0.1742424         2 LINESTRING (76.45873 15.350...
#> 5  0.0000000         2 LINESTRING (76.45377 15.348...
#> 6  0.0000000         2 LINESTRING (76.45879 15.350...
#> 7  0.0000000         2 LINESTRING (76.45181 15.351...
#> 8  0.0000000         2 LINESTRING (76.44296 15.345...
#> 9  0.0000000         2 LINESTRING (76.43542 15.346...
#> 10 0.0000000         1 LINESTRING (76.44245 15.320...

Created on 2022-10-30 with reprex v2.0.2

... and the flow column is now there. I've closed the issue with that, but please feel free to comment if you have any further questions.


Plotting alternatives

Depending on your needs, it might help to know that i usually use something like this as the default plotting option for flows. It opens an interactive mapdeck visualisation with lines coloured and scaled by aggregated flows.

library (mapdeck)
mapdeck (style = mapdeck_style ()) %>%
    add_line (graph, 
              origin = c ("from_lon", "from_lat"),
              destination = c ("to_lon", "to_lat"),
              stroke_colour = "flow", stroke_width = "flow",
              stroke_opacity = "flow",
              palette = "matlab_like2", legend = TRUE)
sriramab commented 1 year ago

Thank you @mpadge . I am able to export the flow numbers to an sf object. I will explore the plotting alternatives.

sriramab commented 1 year ago

May I ask you to please revisit this ? When you export to sf, the flow values are different than what is in dodgr street network. In my case as below, I have initialized a zero matrix for all pairs and then changed just one pair to 100. When I aggregated the flows, I get 100 all along the path between the pair. So far so good. But as soon as I convert to sf, I was expecting the sf geometries to have a 100 also. All the graph and its edges have different flow values. From another discussion , I checked with both norm_sums F/T , but I do not get the 100 value on the edges.

sorry for too many questions, I just started with this package.

    try <- matrix(0L, nrow = 51, ncol = 51)
    try[1,2] <- 100 
    df <- dodgr_flows_aggregate (graph = my_graph, from = from, to = to, flows = try, norm_sums = F) 
    sf_f <- dodgr::dodgr_to_sf(df) 
    sf::st_write(sf_f, "flownetwork1001.shp", delete_layer = T ) 
mpadge commented 1 year ago

No worries about too many questions @sriramab, they all seem to be uncovering some bugs which obviously need fixing. The following reprex demonstrates that dodgr_to_sf() indeed drops the expected flow value:

library (dodgr)
packageVersion ("dodgr")
#> [1] '0.2.16.32'
graph <- weight_streetnet (hampi)
v <- dodgr_vertices (graph)
n <- nrow (v)
try <- matrix(0L, nrow = n, ncol = n)
try [2,3] <- 100 
from <- to <- v$id
df <- dodgr_flows_aggregate (graph = graph, from = from, to = to, flows = try, norm_sums = F) 
table (df$flow)
#> 
#>    0  100 
#> 6812    1
sf_f <- dodgr_to_sf (df) 
table (sf_f$flow)
#> 
#>   0 
#> 760

Created on 2022-11-02 with reprex v2.0.2

I'll re-open the issue to investigate further and hopefully fix soon.

mpadge commented 1 year ago

The :bug: comes from the graph contraction routines, and not the sf-conversion itself:

library (dodgr)
graph <- weight_streetnet (hampi)
v <- dodgr_vertices (graph)
n <- nrow (v)
try <- matrix(0L, nrow = n, ncol = n)
try [2,3] <- 100 
from <- to <- v$id
graph <- dodgr_flows_aggregate (graph = graph, from = from, to = to, flows = try, norm_sums = F) 
table (graph$flow)
#> 
#>    0  100 
#> 6812    1

gc <- dodgr_contract_graph (graph)
table (gc$flow)
#> 
#>   0 
#> 760

Created on 2022-11-02 with reprex v2.0.2

mpadge commented 1 year ago

@sriramab Should all be fixed now, and you should see something like this:

library (dodgr)
packageVersion ("dodgr")
#> [1] '0.2.16.38'
graph <- weight_streetnet (hampi)
v <- dodgr_vertices (graph)
n <- nrow (v)
try <- matrix(0L, nrow = n, ncol = n)
try [2,3] <- 100 
from <- to <- v$id
graph <- dodgr_flows_aggregate (graph = graph, from = from, to = to, flows = try, norm_sums = F) 
table (graph$flow)
#> 
#>    0  100 
#> 6812    1

gc <- dodgr_contract_graph (graph)
table (gc$flow)
#> 
#>   0 100 
#> 766   1

graph_sf <- dodgr_to_sf (graph) 
table (graph_sf$flow)
#> 
#>   0 100 
#> 766   1

Created on 2022-11-02 with reprex v2.0.2

Thanks so much for picking that up! I only use graph contraction routines as an initial entry point, for which everything works perfectly. But the dodgr_to_sf() function relies on calling graph contraction routines after additional data have been added to the graph. Becuase i never really use that, i never saw that the whole routine then simply failed to pick up the additional columns, and simply "contracted" them away, effectively removing the single row with flow values of 1. Let me know how this works for you now.

sriramab commented 1 year ago

Hi @mpadge , I think something is still not right. Yes, it gets a 100 value, but the path seems incorrect. I think something is not right in replacing the values in memory. For example, just pasting again the same code of mine from before:

#======RESHUFFLE BLOCK
# select columns
from <- latlong[,c("long", "lat")]

# shuffled_data
to2 <- latlong[sample(1:nrow(latlong)),]
# select columns
to <- to2[,c("long", "lat")]
#====== DODGR BLOCK
 try <- matrix(0L, nrow = 51, ncol = 51)
 try[1,2] <- 100 
 df <- dodgr_flows_aggregate (graph = my_graph, from = from, to = to, flows = try, norm_sums = F) 
 sf_f <- dodgr::dodgr_to_sf(df) 
 sf::st_write(sf_f, "flownetwork1001.shp", delete_layer = T ) 

Problems:

  1. If I do not remove df rm(df), the df is not updated with new flows for the next rerun of the aggregate function in DODGR block. (I reshuffle the points dataframe just to change the from and to points in the RESHUFFLE block.). I checked that the sample funciton does indeed shuffle the intermediate variable to2.
  2. Once I rm the df, and rerun the aggregate function, the df gets updated .
  3. Even if I rm df, and df is updated, the dodgr_to_sf , and the resulting sf_f never changes.

I have a feeling if there is some issue with environment and exchange between local environment and global environment of the functions, Maybe something somewhere is hardcoded not to replace if a variable exists ? I am not a programmer by profession, so just guessing.

Please see if something similar is reproducible on your side.

Thank you.

sriramab commented 1 year ago

please see, I have also removed sf_f for each rerun. But the edge_id on which flow is assigned after dodgr_to_sf, never changes

mpadge commented 1 year ago

Can you try to make that reproducible, so I can actually see what's going on? Use the internal "hampi" data of you can, or else upload some data somewhere and start with a download line. Thanks

sriramab commented 1 year ago

Please see, two different runs, two different to (destination point), but same result.

> print("========= next run======")
[1] "========= next run======"
> from <- latlong[,c("long", "lat")]
> 
> # shuffled_data= data[sample(1:nrow(data)), ]
> to2 <- latlong[sample(1:nrow(latlong)),]
> 
> to <- to2[,c("long", "lat")]
> 
> 
> 
> try <- matrix(0L, nrow = 51, ncol = 51)
> try[1,2] <- 100
> df <- dodgr_flows_aggregate (graph = metro_graph, from = from, to = to, flows = try, norm_sums = F)
> sf_f <- dodgr::dodgr_to_sf(df)
> #sf::st_write(sf_f, "flownetwork1001.shp", delete_layer = T )
> 
> from[1,]
      long      lat
1 77.57309 12.90751
> to[2,]
       long      lat
37 77.56576 12.97584
> df[df$flow==100, "edge_id"]
 [1]  54  58  62  66  70  74  78  82  86 149
> sf_f[sf_f$flow==100, "edge_id"]
Simple feature collection with 18 features and 1 field
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: 77.51927 ymin: 12.90751 xmax: 77.58023 ymax: 12.97584
Geodetic CRS:  WGS 84
First 10 features:
    edge_id                       geometry
56       54 LINESTRING (77.57454 12.966...
60       58 LINESTRING (77.57483 12.961...
64       62 LINESTRING (77.57372 12.950...
68       66 LINESTRING (77.58004 12.946...
72       70 LINESTRING (77.5801 12.9381...
76       74 LINESTRING (77.58015 12.929...
80       78 LINESTRING (77.58023 12.921...
84       82 LINESTRING (77.57362 12.915...
88       86 LINESTRING (77.5731 12.9075...
151     149 LINESTRING (77.57322 12.975...
> rm(df)
> rm(sf_f)
> print("========= next run======")
[1] "========= next run======"
> from <- latlong[,c("long", "lat")]
> 
> # shuffled_data= data[sample(1:nrow(data)), ]
> to2 <- latlong[sample(1:nrow(latlong)),]
> 
> to <- to2[,c("long", "lat")]
> 
> 
> 
> try <- matrix(0L, nrow = 51, ncol = 51)
> try[1,2] <- 100
> df <- dodgr_flows_aggregate (graph = metro_graph, from = from, to = to, flows = try, norm_sums = F)
> sf_f <- dodgr::dodgr_to_sf(df)
> #sf::st_write(sf_f, "flownetwork1001.shp", delete_layer = T )
> 
> from[1,]
      long      lat
1 77.57309 12.90751
> to[2,]
       long      lat
15 77.58015 12.92958
> df[df$flow==100, "edge_id"]
[1] 78 82 86
> sf_f[sf_f$flow==100, "edge_id"]
Simple feature collection with 18 features and 1 field
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: 77.51927 ymin: 12.90751 xmax: 77.58023 ymax: 12.97584
Geodetic CRS:  WGS 84
First 10 features:
    edge_id                       geometry
56       54 LINESTRING (77.57454 12.966...
60       58 LINESTRING (77.57483 12.961...
64       62 LINESTRING (77.57372 12.950...
68       66 LINESTRING (77.58004 12.946...
72       70 LINESTRING (77.5801 12.9381...
76       74 LINESTRING (77.58015 12.929...
80       78 LINESTRING (77.58023 12.921...
84       82 LINESTRING (77.57362 12.915...
88       86 LINESTRING (77.5731 12.9075...
151     149 LINESTRING (77.57322 12.975...
> rm(df)
> rm(sf_f)
sriramab commented 1 year ago

Sorry, just saw your message after pasting. I will add a reprex in my next message.

sriramab commented 1 year ago

Issue_submit.zip

Please find data and code in the zip file

mpadge commented 1 year ago

Thanks, i can reproduce that, so will re-open to fix again.

mpadge commented 1 year ago

@sriramab There were indeed two additional bugs in the code. The first was the absence of full pre-processing of the from and to parameters for the flow functions. The functions really only accepted vectors of ID values, not matrices or data.frames of coordinates. That has now been fixed.

The second actual bug was that the sf conversion always used a formerly cached version of the graph, so would always return the previous results, as you saw. dodgr uses a rather complicated caching system which enables instantaneous retrieval of contracted versions of graphs. One work-around would have been to use clear_dodgr_cache(), which indeed generated the expected, updated result. But the bug itself was in this fairly easy to fix, so with the above commit you should now see something like this:

library(dodgr)
packageVersion ("dodgr")
#> [1] '0.2.16.42'

load("debug_data.RData")
n <- nrow (sample_points)

print("========= run 1 ======")
#> [1] "========= run 1 ======"
from <- sample_points[,c("long", "lat")]

# shuffled_data
set.seed (1L)
to <- sample_points[sample(seq_len(n)), c("long", "lat")]

try <- matrix(0L, nrow = 51, ncol = 51)
try[1,2] <- 100

df <- dodgr_flows_aggregate (graph = sample_graph, from = from, to = to, flows = try, norm_sums = F)
table (df$flow)
#> 
#>   0 100 
#> 196  14
sf_f <- dodgr::dodgr_to_sf(df)
#> Loading required namespace: sf
table (sf_f$flow)
#> 
#>   0 100 
#> 192  14

print("========= run 2 ======")
#> [1] "========= run 2 ======"
set.seed (3L)
to <- sample_points[sample(seq_len(n)), c("long", "lat")]

df <- dodgr_flows_aggregate (graph = sample_graph, from = from, to = to, flows = try, norm_sums = F)
table (df$flow)
#> 
#>   0 100 
#> 199  11
sf_f <- dodgr::dodgr_to_sf(df)
table (sf_f$flow)
#> 
#>   0 100 
#> 195  11

Created on 2022-11-03 with reprex v2.0.2

Note also that the "edge_id" values of the sf objects won't necessarily map back on to "edge_id" values of the input graph, and should just be considered arbitrary identifiers not related to the original values.

sriramab commented 1 year ago

Thank you @mpadge , this seems to be working. And your response/ fix on accepting matrices and dataframe most possibly also address my other issue. I will read into it further.

mpadge commented 1 year ago

@sriramab FYI new version with these updates now on CRAN, and you're auto-acknowledged as contributor to this release: https://github.com/ATFutures/dodgr/releases/tag/v0.2.17 Happy to close this now? Thanks!

sriramab commented 1 year ago

Thank you. Slightly related. what would you advise to plot the resulting line layer with aggregated flows such that the size of the line is proportaionally scaled to the flow value "to one side" of the line as applicable as if showing the flow in each direction of the road. I am currently doing st_buffer on the sf object.

mpadge commented 1 year ago

I have not yet seen any practical ways to visualise bi-directional flows. It always ends up looking a complete mess, which is why i wrote the merge_directed_graph() function. I'd love to hear if you found a way to do that, but from what I know, it's just not possible in any sensible way (meaning not possible for networks of the typical size of entire cities).

... and with that, i'll close this issue.

sriramab commented 1 year ago

I got something like this. But i need something for user to zoom in when required to read the values. I tried plotly but plotly still doesnt work with geom sf labels it seems image

mpadge commented 1 year ago

Not bad! An interactive one with zoom ability needs someone to develop a js library with a force model to keep the lines apart, and further apart for thicker lines. I'm sure it'll happen one day :crossed_fingers: