Ability to save the result of weight_streetnet to disk

FlxPo commented 2 years ago

It might be useful to save the result of time-consuming functions like weight_streetnet: run it once, save the result, then load it whenever it's needed for functions like dodgr_times or dodgr_flows_aggregate.

saveRDS seems to work, but is there a better (faster) way?
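
For reference, here is a minimal sketch of the plain saveRDS route using only base R. The `edges` data frame and its columns are hypothetical stand-ins, not a real dodgr network. `compress = FALSE` is a standard `saveRDS` argument that gives larger files but usually quicker writes and reads; whether it helps noticeably for a real street network is an assumption to check.

```r
# Stand-in for a weighted street network: a plain data frame of edges.
# (Hypothetical columns; a real dodgr graph has many more.)
set.seed (1)
n <- 1e5
edges <- data.frame (
    from = sample (n, n, replace = TRUE),
    to = sample (n, n, replace = TRUE),
    d = runif (n)
)

f_gz <- tempfile (fileext = ".Rds")
f_raw <- tempfile (fileext = ".Rds")

# Default: gzip-compressed file
t_gz <- system.time (saveRDS (edges, f_gz))
# Uncompressed: larger on disk, typically quicker to write and read
t_raw <- system.time (saveRDS (edges, f_raw, compress = FALSE))

# Both files restore the same object
edges_gz <- readRDS (f_gz)
edges_raw <- readRDS (f_raw)
identical (edges_gz, edges_raw)

# Compare on-disk sizes in kB
round (file.size (c (f_gz, f_raw)) / 1000)
```

Comparing `t_gz` and `t_raw` (and the matching `readRDS` times) on a real network would show whether skipping compression is worth the extra disk space.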

mpadge commented 2 years ago

@FlxPo All implemented now, via the dodgr_save_streetnet and dodgr_load_streetnet functions. The following shows an example using a relatively large network (~1.3 million edges):

library (dodgr)
f0 <- "/<path>/<to>/berlin-sf.Rds"
s0 <- format (round (file.size (f0) / 1000), big.mark = ",")
paste0 ("original file size = ", s0, "kB") # file.size () is in bytes, so / 1000 gives kB
#> [1] "original file size = 22,024kB"
x <- readRDS (f0)

system.time (net <- weight_streetnet (x, wt_profile = "foot"))
#>    user  system elapsed 
#>  71.811   0.945  72.920
format (nrow (net), big.mark = ",")
#> [1] "1,334,150"

# Wait until contracted graph has been processed in background:
px <- attr (net, "px")
while (px$is_alive ())
    Sys.sleep (1)

f <- file.path (tempdir (), "streetnet.Rds")
dodgr_save_streetnet (net, f)
clear_dodgr_cache () # Remove all formerly cached objects
paste0 ("file size = ", format (round (file.size (f) / 1000), big.mark = ","), "kB") # bytes / 1000 = kB
#> [1] "file size = 236,870kB"
system.time (net <- dodgr_load_streetnet (f))
#>    user  system elapsed 
#>  38.908   0.200  39.148

Created on 2021-10-26 by the reprex package (v2.0.1.9000)

Here's the same example applied to the small internal data set of the package (6,813 edges):

library (dodgr)
system.time (net <- weight_streetnet (hampi, wt_profile = "foot"))
#>    user  system elapsed 
#>   0.260   0.000   0.263

# Wait until contracted graph has been processed in background:
px <- attr (net, "px")
while (px$is_alive ())
    Sys.sleep (1)

f <- file.path (tempdir (), "streetnet.Rds")
dodgr_save_streetnet (net, f)
clear_dodgr_cache () # Remove all formerly cached objects
system.time (net <- dodgr_load_streetnet (f))
#>    user  system elapsed 
#>   0.151   0.000   0.152

Created on 2021-10-26 by the reprex package (v2.0.1.9000)

Results are roughly comparable between the two examples, and suggest that this whole save -> load workflow might speed things up by a bit under a factor of two compared with calling weight_streetnet again, but not much more. The main reason is that the saved networks really are quite large (around ten times the original data size!), and loading also involves breaking the saved object up into its components and then re-saving each of those individually to the local tempdir(). I still think it's a good idea to have these functions, even if they end up being less helpful (in speed terms) than I might have anticipated. Thanks!
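
The ratios behind that summary can be checked directly from the numbers printed in the two examples above. This is just arithmetic on the reported values; the variable names are only for illustration.

```r
# Elapsed times (seconds) copied from the two examples above
weight_elapsed <- c (berlin = 72.920, hampi = 0.263)
load_elapsed <- c (berlin = 39.148, hampi = 0.152)

# Speed-up from loading a saved network instead of re-weighting
speedup <- round (weight_elapsed / load_elapsed, 2)
speedup
#> berlin  hampi
#>   1.86   1.73

# Size of the saved Berlin network relative to the original file,
# using the two file sizes printed above (same units in both)
size_ratio <- round (236870 / 22024, 1)
size_ratio
#> [1] 10.8
```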

FlxPo commented 2 years ago

This is great, thank you! I'll try to see how much time I save for my use cases.

For now I was just using saveRDS on a contracted and reduced graph (largest connected component only). When reloaded, I can use the travel time and flow aggregation functions, and I got better speed-ups than you did. But I guess I'm losing the cached data and some dodgr functionality (uncontracting the graph, for example)?

I often use arrow and the parquet format for heavy read/write workflows and large data frames. Might that also be useful in this case?

mpadge commented 2 years ago

Yeah, I too thought about saving as parquet, which should definitely speed things up, but ... will leave that for another day. In the meantime, if you really just need the contracted graph and don't need to uncontract, then just saving the simple graph with saveRDS would suffice, and would be much quicker. The only caveat is that if you require accurate routing with effects like turn angles and waiting times at intersections, then you should be using networks in sc format (via dodgr_streetnet_sc()). The vertices of the contracted network are then no longer simple vertices, so any subsequent analyses generally require uncontracting. Feel free to ping any time with further questions.
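
A rough sketch of that simpler route, using the package's internal hampi data: keep only the largest connected component, contract, and save with saveRDS. It assumes that component 1 is the largest one and that the file name is arbitrary. As noted above, anything that relies on cached data (such as uncontracting) should not be expected to work on the reloaded object.

```r
library (dodgr)

net <- weight_streetnet (hampi, wt_profile = "foot")

# Keep only the largest connected component
# (assumed here to be the one numbered 1)
net <- net [net$component == 1, ]

# Contract the graph and save the plain object to disk
net_c <- dodgr_contract_graph (net)
f <- file.path (tempdir (), "streetnet-contracted.Rds")
saveRDS (net_c, f)

# Later: reload and use directly for travel times
net_c2 <- readRDS (f)
v <- dodgr_vertices (net_c2)
from <- v$id [1:5]
tt <- dodgr_times (net_c2, from = from, to = from)
dim (tt)
```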

UrbanAnalyst / dodgr

Ability to save the result of weight_streetnet to disk #170