@FlxPo All implemented now, via the dodgr_save_streetnet and dodgr_load_streetnet functions. The following example uses a very large network (~1.3 million edges):
library (dodgr)
f0 <- "/<path>/<to>/berlin-sf.Rds"
s0 <- format (round (file.size (f0) / 1000), big.mark = ",")
paste0 ("original file size = ", s0, "kB")
#> [1] "original file size = 22,024kB"
x <- readRDS (f0)
system.time (net <- weight_streetnet (x, wt_profile = "foot"))
#> user system elapsed
#> 71.811 0.945 72.920
format (nrow (net), big.mark = ",")
#> [1] "1,334,150"
# Wait until contracted graph has been processed in background:
px <- attr (net, "px")
while (px$is_alive ())
    Sys.sleep (1)
f <- file.path (tempdir (), "streetnet.Rds")
dodgr_save_streetnet (net, f)
clear_dodgr_cache () # Remove all formerly cached objects
paste0 ("file size = ", format (round (file.size (f) / 1000), big.mark = ","), "kB")
#> [1] "file size = 236,870kB"
system.time (net <- dodgr_load_streetnet (f))
#> user system elapsed
#> 38.908 0.200 39.148
Created on 2021-10-26 by the reprex package (v2.0.1.9000)
Here's the same example applied to the small internal data set of the package (6,813 edges):
library (dodgr)
system.time (net <- weight_streetnet (hampi, wt_profile = "foot"))
#> user system elapsed
#> 0.260 0.000 0.263
# Wait until contracted graph has been processed in background:
px <- attr (net, "px")
while (px$is_alive ())
    Sys.sleep (1)
f <- file.path (tempdir (), "streetnet.Rds")
dodgr_save_streetnet (net, f)
clear_dodgr_cache () # Remove all formerly cached objects
system.time (net <- dodgr_load_streetnet (f))
#> user system elapsed
#> 0.151 0.000 0.152
Created on 2021-10-26 by the reprex package (v2.0.1.9000)
Results are roughly comparable, and suggest that this whole save -> load workflow might speed things up by around two times, but not much more (72.9 s vs 39.1 s elapsed for Berlin; 0.263 s vs 0.152 s for hampi). The main reason is that the saved networks really are quite large (around ten times the original data size!), and loading also involves breaking the saved object up into its components and then individually re-saving each of those to the local tempdir(). I still think it's a good idea to have this function, even if it ends up being less helpful (in speed terms) than I might have anticipated. Thanks!
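As a quick check on the "around two times" and "ten times" figures, the ratios can be computed directly from the elapsed times and file sizes printed in the two reprexes. Sizes are treated as kB, since file.size() returns bytes and the code divides by 1000.

```r
# Elapsed times (seconds) copied from the reprex output:
# weight_streetnet() versus dodgr_load_streetnet()
t_weight <- c (berlin = 72.920, hampi = 0.263)
t_load <- c (berlin = 39.148, hampi = 0.152)
speedup <- round (t_weight / t_load, 2)
speedup # roughly 1.86 for Berlin and 1.73 for hampi

# Saved network size relative to the original Berlin sf file (both in kB)
size_ratio <- round (236870 / 22024, 1)
size_ratio # about 10.8
```

Both speed-ups fall between 1.7 and 1.9, consistent with the "around two times" estimate.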
This is great, thank you! I'll try to see how much time I save for my use cases.
For now I have just been using saveRDS on a contracted and reduced graph (largest connected component only). When reloaded, I can use the travel-time and flow-aggregation functions, and I got better speed-ups than you did. But I guess I'm losing cached data and some dodgr functionality (uncontracting the graph, for example)?
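The saveRDS workflow described here can be sketched with the package's internal hampi data. This is an illustration rather than the poster's actual code: the file name and the choice of origin and destination vertices are arbitrary, and it assumes, as in the dodgr vignettes, that component 1 in the component column added by weight_streetnet() is the largest connected component.

```r
library (dodgr)

# Weight the network once (the expensive step)
net <- weight_streetnet (hampi, wt_profile = "foot")

# Keep only the largest connected component, then contract the graph
net <- net [net$component == 1, ]
net_c <- dodgr_contract_graph (net)

# Save the plain contracted graph, then reload it (e.g. in a later session)
f <- file.path (tempdir (), "hampi-contracted.Rds")
saveRDS (net_c, f)
net_c2 <- readRDS (f)

# The reloaded graph can be passed straight to routing functions
v <- dodgr_vertices (net_c2)
tt <- dodgr_times (net_c2, from = v$id [1:5], to = v$id [6:10])
```

In a fresh R session the dodgr cache will not hold this graph, so functions that depend on the cached full network, such as dodgr_uncontract_graph(), are not expected to work on the reloaded object.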
I often use arrow and the parquet format for heavy read/write workflows and large data frames. Might that also be useful in this case?
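A parquet round trip of a contracted graph might look like the following sketch. It assumes the arrow package is installed and treats the graph purely as an edge table: dodgr-specific attributes (such as the hashes used for caching) are deliberately dropped, so this only fits the "contracted graph only" use case, and whether it is actually faster than saveRDS has not been measured here.

```r
library (dodgr)
library (arrow)

net <- weight_streetnet (hampi, wt_profile = "foot")
net_c <- dodgr_contract_graph (net [net$component == 1, ])

# Keep only names, row names, and a plain data.frame class, dropping
# dodgr-specific attributes so that just the edge table is written
df <- net_c
attributes (df) <- list (
    names = names (net_c),
    row.names = seq_len (nrow (net_c)),
    class = "data.frame"
)

f <- file.path (tempdir (), "hampi-contracted.parquet")
write_parquet (df, f)

# read_parquet() may return a tibble; convert back to a plain data frame
net_pq <- as.data.frame (read_parquet (f))
```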
Yeah, I too thought about saving as parquet, which should definitely speed things up, but ... will leave that for another day. In the meantime, if you really just need the contracted graph and don't need to uncontract, then just saving the simple graph with saveRDS would suffice, and would be much quicker. The only caveat is that if you require accurate routing with effects like turn angles and waiting times at intersections, then you should be using networks in sc format (via dodgr_streetnet_sc()), for which the vertices of the contracted network are no longer simple vertices, so any subsequent analyses generally require uncontracting. Feel free to ping any time with further questions.
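Within a single session, while dodgr's cache is still available, the contract, analyse, then uncontract pattern can be sketched with flow aggregation on hampi. The random origins and destinations and the unit flow matrix are arbitrary illustrative inputs. An sf-derived network is used here only because it runs offline; an sc network from dodgr_streetnet_sc() would need to be downloaded first.

```r
library (dodgr)

net <- weight_streetnet (hampi, wt_profile = "foot")
net <- net [net$component == 1, ]
net_c <- dodgr_contract_graph (net)

# Arbitrary origins, destinations, and unit flows between them
v <- dodgr_vertices (net_c)
set.seed (1)
from <- sample (v$id, 10)
to <- sample (v$id, 10)
flows <- matrix (1, nrow = length (from), ncol = length (to))

# Aggregate flows on the smaller (and faster) contracted graph ...
net_f <- dodgr_flows_aggregate (net_c, from = from, to = to, flows = flows)

# ... then map them back onto the full network; this step relies on the
# full graph still being in dodgr's cache for the current session
net_full <- dodgr_uncontract_graph (net_f)
```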
Would it be useful to save the result of time-consuming functions like weight_streetnet? That is, run it once and for all, save it, and load the result whenever it is needed for functions like dodgr_times or dodgr_flows_aggregate.
saveRDS seems to work, but is there a better (faster) way ?
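The "run once, save, and reload" idea can be wrapped in a small helper built on the dodgr_save_streetnet() and dodgr_load_streetnet() functions described above. The helper name load_or_weight() and its arguments are hypothetical, made up for this sketch. It also assumes, following the reprexes above, that the background contraction process is available as attr(net, "px") and should presumably finish before the network is saved.

```r
library (dodgr)

# Hypothetical helper: weight a street network once, save it with
# dodgr_save_streetnet(), and simply reload it on subsequent calls
load_or_weight <- function (f, x, wt_profile = "foot") {
    if (file.exists (f)) {
        return (dodgr_load_streetnet (f))
    }
    net <- weight_streetnet (x, wt_profile = wt_profile)
    # Wait for the background contraction process (if there is one)
    # before saving, as in the reprexes
    px <- attr (net, "px")
    if (!is.null (px)) {
        while (px$is_alive ()) {
            Sys.sleep (1)
        }
    }
    dodgr_save_streetnet (net, f)
    net
}

f <- file.path (tempdir (), "hampi-streetnet.Rds")
net1 <- load_or_weight (f, hampi) # first call: weights and saves
net2 <- load_or_weight (f, hampi) # later calls: loads from file
```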