Open mdsumner opened 7 years ago
Package datastructures might help
Seems to:
## try using hash to insert the result as we go (or an env?)
hash_scan <- function(edges) {
seen <- rep(FALSE, nrow(edges))
v <- edges[[1]]
j <- 0
cycles <- list()
while(!all(seen)) {
f <- hashmap("character", "character")
f <- insert(f, edges[[1]][!seen], edges[[2]][!seen])
i <- 0
l <- list()
this <- start <- edges[[1]][!seen][1]
repeat({
i <- i + 1
this <- get(f, this)
l[[i]] <- this
#print(i)
if (this == start) break;
})
seen[v %in% unlist(l)] <- TRUE
j <- j + 1
cycles[[j]] <- unlist(l)
}
bind_rows(lapply(cycles, function(x) tibble(index = x)), .id = "path")
}
This should work as pointed out by @mpadge
ggm::fundCycles(igraph::as_adj(igraph::graph_from_data_frame(edges)))
But, as_adj
returns a sparse matrix and ggm wants an full matrix, so this would be prohibitive. Perhaps the ggm logic can be applied to the sparse graph?
@mdsumner i've actually got a find paths routine buried amongst lots of code here and here - crib away to your heart's content. My use-case is a series of user-specified segments hypothesised to contain a closed path, so I first find the longest path that can connect all segments, then find the shortest path through that sequence. You likely only want the latter, and that's the easy bit, and is mostly just the connect_highways()
function.
Nice! That's a good idea, I'll explore those. I'm messing around with Atlantis model geometry at the moment, it's a doubly-connected-edge-list, where the faces (segments) and the boxes (the polygons) are both first class entities, and they need to be interpreted with extension into depth independently as "ribbons" and "volumes", primarily for flux across faces (e.g. ocean current transport between boxes) as well as mass box properties (temperature, salinity). The exact same path/segment flip is required since the entities use a single "vertex pool", and this is directly relevant to animal tracking, integration with remote sensing and ocean/atmosphere models. At root it's a discrete vs. continuous distinction, basically requiring identity and structure at "sub-feature" levels. (QGIS calls this "geometry generators").
I hope together we can help identify some of these common patterns and get them appropriately elevated into the basic toolkit.
find_paths
has a reasonable stab at building paths from a set of segments. It's slow, but a doomed attempt at vectorization was exciting until it became clear that it worked only because the test set was actually ordered ... so, yeah.C++ would speed up "cycles()", but it's not really all that slow it's just icky. Test on a really big data set.
Finding paths from segments is good for "removing internal boundaries", that simple features doesn't like, and for being able to do segmenty things with 1D topologies.