Closed mdsumner closed 4 years ago
Ugh, I mucked up a bit up there - having massive VM problems today, please ignore until I can sort it out - thanks!
Ugh, I mucked up a bit up there - having massive VM problems today, please ignore until I can sort it out - thanks!
sure, but you bring up a good point
Note to me: this subset happens every iteration for every sfg
- can I move it outside and call it once, similar to how I use get_sfc_coordinates()
Here is the grand reprex I was working on when my computer troubles forced the less satisfactory report above
#remotes::install_github("hypertidy/anglr")
## 1. worker function to convert mesh to df
# (a long tortured path, just so we have nice triangles )
to_sfh_df.mesh3d <- function(x, ...) {
idx <- if (!is.null(x$it)) x$it else x$ib
each <- nrow(idx) + 1 ## to close polygon for sf
## we need to close loops for sf
idx <- rbind(idx, idx[1, ])
nc <- dim(idx)[2L]
idx <- as.vector(idx)
out <- tibble::as_tibble(
cbind(x = x$vb[1L, idx],
y = x$vb[2L, idx],
z = x$vb[3L, idx],
linestring_id = rep(seq_len(nc), each = each),
polygon_id = rep(seq_len(nc), each = each)))
}
## 2. a worker function to demonstrate blazing vectorized sf-polygon creation
## for a simple case of very many features
mk_direct <- function(df) {
## we're assume only POLYGON, XY
## template the sfg and populate copies of it is much faster
template <- sfheaders::sf_polygon(data.frame(x = c(0, 0, 1, 1, 0), y = c(0, 1, 1, 0, 0),
linestring_id = 0, polygon_id = 0),
x = "x", y = "y",
linestring_id = "linestring_id",
polygon_id = "polygon_id")$geometry[[1]]
## this only works for POLYGON, and when they have no holes
## split the coordinates and populate copies of the template
sflist <- lapply(split(t(as.matrix(df[c("x", "y")])),
rep(df$linestring_id, each = 2)), ## because we have 2-columns x, y
function(ixy) {
template[[1L]] <- matrix(ixy, ncol = 2, byrow = TRUE)
template
})
## generate the final sf df
structure(list(geometry = structure(sflist, class = c("sfc_POLYGON", "sfc"),
precision = 0.0, bbox = structure(c(xmin = min(d2$x), ymin = min(df$y), xmax = max(df$x), ymax = max(df$y)), class = "bbox"),
crs = structure(list(epsg = NA_integer_,
proj4string = NA_character_),
class = "crs"), n_empty = 0L)),
class = c("sf", "data.frame"), sf_column = "geometry", agr = character(0L), row.names = seq_along(sflist))
}
## EXAMPLE DATA
exdata <- silicate::inlandwaters
## very few, complex polygons
d1 <- sfheaders::sf_to_df(exdata)
## very many, simple polygons (triangles), using
## obscure constrained triangulation to get a dense, high-quality mesh
d2 <- to_sfh_df.mesh3d(anglr::as.mesh3d(anglr::DEL(exdata, max_area = 1e10)))
## the usual situation, sfheaders is fast (very few features, many verts per feature)
system.time({sfheaders::sf_multipolygon(d1,
x = "x", y = "y",
linestring_id = "linestring_id",
polygon_id = "polygon_id",
multipolygon_id = "multipolygon_id")})
# user system elapsed
# 0.150 0.004 0.155
## a very specific situation, sfheaders is slow (very many features, all are triangles)
system.time({sfx <- sfheaders::sf_polygon(d2,
x = "x", y = "y",
linestring_id = "linestring_id",
polygon_id = "polygon_id")})
#> user system elapsed
#> 24.745 0.000 24.744
## we can beat sfheaders with R code
system.time({
res <- mk_direct(d2)
})
#> user system elapsed
#> 0.854 0.000 0.854
Created on 2020-03-14 by the reprex package (v0.3.0)
I'm pretty sure the key is templating the sfg and only creating it once, and while this example is only for triangles (or quads) the same copy the raw matrix into the template approach should work generally. (Though I still haven't explored the underlying code).
This is now working (and quick) in geometries
system.time({
gmx <- geometries::gm_make_geometries(
df = d2
, id_cols = c("polygon_id", "linestring_id")
, geometry_cols = c("x","y")
, class_attributes = c("XY","POLYGON","sfg")
)
})
user system elapsed
0.015 0.000 0.016
(This is only for making geometries, not handling the data attributes, so is actually comparible to sfheaders::sfc_polygon
)
Great!
@mdsumner I've now got this update almost ready for release. Would you mind checking out this branch before I merge it to master and testing it still works on some of your existing codes?
remotes::install_github("dcooley/geometries")
remotes::install_github("dcooley/sfheaders", ref = "issue82")
(no rush - will take a few days for me to get geometries
on CRAN)
well, it's FAST sheesh - no waiting whatsover, ramped it up so now the meshing is the slow part and it's less than a second to generate 325000 triangles from a df
That actually entirely shifts perspective it's amazing, thanks!
cool. If you don't need the sf
stuff you can also use {geometries}
directly
indeed, well the old quadmesh stuff is now the slow part - time for me to level up again ;0
I've found I can get faster results than
sf_polygon
just using R, by templating the firstsfg
and simply replicating that out for the remaining geometrys. I'm sorry about the example, it's just that my main VM is playing up and the proper code is not available atm. This example creates polygons from raster pixels, the code in spex::polygonize uses the template idea - but I get the same kind of results when I'm generating POLYGON from triangles.The code I use is not generalized to other geometry types or to XYZ, XYM, XYZM, and it's only done for POLYGON without holes - but I think it's worth exploring if sfheaders can be faster?
(It would be tricky to generalize the approach here to polygons with holes and multipolygons, I don't want to replace sfheaders with R but find out if sfheaders can be faster - perhaps this is a special case and it's the nesting that requires time, I'm not sure).
Is it possible that the sfg creation is the slow part, is that worth doing just once and replacing the coordinates in a copy at the C++ level? (I'll try exploring there)