dwoll / shotGroups

R package to analyze shot group data: shape, precision, and accuracy
GNU General Public License v2.0

Slow for `getMinBBox()` #9

Status: closed (harryprince closed this issue 2 years ago)

harryprince commented 2 years ago

I am using `getMinBBox()` to implement an `st_orientedenvelope` function as below, and I find it slower than its Python counterpart (shapely's `minimum_rotated_rectangle`). Does anyone have an idea how to speed it up?

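```r
library(dplyr)  # provides %>% and summarise()

# make `shapely` visible as py$shapely (assumes shapely is installed in
# the Python environment used by reticulate)
reticulate::py_run_string("import shapely.wkt")

p <- "POLYGON((116.4821384 39.9836878,116.4827035 39.9841856,116.4827233 39.9841873,116.4827841 39.9841542,116.4830736 39.9839845,116.4831764 39.9839125,116.4831793 39.9838734,116.4830778 39.9837773,116.4820606 39.9828711,116.4788117 39.9799144,116.4787834 39.9798948,116.478763 39.9798933,116.4784785 39.9800747,116.4783037 39.980195,116.4783008 39.9802338,116.4821384 39.9836878))"

microbenchmark::microbenchmark(
  # R: coordinates -> getMinBBox() corners -> sf points -> POLYGON
  t1 <- p %>%
    sf::st_as_sfc() %>%
    sf::st_coordinates() %>%
    data.frame() %>%
    shotGroups::getMinBBox() %>%
    with(pts) %>%
    data.frame() %>%
    sf::st_as_sf(coords = c("x", "y")) %>%
    summarise(do_union = FALSE) %>%
    sf::st_cast("POLYGON"),

  # Python: shapely's minimum rotated rectangle, converted back via WKT
  t2 <- reticulate::py$shapely$wkt$loads(p)$minimum_rotated_rectangle$wkt %>%
    sf::st_as_sfc()
)

# check that both approaches return the same rectangle
lwgeom::st_astext(t1$geometry) == lwgeom::st_astext(t2)
```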
```
 method      min       lq     mean   median       uq       max neval cld
     t1 5.594826 5.878546 7.376063 6.106935 6.809007 16.335442   100   b
     t2 3.052761 3.268867 3.901166 3.396058 3.706425  8.500606   100  a
```
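For context, both `getMinBBox()` and shapely's `minimum_rotated_rectangle` compute an oriented, minimum-area enclosing rectangle. Below is a minimal base-R sketch of one common way to compute such a rectangle, assuming at least three non-collinear input points. It relies on the classical result that an optimal rectangle has one side collinear with an edge of the convex hull, and simply tries every hull edge direction (O(h²) for h hull vertices, rather than the O(h) rotating-calipers variant). The helper `min_area_rect()` is hypothetical; this is not shotGroups' implementation.

```r
# Minimal sketch (hypothetical helper, not shotGroups' code): minimum-area
# enclosing rectangle of a set of 2-D points given as a two-column matrix.
# Assumes at least three non-collinear points.
min_area_rect <- function(xy) {
  xy   <- as.matrix(xy[, 1:2])
  hull <- xy[chull(xy), , drop = FALSE]      # convex hull vertices, in order
  nxt  <- hull[c(2:nrow(hull), 1), , drop = FALSE]
  # direction of every hull edge
  theta <- atan2(nxt[, 2] - hull[, 2], nxt[, 1] - hull[, 1])

  best <- NULL
  for (a in theta) {
    u  <- c(cos(a), sin(a))                  # unit vector along the edge
    v  <- c(-sin(a), cos(a))                 # unit normal to the edge
    pu <- range(hull %*% u)                  # extent along the edge
    pv <- range(hull %*% v)                  # extent across the edge
    area <- diff(pu) * diff(pv)
    if (is.null(best) || area < best$area) {
      # map the extents back to x/y corners, counter-clockwise
      corners <- rbind(pu[1] * u + pv[1] * v,
                       pu[2] * u + pv[1] * v,
                       pu[2] * u + pv[2] * v,
                       pu[1] * u + pv[2] * v)
      colnames(corners) <- c("x", "y")
      best <- list(pts = corners, area = area)
    }
  }
  best
}

# usage: a square rotated by 45 degrees plus an interior point
sq <- cbind(c(1, 0, -1, 0, 0), c(0, 1, 0, -1, 0))
min_area_rect(sq)$area  # approximately 2 (its axis-aligned box has area 4)
```

The four corners in `$pts` are not closed; like the output of `getMinBBox()`, they need the first corner repeated before being passed to `sf::st_polygon()`.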
dwoll commented 2 years ago

Thanks for your interest in shotGroups! It could be that `getMinBBox()` can be made faster. However, you can already save time by avoiding conversions between matrix and data frame. As I don't have a Python installation, I only compare two versions that both use `getMinBBox()`. Note that I don't know much about sf: the second function returns a POLYGON, not a "Simple feature collection".

```r
library(dplyr)  # provides %>% and summarise()

p <- "POLYGON((116.4821384 39.9836878,116.4827035 39.9841856,116.4827233 39.9841873,116.4827841 39.9841542,116.4830736 39.9839845,116.4831764 39.9839125,116.4831793 39.9838734,116.4830778 39.9837773,116.4820606 39.9828711,116.4788117 39.9799144,116.4787834 39.9798948,116.478763 39.9798933,116.4784785 39.9800747,116.4783037 39.980195,116.4783008 39.9802338,116.4821384 39.9836878))"

# original pipeline: matrix -> data frame -> sf points -> POLYGON
do_shotgroups1 <- function(x) {
    x %>%
        sf::st_as_sfc() %>%
        sf::st_coordinates() %>%
        data.frame() %>%
        shotGroups::getMinBBox() %>%
        with(pts) %>%
        data.frame() %>%
        sf::st_as_sf(coords = c("x", "y")) %>%
        summarise(do_union = FALSE) %>%
        sf::st_cast("POLYGON")
}

# faster: stay with matrices and build the POLYGON directly
do_shotgroups2 <- function(x) {
    # st_coordinates() returns a matrix with columns X, Y, L1, L2
    pts <- x %>%
        sf::st_as_sfc() %>%
        sf::st_coordinates()

    # keep only X and Y; ["pts"] returns a list holding the corner matrix
    bb <- shotGroups::getMinBBox(pts[ , 1:2])["pts"]
    # close the ring by repeating the first corner
    bb[["pts"]] <- rbind(bb[["pts"]], bb[["pts"]][1, ])
    # st_polygon() expects a list of closed coordinate matrices
    sf::st_polygon(bb)
}

microbenchmark::microbenchmark(t1 <- do_shotgroups1(p),
                               t2 <- do_shotgroups2(p))
```
```
                    expr    min      lq     mean  median      uq     max neval cld
 t1 <- do_shotgroups1(p) 4.9574 5.22285 6.704090 5.51855 7.70995 22.5991   100   b
 t2 <- do_shotgroups2(p) 1.1606 1.23370 1.522725 1.31400 1.48030  5.1822   100  a
```
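If downstream code expects the same class that `do_shotgroups1()` returns (an sf data frame with a `geometry` column), the bare POLYGON from `do_shotgroups2()` can be wrapped afterwards. A minimal sketch, where the helper name `as_sf_polygon()` is hypothetical and the CRS is left undefined, as in the original code:

```r
library(sf)

# Hypothetical helper: wrap a single sfg POLYGON into an sf data frame
# whose geometry column is named "geometry", like do_shotgroups1()'s output.
as_sf_polygon <- function(poly, crs = sf::NA_crs_) {
  sf::st_sf(geometry = sf::st_sfc(poly, crs = crs))
}

# toy closed ring: a 4 x 2 rectangle
poly <- sf::st_polygon(list(rbind(c(0, 0), c(4, 0), c(4, 2), c(0, 2), c(0, 0))))
res  <- as_sf_polygon(poly)
class(res)
```

Called as `as_sf_polygon(do_shotgroups2(p))`, the wrapping step adds some time of its own, so it belongs inside the benchmark if a like-for-like comparison with `do_shotgroups1()` is wanted.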
harryprince commented 2 years ago

Thanks, it seems the R version is faster than the Python counterpart.