dcooley / geometries

R package for creating and manipulating geometric data structures
https://dcooley.github.io/geometries/

convert output of gm_geometries to simple feature #11

Closed pedro-andrade-inpe closed 3 years ago

pedro-andrade-inpe commented 3 years ago

Hello, and thank you very much for the nice package. I'm trying to convert a set of lines from a data.frame to sf. I'm using sf::st_as_sfc() and then sf::st_as_sf() after calling gm_geometries() to get the same output as sfheaders::sf_linestring(). However, it is taking too much time. Do you know whether there is a way to work around this using the geometries package? Please see the reproducible code below.

x <- data.frame(x = 1:4e3, y = 1:4e3, id = paste(rep(1:4, each = 1e3)))

microbenchmark::microbenchmark(
  sfheaders = {
    v1 <- sfheaders::sf_linestring(obj = x, 
                                   x = 'x',
                                   y = 'y',
                                   linestring_id = 'id')
  },
  geometries = {  
    v2 <- geometries::gm_geometries(obj = x, 
                                    geometry_cols = c("x", "y"),
                                    id_cols = 'id',
                                    class_attributes = list(class = c("XY", "LINESTRING", "sfg"))
    )
  },
  geometries_sf = {  
    v2 <- geometries::gm_geometries(obj = x, 
                                    geometry_cols = c("x", "y"),
                                    id_cols = 'id',
                                    class_attributes = list(class = c("XY", "LINESTRING", "sfg"))
    )

    attr <- data.frame(id = x$id)
    attr$geom <- sf::st_as_sfc(v2)
    v3 <- sf::st_as_sf(attr)
  },
  times = 50
)

And my output:

Unit: microseconds
          expr     min      lq      mean   median      uq     max neval
     sfheaders   183.3   206.6   238.556   236.55   255.8   456.0    50
    geometries    89.9   100.9   124.682   117.40   132.8   249.7    50
 geometries_sf 32791.7 36467.8 40107.766 38218.50 41528.9 52596.1    50

dcooley commented 3 years ago

A couple of points

  1. I think your geometries_sf has a mistake; it should be
   attr <- data.frame(id = unique( x$id) )

because otherwise you get 4000 rows / lines, not 4.

  2. sfheaders uses geometries underneath, and all it's doing is adding the sf attributes onto the geometry structure. The only difference is it doesn't currently add the CRS object. So your geometries_sf code is almost replicating what sfheaders does.

Here's your code with the fix, and a larger example

library(sf)

n <- 4e5
x <- data.frame(x = 1:n, y = 1:n, id = paste(rep(1:4, each = (n / 4))))

microbenchmark::microbenchmark(
  sfheaders = {
    v1 <- sfheaders::sf_linestring(obj = x, 
                                   x = 'x',
                                   y = 'y',
                                   linestring_id = 'id')
  },
  geometries = {  
    v2 <- geometries::gm_geometries(obj = x, 
                                    geometry_cols = c("x", "y"),
                                    id_cols = 'id',
                                    class_attributes = list(class = c("XY", "LINESTRING", "sfg"))
    )
  },
  geometries_sf = {  
    v2 <- geometries::gm_geometries(obj = x, 
                                    geometry_cols = c("x", "y"),
                                    id_cols = 'id',
                                    class_attributes = list(class = c("XY", "LINESTRING", "sfg"))
    )

    attr <- data.frame(id = unique( x$id) )
    attr$geom <- sf::st_as_sfc(v2)
    v3 <- sf::st_as_sf(attr)
  },
  times = 50
)

# Unit: milliseconds
#          expr       min       lq     mean   median       uq      max neval
#     sfheaders 16.861981 24.03502 35.64669 26.35783 28.88044 132.1197    50
#    geometries  8.323943 12.16608 30.24712 16.21959 19.21363 129.9311    50
# geometries_sf 17.658255 23.05847 30.56037 26.20761 29.58738 135.6140    50
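
To check both points on something small enough to inspect by eye, here is a minimal sketch using a hypothetical 8-row data set (two points per line, four lines). The object names `small`, `via_sfheaders`, `geoms` and `via_geometries` are made up for this illustration. It only uses calls that already appear above, and it assumes the ids come back in order of first appearance, which is the case here.

```r
library(sf)

# Hypothetical small input: 4 lines with 2 points each
small <- data.frame(x = 1:8, y = 1:8, id = paste(rep(1:4, each = 2)))

# Path 1: let sfheaders build the sf object directly
via_sfheaders <- sfheaders::sf_linestring(obj = small,
                                          x = 'x',
                                          y = 'y',
                                          linestring_id = 'id')

# Path 2: build the bare geometries, then add the sf attributes by hand
geoms <- geometries::gm_geometries(obj = small,
                                   geometry_cols = c("x", "y"),
                                   id_cols = 'id',
                                   class_attributes = list(class = c("XY", "LINESTRING", "sfg"))
)

# One attribute row per line, hence unique(); using small$id directly
# would give one row per input point instead
via_geometries <- data.frame(id = unique(small$id))
via_geometries$geom <- sf::st_as_sfc(geoms)
via_geometries <- sf::st_as_sf(via_geometries)

# Both objects should have one row per line
nrow(via_sfheaders)
nrow(via_geometries)

# Compare the coordinates produced by the two paths
all.equal(sf::st_coordinates(via_sfheaders),
          sf::st_coordinates(via_geometries),
          check.attributes = FALSE)
```

If the two row counts differ, the attribute data.frame and the geometry list are out of step, which is the situation described in point 1.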

pedro-andrade-inpe commented 3 years ago

Sorry, I didn't notice the big mistake. Many thanks for the help and for the explanation in point (2).