dcooley / sfheaders

Build sf objects from R and Rcpp
https://dcooley.github.io/sfheaders/
Other
74 stars 5 forks source link

Round trip sf -> df -> sf not complete and differences between sf and sfheader objects after trying round trip #92

Closed Robinlovelace closed 2 years ago

Robinlovelace commented 2 years ago

This maybe a non-issue. Opening it here in case of use/interest from a developer perspective.

Note: sfheaders does substantially better than sf when it comes to round trip but seems there are some differences, as shown in the output of the reprex below.

# Aim: compare sf vs sfheaders in terms of speed

library(spData)
library(sf)

# proof of concept
world_df = sf::st_coordinates(world)
head(world_df)
summary(world_df) # what does each mean?
# world_df_split = split(world_df, world_df[, "L1"]) # how to reassemble?

world_df_sfh = sfheaders::sf_to_df(world)
head(world_df_sfh)
world_sfh = sfheaders::sf_multipolygon(world_df_sfh)
world_sfh = sfheaders::sf_multipolygon(world_df_sfh, x = "x", y = "y", polygon_id = "", multipolygon_id = "multipolygon_id")
length(world$geom)
length(world_sfh$geometry)
waldo::compare(world$geom[1], world_sfh$geometry[1]) # how to get the same object back?
world$geom[1][[1]][[2]]
world_sfh$geom[1][[1]][[2]]
plot(world$geom[1])
plot(world_sfh$geometry[1])
Robinlovelace commented 2 years ago
``` r # Aim: compare sf vs sfheaders in terms of speed library(spData) library(sf) #> Linking to GEOS 3.10.1, GDAL 3.4.0, PROJ 8.2.0; sf_use_s2() is TRUE # proof of concept world_df = sf::st_coordinates(world) head(world_df) #> X Y L1 L2 L3 #> [1,] -180.0000 -16.55522 1 1 1 #> [2,] -179.9174 -16.50178 1 1 1 #> [3,] -179.7933 -16.02088 1 1 1 #> [4,] -180.0000 -16.06713 1 1 1 #> [5,] -180.0000 -16.55522 1 1 1 #> [6,] 178.1256 -17.50481 1 2 1 summary(world_df) # what does each mean? #> X Y L1 L2 #> Min. :-180.00 Min. :-89.90 Min. :1.000 Min. : 1.000 #> 1st Qu.: -62.73 1st Qu.: -0.95 1st Qu.:1.000 1st Qu.: 1.000 #> Median : 19.90 Median : 21.80 Median :1.000 Median : 1.000 #> Mean : 11.39 Mean : 18.57 Mean :1.001 Mean : 2.816 #> 3rd Qu.: 62.91 3rd Qu.: 46.13 3rd Qu.:1.000 3rd Qu.: 2.000 #> Max. : 180.00 Max. : 83.65 Max. :2.000 Max. :30.000 #> L3 #> Min. : 1.00 #> 1st Qu.: 19.00 #> Median : 62.00 #> Mean : 73.85 #> 3rd Qu.:133.00 #> Max. :177.00 # world_df_split = split(world_df, world_df[, "L1"]) # how to reassemble? world_df_sfh = sfheaders::sf_to_df(world) head(world_df_sfh) #> sfg_id multipolygon_id polygon_id linestring_id x y #> 1 1 1 1 1 -180.0000 -16.55522 #> 2 1 1 1 1 -179.9174 -16.50178 #> 3 1 1 1 1 -179.7933 -16.02088 #> 4 1 1 1 1 -180.0000 -16.06713 #> 5 1 1 1 1 -180.0000 -16.55522 #> 6 1 1 2 1 178.1256 -17.50481 world_sfh = sfheaders::sf_multipolygon(world_df_sfh) #> Error in rcpp_to_sf(obj, geometry_columns, NULL, linestring_id, NULL, : sfheaders - can't work out the dimension world_sfh = sfheaders::sf_multipolygon(world_df_sfh, x = "x", y = "y", polygon_id = "", multipolygon_id = "multipolygon_id") length(world$geom) #> [1] 177 length(world_sfh$geometry) #> [1] 177 waldo::compare(world$geom[1], world_sfh$geometry[1]) # how to get the same object back? #> `attr(old, 'crs')$input`: "EPSG:4326" #> `attr(new, 'crs')$input`: NA #> #> lines(attr(old, 'crs')$wkt) vs lines(attr(new, 'crs')$wkt) #> - "GEOGCRS[\"WGS 84\"," #> - " DATUM[\"World Geodetic System 1984\"," #> - " ELLIPSOID[\"WGS 84\",6378137,298.257223563," #> - " LENGTHUNIT[\"metre\",1]]]," #> - " PRIMEM[\"Greenwich\",0," #> - " ANGLEUNIT[\"degree\",0.0174532925199433]]," #> - " CS[ellipsoidal,2]," #> - " AXIS[\"geodetic latitude (Lat)\",north," #> - " ORDER[1]," #> - " ANGLEUNIT[\"degree\",0.0174532925199433]]," #> - " AXIS[\"geodetic longitude (Lon)\",east," #> - " ORDER[2]," #> - " ANGLEUNIT[\"degree\",0.0174532925199433]]," #> - " USAGE[" #> - " SCOPE[\"Horizontal component of 3D system.\"]," #> - " AREA[\"World.\"]," #> - " BBOX[-90,-180,90,180]]," #> - " ID[\"EPSG\",4326]]" #> + NA #> #> `old[[1]]` is length 3 #> `new[[1]]` is length 1 #> #> `dim(old[[1]][[1]][[1]])`: 5 2 #> `dim(new[[1]][[1]][[1]])`: 23 2 #> #> old[[1]][[1]][[1]] | new[[1]][[1]][[1]] #> [3] -179.793320109049 | -179.793320109049 [3] #> [4] -180 | -180 [4] #> [5] -180 | -180 [5] #> [6] -16.5552165666392 - 178.12557 [6] #> [7] -16.5017831356494 - 177.67087 [7] #> [8] -16.0208822567412 - 177.28504 [8] #> [9] -16.0671326636425 - 177.38146 [9] #> [10] -16.5552165666392 - 177.93266 [10] #> - 178.55271 [11] #> - 178.71806 [12] #> ... ... ... and 34 more ... #> #> `old[[1]][[2]]` is a list #> `new[[1]][[2]]` is absent #> #> `old[[1]][[3]]` is a list #> `new[[1]][[3]]` is absent world$geom[1][[1]][[2]] #> [[1]] #> [,1] [,2] #> [1,] 178.1256 -17.50481 #> [2,] 177.6709 -17.38114 #> [3,] 177.2850 -17.72465 #> [4,] 177.3815 -18.16432 #> [5,] 177.9327 -18.28799 #> [6,] 178.5527 -18.15059 #> [7,] 178.7181 -17.62846 #> [8,] 178.3736 -17.33992 #> [9,] 178.1256 -17.50481 world_sfh$geom[1][[1]][[2]] #> Error in world_sfh$geom[1][[1]][[2]]: subscript out of bounds plot(world$geom[1]) ``` ![](https://i.imgur.com/cr4KDuO.png) ``` r plot(world_sfh$geometry[1]) ``` ![](https://i.imgur.com/OLuJAoz.png) Created on 2022-02-01 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)
Robinlovelace commented 2 years ago

Another observation, I would expect this to work if the input had been created by sfheaders:

world_sfh = sfheaders::sf_multipolygon(world_df_sfh)

Reasonable?

thomaszwagerman commented 2 years ago

Came here for the same reason as @Robinlovelace, and seconding that I'd expect

library(spData)
world_df_sfh = sfheaders::sf_to_df(world)
sfheaders::sfc_multipolygon(world_df_sfh)

to work the same as:

library(spData)
world_df_sfh = sfheaders::sf_to_df(world)
sfheaders::sf_multipolygon(
  world_df_sfh,
  x = "x",
  y = "y",
  multipolygon_id = "multipolygon_id",
  polygon_id = "polygon_id",
  linestring_id = "linestring_id",
)

I'm guessing it is do with arguments being fed as a character? I'd be happy to do a draft PR, but with the r-cmd-check failing it's challenging to do testing (and my Rcpp knowledge is fairly limited).


Also additional workflow documentation and examples for sf -> df -> sfh I think would be useful, similar to sf_to_df().

If the workflow was clear, a new feature of sf_to_sfh() (or another name to reflect going from "regular" sf to "sfheader" sf) might be useful, to skip this step of manually converting from sf to df and back. As far as I can tell, the output of sf_to_df() is always the same, so can be used "under-the-hood" to take an sf and convert it to sfh straight away.

That would also remove any user errors that could be introduced in the manual process (for example, forgetting one of multipolygon_id, linestring_id etc. produces some funny results), and a direct function could also incorporate checks that length(sf) == length(sfh) which would help with the above.

Again happy to attempt a draft PR, but not sure where to start.

dcooley commented 2 years ago

There are helper functions

sf_pt()
sf_mpt()
sf_line()
sf_mline()
sf_poly()
sf_mpoly()

Which are designed to work on the output of sf_to_df() directly, e.g

library(spData)
world_df_sfh = sfheaders::sf_to_df(world)
sfheaders::sf_mpoly(world_df_sfh)

But it's by design that the more general functions (sf_point(), sf_linestring(), etc...) require the user to specify the columns.


dcooley commented 2 years ago

@thomaszwagerman

or another name to reflect going from "regular" sf to "sfheader" sf

I'm not entirely sure I follow the logic here. There's not really a sfh object, so the concept of going to "sfheader" sf doesn't make much sense.