Rdatatable / data.table

R's data.table package extends data.frame:
http://r-datatable.com

rbindlist() with list of sf objects doesn't combine different geometries #5486

Open Cdevenish opened 2 years ago

Cdevenish commented 2 years ago

I've checked the NEWS, the issues, and Stack Overflow but haven't seen any mention of rbindlist() in connection with this issue.

I want to make use of data.table's amazing speed :) in combining sf dataframes as in this example here, issue #798

However, when the list items contain different geometry types, rbindlist() stops with an error because the class attribute of the geometry column differs between the sf data frames. Here is a simple example:

library(sf)

# create a sf data frame with different geometries 
st <- st_as_sf(data.frame(
  id = 1:2,
  geometry = st_as_sfc(c("LINESTRING(10 5, 9 4, 8 3, 7 2, 6 1)",
                         "MULTILINESTRING ((0 3, 0 4, 1 5, 2 5), (0.2 3, 0.2 4, 1 4.8, 2 4.8), (0 4.4, 0.6 5))"))))
st
# split into list
st.split <- split(st, ~id)

# attempt to recombine with rbindlist() - error
st2.dt <- data.table::rbindlist(st.split)

# Error in data.table::rbindlist(st.split) :
#   Class attribute on column 2 of item 2 does not match with column 2 of item 1.

# rbind does work:
st2.rbind <- do.call(rbind, st.split)
st2.rbind

# Is data table checking these classes?
lapply(st.split, function(x) class(x$geometry))

A workaround is to use st_cast() to unify the geometries (e.g. to LINESTRING here, or more generally to GEOMETRYCOLLECTION), but this is not always practical. Can this class check be changed for sf objects so that different geometries can be combined?
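For reference, a minimal sketch of the st_cast() workaround on the example above. It assumes MULTILINESTRING is an acceptable common type for this data (that choice is mine, made so that no line parts are dropped), and the names st.cast and st.dt are only illustrative. The result is a data.table rather than an sf object, so geometry attributes such as the bounding box should be checked before using it as spatial data.

```r
library(sf)
library(data.table)

# Same example data as above: one LINESTRING and one MULTILINESTRING
st <- st_as_sf(data.frame(
  id = 1:2,
  geometry = st_as_sfc(c("LINESTRING(10 5, 9 4, 8 3, 7 2, 6 1)",
                         "MULTILINESTRING ((0 3, 0 4, 1 5, 2 5), (0.2 3, 0.2 4, 1 4.8, 2 4.8), (0 4.4, 0.6 5))"))))
st.split <- split(st, ~id)

# Cast every list item to one common geometry type (an assumption:
# MULTILINESTRING suits this data) so the geometry columns share a class
st.cast <- lapply(st.split, st_cast, to = "MULTILINESTRING")

# The class of the geometry column is now the same for every item
lapply(st.cast, function(x) class(x$geometry))

# The class check in rbindlist() should no longer be triggered
st.dt <- rbindlist(st.cast)
```

This only sidesteps the class check; it does not help when the geometries cannot sensibly be cast to a single type.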

# Output of sessionInfo()

R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.utf8  LC_CTYPE=English_United Kingdom.utf8    LC_MONETARY=English_United Kingdom.utf8
[4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.utf8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] sf_1.0-8

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9         rstudioapi_0.14    magrittr_2.0.3     units_0.8-0        tidyselect_1.1.2   R6_2.5.1           rlang_1.0.5
 [8] fansi_1.0.3        dplyr_1.0.9        tools_4.2.1        grid_4.2.1         compare_0.2-6      data.table_1.14.2  KernSmooth_2.23-20
[15] utf8_1.2.2         cli_3.3.0          e1071_1.7-11       DBI_1.1.3          class_7.3-20       assertthat_0.2.1   tibble_3.1.8
[22] lifecycle_1.0.1    purrr_0.3.4        vctrs_0.4.1        glue_1.6.2         proxy_0.4-27       compiler_4.2.1     pillar_1.8.1
[29] generics_0.1.3     classInt_0.4-7     pkgconfig_2.0.3

ben-schwen commented 2 years ago

This looks like a duplicate of #3911

Cdevenish commented 2 years ago

> This looks like a duplicate of #3911

Yes, it's the same issue with a different use case. I'd also favour an option in rbindlist() to ignore class/attributes.

ben-schwen commented 10 months ago

Actually, another thing comes into play here:

sf relies on attributes, and rbindlist() does not retain them, so this is also related to #5569