geoarrow / geoarrow-r

Extension types for geospatial data for use with 'Arrow'
http://geoarrow.org/geoarrow-r/
Apache License 2.0

Error when loading the example - missing Z values #27

Open andrewmaclachlan opened 1 year ago

andrewmaclachlan commented 1 year ago

Hi there,

When I try to load a GeoParquet file (including the example), I get the following error. I think this has to do with the Z dimension, because when I tried it with a GeoParquet file that has a Z dimension, it loaded fine.

nc <- sf::read_sf(system.file("shape/nc.shp", package = "sf"))
write_geoparquet(nc, "nc.parquet")
read_geoparquet_sf("nc.parquet")

Error in geoarrow_schema_wkb(name = schema$name, format = schema$format, :
  startsWith(format, "w:") || isTRUE(format %in% c("z", "Z")) is not TRUE

paleolimbot commented 1 year ago

Thank you for reporting! I've been neglecting the maintenance here while I work on solidifying the lower-level pieces.

I think 'Z' here refers to the column type (it's the format code for 'binary'), although I can't seem to reproduce the error:

library(geoarrow)
write_geoparquet(
  sf::read_sf(system.file("shape/nc.shp", package = "sf")),
  "nc.parquet"
)
read_geoparquet_sf("nc.parquet")
#> Simple feature collection with 100 features and 14 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> Geodetic CRS:  NAD27
#> # A tibble: 100 × 15
#>     AREA PERIMETER CNTY_ CNTY_ID NAME  FIPS  FIPSNO CRESS_ID BIR74 SID74 NWBIR74
#>    <dbl>     <dbl> <dbl>   <dbl> <chr> <chr>  <dbl>    <int> <dbl> <dbl>   <dbl>
#>  1 0.114      1.44  1825    1825 Ashe  37009  37009        5  1091     1      10
#>  2 0.061      1.23  1827    1827 Alle… 37005  37005        3   487     0      10
#>  3 0.143      1.63  1828    1828 Surry 37171  37171       86  3188     5     208
#>  4 0.07       2.97  1831    1831 Curr… 37053  37053       27   508     1     123
#>  5 0.153      2.21  1832    1832 Nort… 37131  37131       66  1421     9    1066
#>  6 0.097      1.67  1833    1833 Hert… 37091  37091       46  1452     7     954
#>  7 0.062      1.55  1834    1834 Camd… 37029  37029       15   286     0     115
#>  8 0.091      1.28  1835    1835 Gates 37073  37073       37   420     0     254
#>  9 0.118      1.42  1836    1836 Warr… 37185  37185       93   968     4     748
#> 10 0.124      1.43  1837    1837 Stok… 37169  37169       85  1612     1     160
#> # ℹ 90 more rows
#> # ℹ 4 more variables: BIR79 <dbl>, SID79 <dbl>, NWBIR79 <dbl>,
#> #   geometry <MULTIPOLYGON [°]>

read_geoparquet("nc.parquet")
#> # A tibble: 100 × 15
#>     AREA PERIMETER CNTY_ CNTY_ID NAME  FIPS  FIPSNO CRESS_ID BIR74 SID74 NWBIR74
#>    <dbl>     <dbl> <dbl>   <dbl> <chr> <chr>  <dbl>    <int> <dbl> <dbl>   <dbl>
#>  1 0.114      1.44  1825    1825 Ashe  37009  37009        5  1091     1      10
#>  2 0.061      1.23  1827    1827 Alle… 37005  37005        3   487     0      10
#>  3 0.143      1.63  1828    1828 Surry 37171  37171       86  3188     5     208
#>  4 0.07       2.97  1831    1831 Curr… 37053  37053       27   508     1     123
#>  5 0.153      2.21  1832    1832 Nort… 37131  37131       66  1421     9    1066
#>  6 0.097      1.67  1833    1833 Hert… 37091  37091       46  1452     7     954
#>  7 0.062      1.55  1834    1834 Camd… 37029  37029       15   286     0     115
#>  8 0.091      1.28  1835    1835 Gates 37073  37073       37   420     0     254
#>  9 0.118      1.42  1836    1836 Warr… 37185  37185       93   968     4     748
#> 10 0.124      1.43  1837    1837 Stok… 37169  37169       85  1612     1     160
#> # ℹ 90 more rows
#> # ℹ 4 more variables: BIR79 <dbl>, SID79 <dbl>, NWBIR79 <dbl>,
#> #   geometry <grrw_wkb>

Created on 2023-04-25 with reprex v2.0.2
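
The assertion in the reported error can be tried on its own, without geoarrow installed. This is a minimal sketch, assuming the condition is exactly the one printed in the error message; `is_wkb_storage_format()` is a made-up name for illustration and is not a function from the package.

```r
# Hypothetical helper (not a geoarrow function): the condition copied from
# the error message, wrapped so it can be tried on single format strings.
is_wkb_storage_format <- function(format) {
  startsWith(format, "w:") || isTRUE(format %in% c("z", "Z"))
}

# Arrow C data interface format strings:
# "z" = binary, "Z" = large binary, "w:<n>" = fixed-size binary, "u" = utf8
formats <- c("z", "Z", "w:21", "u")
results <- vapply(formats, is_wkb_storage_format, logical(1))
print(results)
```

Of these four, only "u" fails the check. The reported error therefore suggests that the schema's format string was something other than a binary type when the file was read, not that Z coordinates were missing. What the format string actually was is not established in this thread.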

lcgodoy commented 1 year ago

Just reporting that I faced the same error with a different dataset. I also got the same error when running the same example as @andrewmaclachlan.

My session info is below for your reference:

R version 4.2.2 (2022-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.3.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] sf_1.0-12

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.10         compiler_4.2.2      pillar_1.8.1       
 [4] prettyunits_1.1.1   remotes_2.4.2       class_7.3-20       
 [7] tools_4.2.2         pkgbuild_1.4.0      bit_4.0.5          
[10] jsonlite_1.8.4      tibble_3.1.8        lifecycle_1.0.3    
[13] pkgconfig_2.0.3     rlang_1.0.6         geoarrow_0.0.0.9000
[16] cli_3.6.0           DBI_1.1.3           curl_5.0.0         
[19] e1071_1.7-13        withr_2.5.0         dplyr_1.1.0        
[22] desc_1.4.2          generics_0.1.3      vctrs_0.5.2        
[25] classInt_0.4-9      rprojroot_2.0.3     bit64_4.0.5        
[28] grid_4.2.2          tidyselect_1.2.0    glue_1.6.2         
[31] R6_2.5.1            narrow_0.0.0.9000   processx_3.8.0     
[34] fansi_1.0.4         callr_3.7.3         purrr_1.0.1        
[37] magrittr_2.0.3      ps_1.7.2            units_0.8-2        
[40] assertthat_0.2.1    arrow_12.0.0        utf8_1.2.3         
[43] KernSmooth_2.23-20  proxy_0.4-27        wk_0.7.2           
[46] crayon_1.5.2       

I'm happy to help with further debugging.

h-a-graham commented 1 year ago

Okay, so I was running into the same issue, and it seems that (for me at least) it actually occurs when {geoarrow} is installed before {arrow} in the same session... After restarting the R session (after installing arrow), everything works as expected.
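
The workaround above could be sketched as a small pre-flight check. This is only an illustration under stated assumptions: `geoarrow_session_advice()` is a hypothetical helper, not part of geoarrow or arrow, and it cannot detect the order in which packages were installed. It only turns the two observable facts into a suggestion.

```r
# Hypothetical helper (not part of geoarrow or arrow): maps the session
# state to a suggestion. It cannot detect install order by itself.
geoarrow_session_advice <- function(arrow_installed, geoarrow_loaded) {
  if (!arrow_installed) {
    "install 'arrow', then restart R before loading 'geoarrow'"
  } else if (geoarrow_loaded) {
    "if 'arrow' was installed after 'geoarrow' in this session, restart R"
  } else {
    "ok to load 'geoarrow'"
  }
}

# system.file() returns "" for a package that is not installed,
# and does not load the package as a side effect.
arrow_installed <- nzchar(system.file(package = "arrow"))
geoarrow_loaded <- "geoarrow" %in% loadedNamespaces()
message(geoarrow_session_advice(arrow_installed, geoarrow_loaded))
```

Passing the two flags as arguments keeps the helper independent of the current session, so each branch can be exercised without installing or removing anything.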

paleolimbot commented 1 year ago

I have no idea why that would help!

I tried again to replicate (see below)...my focus is still on building a rock-solid foundation in C (mostly done) that will serve as the basis for a rewrite of this package.

library(geoarrow)

write_geoparquet(
  sf::read_sf(system.file("shape/nc.shp", package = "sf")),
  "nc.parquet"
)

read_geoparquet_sf("nc.parquet")
#> Simple feature collection with 100 features and 14 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> Geodetic CRS:  NAD27
#> # A tibble: 100 × 15
#>     AREA PERIMETER CNTY_ CNTY_ID NAME  FIPS  FIPSNO CRESS_ID BIR74 SID74 NWBIR74
#>    <dbl>     <dbl> <dbl>   <dbl> <chr> <chr>  <dbl>    <int> <dbl> <dbl>   <dbl>
#>  1 0.114      1.44  1825    1825 Ashe  37009  37009        5  1091     1      10
#>  2 0.061      1.23  1827    1827 Alle… 37005  37005        3   487     0      10
#>  3 0.143      1.63  1828    1828 Surry 37171  37171       86  3188     5     208
#>  4 0.07       2.97  1831    1831 Curr… 37053  37053       27   508     1     123
#>  5 0.153      2.21  1832    1832 Nort… 37131  37131       66  1421     9    1066
#>  6 0.097      1.67  1833    1833 Hert… 37091  37091       46  1452     7     954
#>  7 0.062      1.55  1834    1834 Camd… 37029  37029       15   286     0     115
#>  8 0.091      1.28  1835    1835 Gates 37073  37073       37   420     0     254
#>  9 0.118      1.42  1836    1836 Warr… 37185  37185       93   968     4     748
#> 10 0.124      1.43  1837    1837 Stok… 37169  37169       85  1612     1     160
#> # ℹ 90 more rows
#> # ℹ 4 more variables: BIR79 <dbl>, SID79 <dbl>, NWBIR79 <dbl>,
#> #   geometry <MULTIPOLYGON [°]>
read_geoparquet("nc.parquet")
#> # A tibble: 100 × 15
#>     AREA PERIMETER CNTY_ CNTY_ID NAME  FIPS  FIPSNO CRESS_ID BIR74 SID74 NWBIR74
#>    <dbl>     <dbl> <dbl>   <dbl> <chr> <chr>  <dbl>    <int> <dbl> <dbl>   <dbl>
#>  1 0.114      1.44  1825    1825 Ashe  37009  37009        5  1091     1      10
#>  2 0.061      1.23  1827    1827 Alle… 37005  37005        3   487     0      10
#>  3 0.143      1.63  1828    1828 Surry 37171  37171       86  3188     5     208
#>  4 0.07       2.97  1831    1831 Curr… 37053  37053       27   508     1     123
#>  5 0.153      2.21  1832    1832 Nort… 37131  37131       66  1421     9    1066
#>  6 0.097      1.67  1833    1833 Hert… 37091  37091       46  1452     7     954
#>  7 0.062      1.55  1834    1834 Camd… 37029  37029       15   286     0     115
#>  8 0.091      1.28  1835    1835 Gates 37073  37073       37   420     0     254
#>  9 0.118      1.42  1836    1836 Warr… 37185  37185       93   968     4     748
#> 10 0.124      1.43  1837    1837 Stok… 37169  37169       85  1612     1     160
#> # ℹ 90 more rows
#> # ℹ 4 more variables: BIR79 <dbl>, SID79 <dbl>, NWBIR79 <dbl>,
#> #   geometry <grrw_wkb>

sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.2 (2022-10-31)
#>  os       macOS Monterey 12.5
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/Halifax
#>  date     2023-05-22
#>  pandoc   2.19.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date (UTC) lib source
#>  arrow         12.0.0     2023-05-05 [1] CRAN (R 4.2.0)
#>  assertthat    0.2.1      2019-03-21 [1] CRAN (R 4.2.0)
#>  bit           4.0.5      2022-11-15 [1] CRAN (R 4.2.0)
#>  bit64         4.0.5      2020-08-30 [1] CRAN (R 4.2.0)
#>  class         7.3-20     2022-01-16 [2] CRAN (R 4.2.2)
#>  classInt      0.4-8      2022-09-29 [1] CRAN (R 4.2.0)
#>  cli           3.6.1      2023-03-23 [1] CRAN (R 4.2.0)
#>  DBI           1.1.3.9003 2022-10-31 [1] Github (r-dbi/DBI@a30e771)
#>  digest        0.6.31     2022-12-11 [1] CRAN (R 4.2.0)
#>  dplyr         1.1.0.9000 2023-01-31 [1] Github (tidyverse/dplyr@129c3ad)
#>  e1071         1.7-11     2022-06-07 [1] CRAN (R 4.2.0)
#>  evaluate      0.20       2023-01-17 [1] CRAN (R 4.2.0)
#>  fansi         1.0.4      2023-01-22 [1] CRAN (R 4.2.0)
#>  fastmap       1.1.0      2021-01-25 [1] CRAN (R 4.2.0)
#>  fs            1.6.1      2023-02-06 [1] CRAN (R 4.2.0)
#>  generics      0.1.3      2022-07-05 [1] CRAN (R 4.2.0)
#>  geoarrow    * 0.0.0.9000 2023-05-22 [1] Github (paleolimbot/geoarrow@feebf96)
#>  glue          1.6.2      2022-02-24 [1] CRAN (R 4.2.0)
#>  highr         0.10       2022-12-22 [1] CRAN (R 4.2.0)
#>  htmltools     0.5.4      2022-12-07 [1] CRAN (R 4.2.0)
#>  jsonlite      1.8.4      2022-12-06 [1] CRAN (R 4.2.0)
#>  KernSmooth    2.23-20    2021-05-03 [2] CRAN (R 4.2.2)
#>  knitr         1.41       2022-11-18 [1] CRAN (R 4.2.0)
#>  lifecycle     1.0.3      2022-10-07 [1] CRAN (R 4.2.0)
#>  magrittr      2.0.3      2022-03-30 [1] CRAN (R 4.2.0)
#>  narrow        0.0.0.9000 2023-04-26 [1] Github (paleolimbot/narrow@3572fb5)
#>  pillar        1.9.0      2023-03-22 [1] CRAN (R 4.2.0)
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.2.0)
#>  proxy         0.4-27     2022-06-09 [1] CRAN (R 4.2.0)
#>  purrr         1.0.1      2023-01-10 [1] CRAN (R 4.2.0)
#>  R.cache       0.16.0     2022-07-21 [1] CRAN (R 4.2.0)
#>  R.methodsS3   1.8.2      2022-06-13 [1] CRAN (R 4.2.0)
#>  R.oo          1.25.0     2022-06-12 [1] CRAN (R 4.2.0)
#>  R.utils       2.12.0     2022-06-28 [1] CRAN (R 4.2.0)
#>  R6            2.5.1      2021-08-19 [1] CRAN (R 4.2.0)
#>  Rcpp          1.0.10     2023-01-22 [1] CRAN (R 4.2.0)
#>  reprex        2.0.2      2022-08-17 [1] CRAN (R 4.2.0)
#>  rlang         1.1.0      2023-03-14 [1] CRAN (R 4.2.0)
#>  rmarkdown     2.19       2022-12-15 [1] CRAN (R 4.2.0)
#>  rstudioapi    0.14       2022-08-22 [1] CRAN (R 4.2.0)
#>  sessioninfo   1.2.2      2021-12-06 [1] CRAN (R 4.2.0)
#>  sf            1.0-12     2023-03-19 [1] CRAN (R 4.2.0)
#>  stringi       1.7.12     2023-01-11 [1] CRAN (R 4.2.0)
#>  stringr       1.5.0      2022-12-02 [1] CRAN (R 4.2.0)
#>  styler        1.8.1      2022-11-07 [1] CRAN (R 4.2.0)
#>  tibble        3.2.1      2023-03-20 [1] CRAN (R 4.2.0)
#>  tidyselect    1.2.0      2022-10-10 [1] CRAN (R 4.2.0)
#>  units         0.8-0      2022-02-05 [1] CRAN (R 4.2.0)
#>  utf8          1.2.3      2023-01-31 [1] CRAN (R 4.2.0)
#>  vctrs         0.6.1      2023-03-22 [1] CRAN (R 4.2.0)
#>  withr         2.5.0      2022-03-03 [1] CRAN (R 4.2.0)
#>  wk            0.7.3      2023-05-06 [1] CRAN (R 4.2.0)
#>  xfun          0.36       2022-12-21 [1] CRAN (R 4.2.0)
#>  yaml          2.3.6      2022-10-18 [1] CRAN (R 4.2.0)
#> 
#>  [1] /Users/deweydunnington/Library/R/arm64/4.2/library
#>  [2] /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Created on 2023-05-22 with reprex v2.0.2

yeelauren commented 1 year ago

Just had the same issue as above: I installed geoarrow first, then arrow. Restarting R fixed this for me as well.

> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C               LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8    
 [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8    LC_PAPER=en_CA.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       

time zone: America/Toronto
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] sf_1.0-14           geoarrow_0.0.0.9000 dplyr_1.1.2        

loaded via a namespace (and not attached):
 [1] vctrs_0.6.2        cli_3.6.1          knitr_1.43         rlang_1.1.1        xfun_0.39          DBI_1.1.3          KernSmooth_2.23-22
 [8] purrr_1.0.1        renv_0.17.3        generics_0.1.3     assertthat_0.2.1   jsonlite_1.8.4     bit_4.0.5          glue_1.6.2        
[15] e1071_1.7-13       fansi_1.0.4        grid_4.3.1         classInt_0.4-9     tibble_3.2.1       yaml_2.3.7         lifecycle_1.0.3   
[22] compiler_4.3.1     Rcpp_1.0.11        pkgconfig_2.0.3    narrow_0.0.0.9000  rstudioapi_0.14    wk_0.7.3           R6_2.5.1          
[29] class_7.3-22       tidyselect_1.2.0   utf8_1.2.3         pillar_1.9.0       magrittr_2.0.3     bit64_4.0.5        tools_4.3.1       
[36] proxy_0.4-27       arrow_12.0.1.1     units_0.8-3