Open andrewmaclachlan opened 1 year ago
Than you for reporting! I've been neglecting the maintenance here while I work on solidifying the lower-level pieces.
I think 'Z' here refers to the column type (it's the format code for 'binary', although I can't seem to reproduce:
library(geoarrow)
write_geoparquet(
sf::read_sf(system.file("shape/nc.shp", package = "sf")),
"nc.parquet"
)
read_geoparquet_sf("nc.parquet")
#> Simple feature collection with 100 features and 14 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> Geodetic CRS: NAD27
#> # A tibble: 100 × 15
#> AREA PERIMETER CNTY_ CNTY_ID NAME FIPS FIPSNO CRESS_ID BIR74 SID74 NWBIR74
#> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 0.114 1.44 1825 1825 Ashe 37009 37009 5 1091 1 10
#> 2 0.061 1.23 1827 1827 Alle… 37005 37005 3 487 0 10
#> 3 0.143 1.63 1828 1828 Surry 37171 37171 86 3188 5 208
#> 4 0.07 2.97 1831 1831 Curr… 37053 37053 27 508 1 123
#> 5 0.153 2.21 1832 1832 Nort… 37131 37131 66 1421 9 1066
#> 6 0.097 1.67 1833 1833 Hert… 37091 37091 46 1452 7 954
#> 7 0.062 1.55 1834 1834 Camd… 37029 37029 15 286 0 115
#> 8 0.091 1.28 1835 1835 Gates 37073 37073 37 420 0 254
#> 9 0.118 1.42 1836 1836 Warr… 37185 37185 93 968 4 748
#> 10 0.124 1.43 1837 1837 Stok… 37169 37169 85 1612 1 160
#> # ℹ 90 more rows
#> # ℹ 4 more variables: BIR79 <dbl>, SID79 <dbl>, NWBIR79 <dbl>,
#> # geometry <MULTIPOLYGON [°]>
read_geoparquet("nc.parquet")
#> # A tibble: 100 × 15
#> AREA PERIMETER CNTY_ CNTY_ID NAME FIPS FIPSNO CRESS_ID BIR74 SID74 NWBIR74
#> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 0.114 1.44 1825 1825 Ashe 37009 37009 5 1091 1 10
#> 2 0.061 1.23 1827 1827 Alle… 37005 37005 3 487 0 10
#> 3 0.143 1.63 1828 1828 Surry 37171 37171 86 3188 5 208
#> 4 0.07 2.97 1831 1831 Curr… 37053 37053 27 508 1 123
#> 5 0.153 2.21 1832 1832 Nort… 37131 37131 66 1421 9 1066
#> 6 0.097 1.67 1833 1833 Hert… 37091 37091 46 1452 7 954
#> 7 0.062 1.55 1834 1834 Camd… 37029 37029 15 286 0 115
#> 8 0.091 1.28 1835 1835 Gates 37073 37073 37 420 0 254
#> 9 0.118 1.42 1836 1836 Warr… 37185 37185 93 968 4 748
#> 10 0.124 1.43 1837 1837 Stok… 37169 37169 85 1612 1 160
#> # ℹ 90 more rows
#> # ℹ 4 more variables: BIR79 <dbl>, SID79 <dbl>, NWBIR79 <dbl>,
#> # geometry <grrw_wkb>
Created on 2023-04-25 with reprex v2.0.2
Just reporting that I faced the same error with a different dataset. I also had the same error when running the same example as @andrewmaclachlan .
My session info is below for your reference:
R version 4.2.2 (2022-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.3.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] sf_1.0-12
loaded via a namespace (and not attached):
[1] Rcpp_1.0.10 compiler_4.2.2 pillar_1.8.1
[4] prettyunits_1.1.1 remotes_2.4.2 class_7.3-20
[7] tools_4.2.2 pkgbuild_1.4.0 bit_4.0.5
[10] jsonlite_1.8.4 tibble_3.1.8 lifecycle_1.0.3
[13] pkgconfig_2.0.3 rlang_1.0.6 geoarrow_0.0.0.9000
[16] cli_3.6.0 DBI_1.1.3 curl_5.0.0
[19] e1071_1.7-13 withr_2.5.0 dplyr_1.1.0
[22] desc_1.4.2 generics_0.1.3 vctrs_0.5.2
[25] classInt_0.4-9 rprojroot_2.0.3 bit64_4.0.5
[28] grid_4.2.2 tidyselect_1.2.0 glue_1.6.2
[31] R6_2.5.1 narrow_0.0.0.9000 processx_3.8.0
[34] fansi_1.0.4 callr_3.7.3 purrr_1.0.1
[37] magrittr_2.0.3 ps_1.7.2 units_0.8-2
[40] assertthat_0.2.1 arrow_12.0.0 utf8_1.2.3
[43] KernSmooth_2.23-20 proxy_0.4-27 wk_0.7.2
[46] crayon_1.5.2
I'm happy to help with further debugging.
Okay, so I was running into the same issue and it seems that the issue (for me atleast) actually occurs when {geoarrow} is installed before {arrow} in the same session... After restarting the rsession (following installing arrow) all was well and works as expected.
I have no idea why that would help!
I tried again to replicate (see below)...my focus is still on building a rock-solid foundation in C (mostly done) that will serve as the basis for a rewrite of this package.
library(geoarrow)
write_geoparquet(
sf::read_sf(system.file("shape/nc.shp", package = "sf")),
"nc.parquet"
)
read_geoparquet_sf("nc.parquet")
#> Simple feature collection with 100 features and 14 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> Geodetic CRS: NAD27
#> # A tibble: 100 × 15
#> AREA PERIMETER CNTY_ CNTY_ID NAME FIPS FIPSNO CRESS_ID BIR74 SID74 NWBIR74
#> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 0.114 1.44 1825 1825 Ashe 37009 37009 5 1091 1 10
#> 2 0.061 1.23 1827 1827 Alle… 37005 37005 3 487 0 10
#> 3 0.143 1.63 1828 1828 Surry 37171 37171 86 3188 5 208
#> 4 0.07 2.97 1831 1831 Curr… 37053 37053 27 508 1 123
#> 5 0.153 2.21 1832 1832 Nort… 37131 37131 66 1421 9 1066
#> 6 0.097 1.67 1833 1833 Hert… 37091 37091 46 1452 7 954
#> 7 0.062 1.55 1834 1834 Camd… 37029 37029 15 286 0 115
#> 8 0.091 1.28 1835 1835 Gates 37073 37073 37 420 0 254
#> 9 0.118 1.42 1836 1836 Warr… 37185 37185 93 968 4 748
#> 10 0.124 1.43 1837 1837 Stok… 37169 37169 85 1612 1 160
#> # ℹ 90 more rows
#> # ℹ 4 more variables: BIR79 <dbl>, SID79 <dbl>, NWBIR79 <dbl>,
#> # geometry <MULTIPOLYGON [°]>
read_geoparquet("nc.parquet")
#> # A tibble: 100 × 15
#> AREA PERIMETER CNTY_ CNTY_ID NAME FIPS FIPSNO CRESS_ID BIR74 SID74 NWBIR74
#> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 0.114 1.44 1825 1825 Ashe 37009 37009 5 1091 1 10
#> 2 0.061 1.23 1827 1827 Alle… 37005 37005 3 487 0 10
#> 3 0.143 1.63 1828 1828 Surry 37171 37171 86 3188 5 208
#> 4 0.07 2.97 1831 1831 Curr… 37053 37053 27 508 1 123
#> 5 0.153 2.21 1832 1832 Nort… 37131 37131 66 1421 9 1066
#> 6 0.097 1.67 1833 1833 Hert… 37091 37091 46 1452 7 954
#> 7 0.062 1.55 1834 1834 Camd… 37029 37029 15 286 0 115
#> 8 0.091 1.28 1835 1835 Gates 37073 37073 37 420 0 254
#> 9 0.118 1.42 1836 1836 Warr… 37185 37185 93 968 4 748
#> 10 0.124 1.43 1837 1837 Stok… 37169 37169 85 1612 1 160
#> # ℹ 90 more rows
#> # ℹ 4 more variables: BIR79 <dbl>, SID79 <dbl>, NWBIR79 <dbl>,
#> # geometry <grrw_wkb>
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.2.2 (2022-10-31)
#> os macOS Monterey 12.5
#> system aarch64, darwin20
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz America/Halifax
#> date 2023-05-22
#> pandoc 2.19.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> arrow 12.0.0 2023-05-05 [1] CRAN (R 4.2.0)
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.2.0)
#> bit 4.0.5 2022-11-15 [1] CRAN (R 4.2.0)
#> bit64 4.0.5 2020-08-30 [1] CRAN (R 4.2.0)
#> class 7.3-20 2022-01-16 [2] CRAN (R 4.2.2)
#> classInt 0.4-8 2022-09-29 [1] CRAN (R 4.2.0)
#> cli 3.6.1 2023-03-23 [1] CRAN (R 4.2.0)
#> DBI 1.1.3.9003 2022-10-31 [1] Github (r-dbi/DBI@a30e771)
#> digest 0.6.31 2022-12-11 [1] CRAN (R 4.2.0)
#> dplyr 1.1.0.9000 2023-01-31 [1] Github (tidyverse/dplyr@129c3ad)
#> e1071 1.7-11 2022-06-07 [1] CRAN (R 4.2.0)
#> evaluate 0.20 2023-01-17 [1] CRAN (R 4.2.0)
#> fansi 1.0.4 2023-01-22 [1] CRAN (R 4.2.0)
#> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.2.0)
#> fs 1.6.1 2023-02-06 [1] CRAN (R 4.2.0)
#> generics 0.1.3 2022-07-05 [1] CRAN (R 4.2.0)
#> geoarrow * 0.0.0.9000 2023-05-22 [1] Github (paleolimbot/geoarrow@feebf96)
#> glue 1.6.2 2022-02-24 [1] CRAN (R 4.2.0)
#> highr 0.10 2022-12-22 [1] CRAN (R 4.2.0)
#> htmltools 0.5.4 2022-12-07 [1] CRAN (R 4.2.0)
#> jsonlite 1.8.4 2022-12-06 [1] CRAN (R 4.2.0)
#> KernSmooth 2.23-20 2021-05-03 [2] CRAN (R 4.2.2)
#> knitr 1.41 2022-11-18 [1] CRAN (R 4.2.0)
#> lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.2.0)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.0)
#> narrow 0.0.0.9000 2023-04-26 [1] Github (paleolimbot/narrow@3572fb5)
#> pillar 1.9.0 2023-03-22 [1] CRAN (R 4.2.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.0)
#> proxy 0.4-27 2022-06-09 [1] CRAN (R 4.2.0)
#> purrr 1.0.1 2023-01-10 [1] CRAN (R 4.2.0)
#> R.cache 0.16.0 2022-07-21 [1] CRAN (R 4.2.0)
#> R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.2.0)
#> R.oo 1.25.0 2022-06-12 [1] CRAN (R 4.2.0)
#> R.utils 2.12.0 2022-06-28 [1] CRAN (R 4.2.0)
#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.0)
#> Rcpp 1.0.10 2023-01-22 [1] CRAN (R 4.2.0)
#> reprex 2.0.2 2022-08-17 [1] CRAN (R 4.2.0)
#> rlang 1.1.0 2023-03-14 [1] CRAN (R 4.2.0)
#> rmarkdown 2.19 2022-12-15 [1] CRAN (R 4.2.0)
#> rstudioapi 0.14 2022-08-22 [1] CRAN (R 4.2.0)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.0)
#> sf 1.0-12 2023-03-19 [1] CRAN (R 4.2.0)
#> stringi 1.7.12 2023-01-11 [1] CRAN (R 4.2.0)
#> stringr 1.5.0 2022-12-02 [1] CRAN (R 4.2.0)
#> styler 1.8.1 2022-11-07 [1] CRAN (R 4.2.0)
#> tibble 3.2.1 2023-03-20 [1] CRAN (R 4.2.0)
#> tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.2.0)
#> units 0.8-0 2022-02-05 [1] CRAN (R 4.2.0)
#> utf8 1.2.3 2023-01-31 [1] CRAN (R 4.2.0)
#> vctrs 0.6.1 2023-03-22 [1] CRAN (R 4.2.0)
#> withr 2.5.0 2022-03-03 [1] CRAN (R 4.2.0)
#> wk 0.7.3 2023-05-06 [1] CRAN (R 4.2.0)
#> xfun 0.36 2022-12-21 [1] CRAN (R 4.2.0)
#> yaml 2.3.6 2022-10-18 [1] CRAN (R 4.2.0)
#>
#> [1] /Users/deweydunnington/Library/R/arm64/4.2/library
#> [2] /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
Created on 2023-05-22 with reprex v2.0.2
Just had the same issue as above, installed geoarrow
first, then arrow
. restarting R fixed this for me as well.
> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
locale:
[1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
[5] LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8 LC_PAPER=en_CA.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
time zone: America/Toronto
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] sf_1.0-14 geoarrow_0.0.0.9000 dplyr_1.1.2
loaded via a namespace (and not attached):
[1] vctrs_0.6.2 cli_3.6.1 knitr_1.43 rlang_1.1.1 xfun_0.39 DBI_1.1.3 KernSmooth_2.23-22
[8] purrr_1.0.1 renv_0.17.3 generics_0.1.3 assertthat_0.2.1 jsonlite_1.8.4 bit_4.0.5 glue_1.6.2
[15] e1071_1.7-13 fansi_1.0.4 grid_4.3.1 classInt_0.4-9 tibble_3.2.1 yaml_2.3.7 lifecycle_1.0.3
[22] compiler_4.3.1 Rcpp_1.0.11 pkgconfig_2.0.3 narrow_0.0.0.9000 rstudioapi_0.14 wk_0.7.3 R6_2.5.1
[29] class_7.3-22 tidyselect_1.2.0 utf8_1.2.3 pillar_1.9.0 magrittr_2.0.3 bit64_4.0.5 tools_4.3.1
[36] proxy_0.4-27 arrow_12.0.1.1 units_0.8-3
Hi there,
When i try and load a geoparquet (including the example) i get the following error. I think this is to do with the Z dimension as when i tried it with a geoparquet with a z dimension it loaded fine.
nc <- sf::read_sf(system.file("shape/nc.shp", package = "sf")) write_geoparquet(nc, "nc.parquet") read_geoparquet_sf("nc.parquet")
error
Error in geoarrow_schema_wkb(name = schema$name, format = schema$format, : startsWith(format, "w:") || isTRUE(format %in% c("z", "Z")) is not TRUE