ccao-data / data-architecture

Codebase for CCAO data infrastructure construction and management
https://ccao-data.github.io/data-architecture/

Replace `geoarrow` functions in ingest scripts #492

Open dfsnow opened 3 weeks ago

dfsnow commented 3 weeks ago

The `read_geoparquet()` and `write_geoparquet()` functions used in our ETL ingest scripts are now deprecated, as is the CRAN `geoarrow` package they come from. We should switch to the new geoarrow backend located here: https://github.com/geoarrow/geoarrow-r

This will involve updating our various scripts and renv to point to the new geoarrow/nanoarrow packages, and replacing the dedicated geoparquet functions with their equivalent generics.

wrridgeway commented 2 weeks ago

I tested this out and the geometry that gets written to the parquet file is no longer WKB, which interferes with our SQL spatial joins and distance calculations. We can hack around it using

mutate(across(starts_with('geometry'), ~ hex2raw(st_as_binary(.x, hex = TRUE))))

but it seems like we'd primarily lose rather than gain functionality for our purposes by switching to the new version.
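For anyone who wants to see what that hack does, here is a minimal sketch on made-up points (not CCAO data). It assumes only `sf` is installed; `hex_to_raw()` is a hypothetical base-R stand-in for `hex2raw()`, included so the example has no extra dependency. It also compares the hex round trip with calling `st_as_binary()` directly, which already returns raw WKB vectors.

```r
library(sf)

# Toy geometry column (made-up points, not CCAO data)
pts <- st_sfc(
  st_point(c(-87.63, 41.88)),
  st_point(c(-87.65, 41.85)),
  crs = 4326
)

# Hypothetical base-R stand-in for hex2raw(): hex string -> raw vector
hex_to_raw <- function(hex) {
  starts <- seq(1, nchar(hex), by = 2)
  as.raw(strtoi(substring(hex, starts, starts + 1), base = 16L))
}

# Route used in the hack: geometry -> hex-encoded WKB -> raw bytes
via_hex <- lapply(st_as_binary(pts, hex = TRUE), hex_to_raw)

# Direct route: st_as_binary() returns a list of raw WKB vectors
direct <- st_as_binary(pts)

# Compare the two routes byte for byte
same_bytes <- all(mapply(
  function(a, b) identical(as.integer(a), as.integer(b)),
  via_hex, direct
))

# Parse the WKB back into geometries to confirm it is readable
round_trip <- st_as_sfc(direct, crs = 4326)
```

If `same_bytes` is `TRUE` on real data as well, the hex step in the hack may be unnecessary, though that would need checking against the actual ingest columns.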

Edit - From the maintainer:

If you need a workaround, you can create the WKB-encoded table yourself from an sf object:

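library(sf)
library(geoarrow)

# Example data shipped with sf
nc <- read_sf(system.file("gpkg/nc.gpkg", package = "sf"))

# Drop the sf class, then re-encode the geometry column as WKB
df <- tibble::as_tibble(nc)
df$geom <- as_geoarrow_vctr(df$geom, schema = geoarrow_wkb())
# Convert to an Arrow table with the WKB-encoded geometry column
tbl <- arrow::as_arrow_table(df)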

...and add metadata using tbl$metadata$geo = "{...}"