Closed by tim-salabim 1 month ago
IPC can only handle tables, so this will fail for line strings too! I think you will have to wrap the geometry in a one-column struct/data.frame.
I don't think I understand, sorry. Can you provide an example of how to write a sf object to arrow with interleaved coordinates?
Sorry I lost track of this! There are some rough edges on geoarrow's part here, but what I was getting at is that write_nanoarrow() (or Arrow's write_parquet()) only handles data frame-like things. In other words, it's not the structness of the coordinates that is the problem here, it's that geometry needs to be its own column in a struct (like an sf object).
library(geoarrow)
library(nanoarrow)
library(sf)
#> Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE
n = 100
dat = data.frame(
  id = 1:n
  , lon = rnorm(n, 19, 3)
  , lat = rnorm(n, 50, 3)
)
pts = st_as_sf(dat, coords = c("lon", "lat"), crs = 4326)
# The ability to pass geoarrow_coord_type = ... through
# infer_nanoarrow_schema.sfc/sf and/or as_nanoarrow_array_stream()
# would eliminate the need for this part
geom_col_name <- attr(pts, "sf_column")
geom_type <- infer_geoarrow_schema(pts, coord_type = "INTERLEAVED")
pts_schema <- infer_nanoarrow_schema(pts)
pts_schema$children[[geom_col_name]] <- geom_type
pts_schema
#> <nanoarrow_schema struct>
#> $ format : chr "+s"
#> $ name : chr ""
#> $ metadata : list()
#> $ flags : int 0
#> $ children :List of 2
#> ..$ id :<nanoarrow_schema int32>
#> .. ..$ format : chr "i"
#> .. ..$ name : chr "id"
#> .. ..$ metadata : list()
#> .. ..$ flags : int 2
#> .. ..$ children : list()
#> .. ..$ dictionary: NULL
#> ..$ geometry:<nanoarrow_schema geoarrow.point{fixed_size_list(2)}>
#> .. ..$ format : chr "+w:2"
#> .. ..$ name : chr "geometry"
#> .. ..$ metadata :List of 2
#> .. .. ..$ ARROW:extension:name : chr "geoarrow.point"
#> .. .. ..$ ARROW:extension:metadata: chr "{\"crs\":{\n \"$schema\": \"https://proj.org/schemas/v0.4/projjson.schema.json\",\n \"type\": \"GeographicCRS"| __truncated__
#> .. ..$ flags : int 2
#> .. ..$ children :List of 1
#> .. .. ..$ xy:<nanoarrow_schema double>
#> .. .. .. ..$ format : chr "g"
#> .. .. .. ..$ name : chr "xy"
#> .. .. .. ..$ metadata : list()
#> .. .. .. ..$ flags : int 0
#> .. .. .. ..$ children : list()
#> .. .. .. ..$ dictionary: NULL
#> .. ..$ dictionary: NULL
#> $ dictionary: NULL
out <- tempfile()
pts |>
  as_nanoarrow_array_stream(schema = pts_schema) |>
  write_nanoarrow(out)
# Implementing st_as_sf.nanoarrow_array_stream in geoarrow would make this
# much simpler!
tbl <- read_nanoarrow(out) |>
  tibble::as_tibble()
tbl[[geom_col_name]] <- st_as_sfc(tbl[[geom_col_name]])
sf::st_as_sf(tbl)
#> Simple feature collection with 100 features and 1 field
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 12.02193 ymin: 43.40038 xmax: 24.99115 ymax: 55.39772
#> Geodetic CRS: WGS 84
#> # A tibble: 100 × 2
#> id geometry
#> <int> <POINT [°]>
#> 1 1 (23.18507 49.99223)
#> 2 2 (16.00192 45.08137)
#> 3 3 (20.83954 52.3167)
#> 4 4 (17.30396 46.12193)
#> 5 5 (14.45836 53.15084)
#> 6 6 (20.09797 50.69008)
#> 7 7 (15.51338 51.11113)
#> 8 8 (17.65451 46.20986)
#> 9 9 (20.65463 51.50423)
#> 10 10 (17.04951 51.80477)
#> # ℹ 90 more rows
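For anyone reading the schema printout above: the "+w:2" format on the geometry column is Arrow's format string for a fixed-size list of two items, which is the interleaved storage, whereas separate coordinates are stored as a struct ("+s") with one child per dimension. A minimal sketch using only nanoarrow's type constructors; it builds the bare storage types, not the geoarrow extension types, and the x/y child names are illustrative assumptions:

```r
library(nanoarrow)

# Interleaved storage: a fixed-size list of 2 doubles per point
interleaved_storage <- na_fixed_size_list(na_double(), 2)

# Separate storage: a struct with one double child per dimension
# (the child names "x" and "y" are chosen here for illustration)
separate_storage <- na_struct(list(x = na_double(), y = na_double()))

# The format strings distinguish the two layouts
interleaved_format <- interleaved_storage$format
separate_format <- separate_storage$format
```

Checking the format of the geometry child in pts_schema this way is a quick way to confirm which layout will actually be written.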
I am not sure I ever completed the thought in wk, but the chunking logic by coordinate might be useful to you if you're streaming large things:
library(sf)
#> Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE
n = 100
dat = data.frame(
  id = 1:n
  , lon = rnorm(n, 19, 3)
  , lat = rnorm(n, 50, 3)
)
pts = st_as_sf(dat, coords = c("lon", "lat"), crs = 4326)
chunker <- wk::wk_chunk_strategy_coordinates(chunk_size = 15)
chunker(list(pts), nrow(pts))
#> from to
#> 1 1 15
#> 2 16 30
#> 3 31 45
#> 4 46 60
#> 5 61 75
#> 6 76 90
#> 7 91 100
Created on 2024-09-25 with reprex v2.1.1
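To make the from/to table above easier to reason about, here is a rough base R sketch that reproduces it for the point case. It is not how wk implements the strategy: wk counts coordinates, and this simplification only matches because each point feature has exactly one coordinate. The helper name chunk_ranges() is hypothetical.

```r
# Hypothetical helper: split n rows into consecutive ranges of at most
# chunk_size rows. This mimics the printed result for points only.
chunk_ranges <- function(n, chunk_size) {
  if (n < 1L) {
    return(data.frame(from = integer(), to = integer()))
  }
  from <- seq(1L, n, by = chunk_size)
  # The last chunk is clipped to n
  to <- pmin(from + chunk_size - 1L, n)
  data.frame(from = from, to = to)
}

chunks <- chunk_ranges(100L, 15L)

# Each row of `chunks` can then drive one slice of the data, e.g.
# pts[chunks$from[i]:chunks$to[i], ] for the i-th chunk.
```

For lines or polygons the row ranges would differ, since features with many vertices would fill a chunk faster than features with few.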
Thanks @paleolimbot! The workaround - which I can live with - works like a charm! We can now render massive amounts of data in a few seconds! I'll post an example on Mastodon once I have smoothed out all the rough edges. This is awesome!
Is it possible to write geoarrow data with interleaved coordinates? The following reprex fails for me (whereas it works with separate coordinates).
Created on 2024-09-23 with reprex v2.1.0
Note, this is triggered by https://github.com/geoarrow/deck.gl-layers/issues/126 as deck.gl-layers currently only accepts interleaved coordinates.
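For readers unfamiliar with the two layouts in question, a small base R sketch of the difference, using made-up coordinates. It only illustrates the memory order of the values and does not involve geoarrow or Arrow itself:

```r
# Three made-up points as an n x 2 matrix with columns x and y
coords <- rbind(c(19.1, 50.2), c(18.4, 49.7), c(20.3, 51.0))
colnames(coords) <- c("x", "y")

# Separate layout: one vector per dimension (x1, x2, ... and y1, y2, ...)
separate <- list(x = coords[, "x"], y = coords[, "y"])

# Interleaved layout: a single vector x1, y1, x2, y2, ...
interleaved <- as.vector(t(coords))

# Going back from interleaved to a coordinate matrix
restored <- matrix(
  interleaved,
  ncol = 2,
  byrow = TRUE,
  dimnames = list(NULL, c("x", "y"))
)
```

Both layouts hold the same numbers; a consumer that only accepts interleaved input expects the single x1, y1, x2, y2, ... buffer.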