Open kylebutts opened 11 months ago
First, just a note that a rewrite is in progress and should be available in January! The new package currently lives here: https://github.com/geoarrow/geoarrow-c/tree/main/r/geoarrow but may get moved to a less confusing location (like geoarrow/geoarrow-r). I'm just in the process of working with the extension type registration ( https://github.com/geoarrow/geoarrow-c/pull/85 ) so this is well-timed!
The issue of automatic loading is a tricky one...the arrow package maybe shouldn't load arbitrary packages (as in, if we somehow encoded "r_pkgs" in the metadata or something), and while it could special-case the geoarrow package once it is on CRAN, special-casing things can become unwieldy.
In any case, the first step is geoarrow on CRAN 🙂 ...I'm working on it!
The new package currently lives here: https://github.com/geoarrow/geoarrow-c/tree/main/r/geoarrow but may get moved to a less confusing location
Hi Dewey + Team! Trying to play with geoarrow for a project, but am not finding the new package you referenced. I clicked on the link above, but it just shows a "404 page not found" error. Hoping to use it in combination with open_dataset() in a Shiny app I'm spinning up! For context, I've installed this current version of {geoarrow-r}, but am assuming this is not the one that you want people to be using.
@mrworthington This pull request suggests they were moved back to this repo 2 weeks ago: https://github.com/geoarrow/geoarrow-c/pull/89
This is indeed the version that I'd like people to be using; however, it is missing the read_geoparquet_sf() function ( https://github.com/geoarrow/geoarrow-r/pull/30 ). If you need the previous version, I tagged it as 0.1.0.
Development did start out in geoarrow-c, but ultimately I found that it made more sense to keep it on its own (hence, geoarrow-r!).
This is indeed the version that I'd like people to be using; however, it is missing the read_geoparquet_sf() function ( #30 ). If you need the previous version, I tagged it as 0.1.0.
Going forward, am I correct that we won't need read_geoparquet_sf() and can just use read_parquet()? And if so, will the result automatically become an sf object? Currently with version 0.1.0.9000, I have to run read_parquet('file.parquet') |> geoarrow:::st_as_sf.Dataset() because if I don't use geoarrow:::st_as_sf.Dataset() I get the following error:
Error in st_geometry.sf(x) :
attr(obj, "sf_column") does not point to a geometry column.
Did you rename it, without setting st_geometry(obj) <- "newname"?
If you are only reading/writing Parquet files in R (with geoarrow loaded) and/or Python (after import geoarrow.pyarrow), you can just use write_parquet() and read_parquet(). This is not GeoParquet...it's just regular Parquet with extension types. This means that something like GDAL won't be able to understand it (yet) and uploading it to a cloud data warehouse won't work. The upside of not using GeoParquet is that more arrow tools work out-of-the-box (e.g., multi-file datasets via write_dataset()/open_dataset() in R or Python).
If you need to read with GDAL or some other tool, I would recommend using geoarrow::read_geoparquet_sf() (or geoarrow::read_geoparquet()) and geoarrow::write_geoparquet() going forward; however, I still have to finish the implementation (#30).
if I don't use geoarrow:::st_as_sf.Dataset() I get the following error:
I think you might want read_parquet(f, as_data_frame = FALSE) + st_as_sf(). I think the problem is that sf doesn't know that a lazy geoarrow column is "geometry". Eventually it probably will but the details of that are complicated and for now you'll have to help it.
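A minimal sketch of the workaround suggested above, assuming the arrow, sf, and geoarrow packages are installed. The wrapper name read_parquet_as_sf() and the file path are hypothetical, and whether st_as_sf() finds the geometry may depend on the geoarrow version in use.

```r
# Hypothetical helper: read a Parquet file as an Arrow Table, then convert to sf.
read_parquet_as_sf <- function(path) {
  # The advice above assumes geoarrow is loaded before the file is read.
  loadNamespace("geoarrow")

  # as_data_frame = FALSE keeps the result as an Arrow Table instead of
  # collecting it into a data.frame with a lazy geometry column.
  tab <- arrow::read_parquet(path, as_data_frame = FALSE)

  # Convert explicitly, so sf does not have to guess which column is geometry.
  sf::st_as_sf(tab)
}

# Usage (the path is a placeholder):
# x <- read_parquet_as_sf("file.parquet")
```

Keeping the conversion in one small function makes it easy to swap in read_geoparquet_sf() later, if that function returns as discussed in this thread.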
Thanks for the info!
If you need to read with GDAL or some other tool, I would recommend using geoarrow::read_geoparquet_sf() (or geoarrow::read_geoparquet()) and geoarrow::write_geoparquet() going forward; however, I still have to finish the implementation (https://github.com/geoarrow/geoarrow-r/pull/30).
So it sounds like geoarrow::write_geoparquet() and friends are coming back? So I can install with renv::install('geoarrow/geoarrow-r@v0.1.0') which gets me 0.1.0 instead of renv::install('geoarrow/geoarrow-r') which gets me 0.1.0.9000?
I'm using this to write parquet files to map with geoarrow/deck.gl layers (as opposed to pmtiles). The README says
Pass -lco GEOMETRY_ENCODING=GEOARROW when converting to Arrow or Parquet files in order to store geometries in a GeoArrow-native geometry column.
Likewise, this post says
Notice the GEOMETRY_ENCODING=GEOARROW? This file isn't quite valid GeoParquet, at least as of version 1.0, because it stores geometries in the efficient Arrow-native encoding instead of as WKB geometries.
This is needed for now because parquet-wasm doesn't have a way to parse WKB geometries into Arrow-native geometries. (A @geoarrow/geoparquet-wasm library is likely to be published by the end of 2023 that will parse any GeoParquet file and load it to GeoArrow.)
So I'm guessing I need to use geoarrow::write_geoparquet()? Which I get using 0.1.0, correct?
I think you might want read_parquet(f, as_data_frame = FALSE) + st_as_sf(). I think the problem is that sf doesn't know that a lazy geoarrow column is "geometry". Eventually it probably will but the details of that are complicated and for now you'll have to help it.
Yep, that fixed it!
So it sounds like geoarrow::write_geoparquet() and friends are coming back?
Yes! With proper conformance to the 1.0.0 spec. The 1.0.0 spec doesn't include GeoArrow as an encoding option - it's WKB only - and there's some debate over whether it should be there in the first place.
So I'm guessing I need to use geoarrow::write_geoparquet()? Which I get using 0.1.0, correct?
I actually have no idea. I think maybe write_parquet() will work, but you might have to explicitly tell it to use interleaved coordinates. Off the top of my head I forget exactly how to do that but I'll try to circle back with an example.
First of all, thanks for this awesome work. It's been great to see the progress on all this :-)
In the example on the readme, you load a .parquet file that contains a geometry example. Since there is not a separate naming format/convention (e.g., .geo.parquet or .geoparquet), I might not know that there is a geometry in there, so I just load arrow and open the dataset as normal. Looking at the geometry column would be confusing to me. This behavior differs depending on whether I have the geoarrow package loaded or not.
This issue should maybe be in the R arrow package, but I'm wondering if arrow should detect when there is a geometry column present and adjust behavior (the metadata is in there, so this information is known). For example, when calling collect(), should there be a warning that a geometry column is being collected and that geoarrow::st_collect() might be the better option (as in https://github.com/paleolimbot/geoarrow/issues/21)? Or a warning when opening a geoparquet without geoarrow loaded?
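To make "the metadata is in there" concrete: GeoParquet files carry a file-level "geo" metadata key, so a check along these lines is possible with arrow alone. This is only a sketch under that assumption. has_geo_metadata() is a hypothetical helper, not an arrow or geoarrow function, and the metadata value written below is a placeholder rather than valid GeoParquet metadata.

```r
library(arrow)

# Hypothetical helper: TRUE if the Parquet file has a file-level "geo" key.
# For simplicity this reads the whole file as a Table just to inspect metadata.
has_geo_metadata <- function(path) {
  tab <- read_parquet(path, as_data_frame = FALSE)
  "geo" %in% names(tab$metadata)
}

# Write one file with a placeholder "geo" entry and one without it.
tab_geo <- arrow_table(id = 1:3)
tab_geo$metadata$geo <- "{}"  # placeholder, not real GeoParquet metadata
path_geo <- tempfile(fileext = ".parquet")
write_parquet(tab_geo, path_geo)

path_plain <- tempfile(fileext = ".parquet")
write_parquet(arrow_table(id = 1:3), path_plain)

# A caller could use the check to hint that geoarrow should be loaded.
for (p in c(path_geo, path_plain)) {
  if (has_geo_metadata(p)) {
    message("File appears to contain geometry; consider loading geoarrow.")
  }
}
```

A check like this only covers files that declare geometry through the "geo" key; plain Parquet files that rely on extension types alone would need a different test.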