Closed JosiahParry closed 9 months ago
Is there any chance that adding a requireNamespace("geoarrow") solves it? I'm wondering if the extension registration just didn't kick in.
Will report back in the morning. I didn't try that!
Loading geoarrow and then using as.data.frame() results in a session crash. I wish those were easier to debug!
If you get serde_esri to a point where I can build it I'm happy to debug! Right now I get:
clang -arch arm64 -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -single_module -multiply_defined suppress -L/Library/Frameworks/R.framework/Resources/lib -L/opt/R/arm64/lib -o serdesri.so entrypoint.o -L./rust/target/release -lserdesri -F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework -Wl,CoreFoundation
clang: error: no such file or directory: 'entrypoint.o'
...after a local checkout of arrow-extendr.
I could also share how I debug this kind of thing... I basically have the "CodeLLDB" extension in VSCode and use "Attach to process" from the command palette, using the process ID reported by Sys.getpid() from an R terminal (not RStudio!). I also have the following in my .Rprofile:
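lldb <- function(pkg = ".") {
  # Build a CodeLLDB "attach" URL for the current R process
  url <- sprintf(
    "vscode://vadimcn.vscode-lldb/launch/config?{'request':'attach','pid':%d}",
    Sys.getpid()
  )
  # Ask VSCode to open the URL, which attaches the debugger to this session
  system(sprintf("code --open-url %s", shQuote(url)))
  # Optionally load the package under development
  if (!is.null(pkg)) {
    devtools::load_all(pkg)
  }
}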
...which basically means that I can type lldb() in any R terminal (again, not RStudio!) and then paste a reprex that might crash. I haven't tested that on anything except macOS or Windows but I'm pretty sure CodeLLDB works on Linux, too.
Something else that may help is doing arrow::as_arrow_table(<nanoarrow_array_stream>)$ValidateFull(). That will tell you if the arrays that you are expecting nanoarrow/geoarrow to convert are valid. (I expect that they are, and that this is a bug with the C/C++ in geoarrow-r).
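A minimal sketch of that validation step, assuming the nanoarrow and arrow packages are installed. The toy data frame is only a stand-in for the stream returned by serdesri, and the sketch goes through arrow::as_record_batch_reader() first, which is the route used later in this thread:

```r
library(nanoarrow)
library(arrow)

# Toy stand-in (assumption): a one-batch stream built from a data.frame,
# in place of the nanoarrow_array_stream returned by the Rust code.
stream <- nanoarrow::basic_array_stream(
  list(data.frame(x = 1:3, y = c("a", "b", "c")))
)

# Hand the stream to arrow as a RecordBatchReader, then collect it as a Table
rdr <- arrow::as_record_batch_reader(stream)
tbl <- arrow::as_arrow_table(rdr)

# Run arrow's full validation over every column of the Table
ok <- tbl$ValidateFull()
```

If validation fails, arrow should raise an error describing the offending array, which would point at the producer rather than at the conversion in geoarrow-r.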
@paleolimbot is this method only available in the development version of arrow? Running this on 14.0.2 results in:
arrow::as_arrow_table(res)
#> Error in `arrow::as_arrow_table()`:
#> ! No method for `as_arrow_table()` for object of class nanoarrow_array_stream
The package should be installable if cloned: https://github.com/JosiahParry/serde_esri/tree/main/r
remotes::install_github() isn't working due to relative paths outside of the R package, which I'll have to figure out at a later point.
Edit: should be installable via remotes::install_github("josiahparry/serde_esri", subdir = "r") now.
Here we go! Assuming I've done everything correctly, this is valid arrow!
library(httr2)
library(serdesri)
furl <- "https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/USA_Counties_Generalized_Boundaries/FeatureServer/0"
url <- paste0(furl, "/query?where=1=1&outFields=*&f=json&resultRecordCount=100")
req <- httr2::request(url)
resp <- httr2::req_perform(req)
json <- httr2::resp_body_string(resp)
res <- parse_esri_json_raw_geoarrow(resp$body, 2)
rdr <- arrow::as_record_batch_reader(res)
arrow::as_arrow_table(rdr)$Validate()
#> [1] TRUE
Created on 2024-02-04 with reprex v2.0.2
Nice!
I can reproduce the crash, although I also sometimes get:
Error in geoarrow_schema_parse(schema) :
GeoArrowMetadataViewInit() failed: Expected valid GeoArrow JSON metadata but got '{"crs":null,"edges":null}'
Technically that is invalid metadata, although geoarrow-c should probably handle "crs": null by just pretending that it was omitted completely. I'm guessing geoarrow-rs is what gave this to you.
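To illustrate what "pretending that it was omitted" could look like, here is a rough R sketch using jsonlite. The helper drop_null_metadata() is hypothetical and is not part of geoarrow-c, geoarrow-rs, or nanoarrow; it only shows the normalization on the metadata string itself:

```r
library(jsonlite)

# Hypothetical helper (an assumption for illustration only): remove keys
# whose value is JSON null so they read as if they had been omitted.
drop_null_metadata <- function(metadata_json) {
  # Keep the parsed result as a plain named list so null becomes NULL
  parsed <- jsonlite::fromJSON(metadata_json, simplifyVector = FALSE)
  kept <- Filter(Negate(is.null), parsed)
  # Nothing left: return an empty JSON object
  if (length(kept) == 0) {
    return("{}")
  }
  as.character(jsonlite::toJSON(kept, auto_unbox = TRUE))
}

# The metadata string from the error message above
all_null <- drop_null_metadata('{"crs":null,"edges":null}')

# A string where only one key is null
some_null <- drop_null_metadata('{"crs":null,"edges":"spherical"}')
```

The same idea applied at the source would mean the producer never writes the null keys in the first place.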
My guess is that there's something awry in nanoarrow's delegation of extension arrays to other packages (not trivial!) or geoarrow-r, and perhaps something about an error occurring during that process is causing the crash. I'll do some more debugging to see if I can get to the bottom of it!
@paleolimbot is this method only available in the development version of arrow?
It was added in the brand-new version of nanoarrow along with a number of other array/array_stream converter generics! (install.packages("nanoarrow")).
Much appreciated! I'll take a look and see if I can set the CRS at minimum. I'm unsure how I'd be able to guess if the edges are spherical or not without processing the spatial reference and making that determination that way 🤔
Technically that is invalid metadata, although geoarrow-c should probably handle "crs": null by just pretending that it was omitted completely. I'm guessing geoarrow-rs is what gave this to you.
FWIW, I think this can be resolved in the geoarrow-rs crate by adjusting the serialization method for ArrayMetadata struct.
yeah it's invalid and I just haven't gotten around to fixing it
Wowza!!! Looks great!!!
Using some arrow-rs, geoarrow-rust, and extendr magic, I am able to return a RecordBatch with a geoarrow array in it to R as a nanoarrow_array_stream. However, using geoarrow-r I've not been able to get this as a geoarrow array. I can use as.data.frame() to get it into a data.frame, but without any nice geometry column.