Closed kbvernon closed 1 year ago
Thank you! I suppose it could make sense to generalize get_layer() to get_layers(), where id can be an integer or a character vector.
I'm busy for the next few days and may not have time to handle this. If you want to attempt a PR the function is at: https://github.com/R-ArcGIS/arcgislayers/blob/bb5cfcd6633f2144948db6865871abaf04e5ec12/R/utils-feature-server.R#L28
I think I can help with that. Your get_all_layers() function has basically all the code you need for a generalized get_layers() function; you just need to index layer_ids. That said, running arc_open() on the updated URLs can be very, very slow, and I wonder if it's strictly necessary. Maybe it would be better to separate out the retrieval of metadata from the building of requests (more in the spirit of httr2), so if folks are already familiar with the structure of the data, they can just skip getting the metadata and go straight to arc_select(). So,
req <- arc_open()
# print metadata
get_metadata(req)
# append layer to end of request url
lyr <- get_layer(req, id = "bob")
arc_select(lyr)
Not sure how much overhauling of your code that would require though...
For reference, I did some tests with the USGS National Hydrography Dataset (NHDPlus). Some of the layers fail with timeout errors. And they're all very slow.
nhd <- file.path(
"https://hydro.nationalmap.gov",
"arcgis/rest/services",
"NHDPlus_HR",
"MapServer"
)
nhd <- arcgislayers::arc_open(nhd)
layers <- c(
"NHDPoint",
"NetworkNHDFlowline",
"NonNetworkNHDFlowline",
"NHDLine",
"NHDArea",
"NHDWaterbody"
)
layer_ids <- nhd$layers[nhd$layers$name %in% layers, "id", drop = TRUE]
layers <- lapply(layer_ids, \(x) arcgislayers::get_layer(nhd, id = x))
Okay, I think there are two things here? The first is that you'd like to materialize the data into memory in one go, and the second is being able to get_layer() from a server based on a name rather than an id. Is that right?
I expect it to be slow to bring a bunchhh of data into R's memory over http. The goal is to encourage users to treat it as a remote data source and filter data as much as possible before bringing it into memory
Hi, yeah, there are two issues - sorry for mixing them up here. In my defense, though, they are related!
get_layer() is a bonus.
get_layer() is really slow for reasons I don't understand. Bringing the actual spatial and attribute data over can take some time, yes, but get_layer() isn't really doing that, is it? Near as I can tell, it just appends the layer to the end of the FeatureServer URL and runs arc_open() on it to get the layer metadata (under the hood of arc_open(), it's arcgisutils::fetch_layer_metadata()). My thinking is to let arc_open() be a utility function to build the basics of a request, basically just a wrapper around httr2::request(). Then have other functions like get_metadata() and list_layers() for exploring what that data source has (they would wrap httr2::req_perform()). get_layer() would then be a wrapper around httr2::req_url_path_append().
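The split proposed above can be sketched without any network access. This is a hypothetical design, not how arcgislayers currently works: lazy_open(), lazy_get_layer(), and metadata_url() are made-up names, and a plain list stands in for the request object that httr2::request() and httr2::req_url_path_append() would provide in a real implementation.

```r
# Hypothetical sketch: describe a request without contacting the server.
lazy_open <- function(url) {
  # Drop any trailing slash so later path appends stay clean.
  structure(list(url = sub("/+$", "", url)), class = "lazy_conn")
}

# Append a layer id to the path; still no HTTP request is made.
lazy_get_layer <- function(conn, id) {
  stopifnot(inherits(conn, "lazy_conn"), length(id) == 1)
  conn$url <- paste(conn$url, id, sep = "/")
  conn
}

# The URL that an explicit, opt-in metadata step would fetch.
metadata_url <- function(conn) {
  paste0(conn$url, "?f=json")
}

srv <- lazy_open("https://hydro.nationalmap.gov/arcgis/rest/services/NHDPlus_HR/MapServer")
lyr <- lazy_get_layer(srv, 0)
metadata_url(lyr)
```

Under this design, only a function that actually performs the request would touch the network, so users who already know the layer structure would pay no metadata round trip.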
Sorry if any of this comes across as demanding (or just wildly ignorant on my part - can never rule that out...). Just some thoughts after experimenting with this package last night. I really like what this is doing, and it promises to save me a lot of wasted time writing {httr2} code directly to interact with ESRI REST APIs. I'm sure that's true for A LOT of other R users, too, so I want to support it.
Each object, whether a FeatureServer, MapServer, ImageServer, or FeatureLayer, is just a list of the metadata that is returned when you append ?f=json to the url. Using your example it would be https://hydro.nationalmap.gov/arcgis/rest/services/NHDPlus_HR/MapServer?f=json. If you look at the structure of the object, it is exactly the json.
str(nhd_srv, 1)
#> List of 26
#> $ currentVersion : num 10.8
#> $ serviceDescription : chr "The USGS NHDPlus High Resolution service, NHDPlus_HR, a part of The National Map, is a "| __truncated__
#> $ mapName : chr "Layers"
#> $ description : chr ""
#> $ copyrightText : chr "USGS TNM – National Hydrography Dataset Plus High Resolution (NHDPlus HR). Data refreshed October, 2022."
#> $ supportsDynamicLayers : logi TRUE
#> $ layers :'data.frame': 13 obs. of 9 variables:
#> $ spatialReference :List of 4
#> $ singleFusedMapCache : logi FALSE
#> $ initialExtent :List of 5
#> $ fullExtent :List of 5
#> $ minScale : int 0
#> $ maxScale : int 0
#> $ units : chr "esriMeters"
#> $ supportedImageFormatTypes : chr "PNG32,PNG24,PNG,JPG,DIB,TIFF,EMF,PS,PDF,GIF,SVG,SVGZ,BMP"
#> $ documentInfo :List of 8
#> $ capabilities : chr "Data,Map,Query"
#> $ supportedQueryFormats : chr "JSON, geoJSON"
#> $ exportTilesAllowed : logi FALSE
#> $ referenceScale : int 0
#> $ supportsDatumTransformation: logi TRUE
#> $ maxRecordCount : int 1000
#> $ maxImageHeight : int 4096
#> $ maxImageWidth : int 4096
#> $ supportedExtensions : chr "WMSServer"
#> $ url : chr "https://hydro.nationalmap.gov/arcgis/rest/services/NHDPlus_HR/MapServer"
get_layer() works by fetching the layers element (which is a data frame) and ensuring that the id is present in the table. If so, it uses arc_open() on the MapServer's (or FeatureServer's) url with the id appended as a path.
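The lookup described above can be illustrated with a mock object. This is a simplified sketch rather than the package's actual source: layer_url() is a hypothetical helper, the server object is a hand-built list, and the layer ids and names in it are invented for illustration.

```r
# Mock of the list returned for a MapServer; ids and names are illustrative only.
srv <- list(
  url = "https://hydro.nationalmap.gov/arcgis/rest/services/NHDPlus_HR/MapServer",
  layers = data.frame(id = 0:2, name = c("LayerA", "LayerB", "LayerC"))
)

# Hypothetical helper: check the id against the layers table,
# then append it to the service URL as a path segment.
layer_url <- function(x, id) {
  if (!id %in% x$layers$id) {
    stop("`id` ", id, " not found in the layers table", call. = FALSE)
  }
  paste(x$url, id, sep = "/")
}

layer_url(srv, 1)
```

In the real function, the resulting URL would then be passed to arc_open(), which is the step that makes the HTTP request.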
arc_open(), on the other hand, fetches the metadata (remember, ?f=json) of a URL and assigns the appropriate class to the object.
Any performance issues we see here are more than likely an upstream issue. I haven't been able to reliably find a difference between these two. The speeds vary drastically in each.
library(arcgislayers)
nhd <- file.path(
"https://hydro.nationalmap.gov",
"arcgis/rest/services",
"NHDPlus_HR",
"MapServer"
)
nhd_srv <- arc_open(nhd)
get_time <- system.time(get_layer(nhd_srv, 0))
fp_time <- system.time(arc_open(file.path(nhd_srv[["url"]], "0")))
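Because single timings against a remote service are noisy, repeating each call a few times makes the comparison easier to read. The sketch below is only one way to do that: time_repeated() is a hypothetical helper, and a Sys.sleep() stand-in replaces the real calls so the example runs offline.

```r
# Hypothetical helper: run `f` several times and collect elapsed seconds.
time_repeated <- function(f, times = 5) {
  vapply(
    seq_len(times),
    function(i) system.time(f())[["elapsed"]],
    numeric(1)
  )
}

# Stand-in for a remote call; swap in the real call when online.
fake_call <- function() Sys.sleep(0.01)

elapsed <- time_repeated(fake_call, times = 3)
summary(elapsed)
```

With the package loaded, fake_call could be replaced by function() get_layer(nhd_srv, 0) or function() arc_open(file.path(nhd_srv[["url"]], "0")) to compare the two paths over several runs.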
That's helpful. Thanks for the clarification. And apologies for the topic drift.
Assuming you don't want users interacting with the guts of a FeatureServer object, it would be nice to have a function that takes a FeatureServer object and a character vector for the feature layers you want and returns the ids of those feature layers. This would make it easier to work with get_layer() and arc_select(). Something like:
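A minimal sketch of what such a helper might look like, written against a mock object so it runs offline. get_layer_ids() is a hypothetical name, not an existing arcgislayers function, and the ids in the mock layers table are invented.

```r
# Hypothetical helper: map layer names to their ids using the server's layers table.
get_layer_ids <- function(x, layers) {
  tbl <- x$layers
  idx <- match(layers, tbl$name)
  if (anyNA(idx)) {
    stop(
      "Layer(s) not found: ",
      paste(layers[is.na(idx)], collapse = ", "),
      call. = FALSE
    )
  }
  # Ids are returned in the order the names were requested.
  tbl$id[idx]
}

# Mock FeatureServer-like object; the ids are illustrative only.
srv <- list(
  layers = data.frame(
    id = c(0L, 1L, 2L),
    name = c("NHDPoint", "NHDLine", "NHDArea")
  )
)

ids <- get_layer_ids(srv, c("NHDArea", "NHDPoint"))
ids
```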
Alternatively, you could add a layer argument to get_layer() and let id go missing, so get_layer(conn, layer = "bob").
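A rough sketch of that alternative signature, under several assumptions: get_layer_sketch() is a hypothetical stand-in that returns the layer URL instead of opening it, NULL defaults are used in place of missing(), and the connection object, its URL, and its layer table are all made up.

```r
# Hypothetical variant of get_layer() that accepts either `id` or `layer`.
get_layer_sketch <- function(x, id = NULL, layer = NULL) {
  # Exactly one of `id` or `layer` must be supplied.
  if (is.null(id) == is.null(layer)) {
    stop("Supply exactly one of `id` or `layer`", call. = FALSE)
  }
  if (is.null(id)) {
    # Resolve the name to an id via the layers table.
    id <- x$layers$id[match(layer, x$layers$name)]
    if (is.na(id)) stop("Layer `", layer, "` not found", call. = FALSE)
  } else if (!id %in% x$layers$id) {
    stop("`id` ", id, " not found", call. = FALSE)
  }
  paste(x$url, id, sep = "/")
}

# Mock connection; the URL and the layer table are placeholders.
conn <- list(
  url = "https://example.com/arcgis/rest/services/Demo/FeatureServer",
  layers = data.frame(id = c(0L, 1L), name = c("alice", "bob"))
)

get_layer_sketch(conn, layer = "bob")
```

Both get_layer_sketch(conn, layer = "bob") and get_layer_sketch(conn, id = 1) resolve to the same layer URL here, so existing id-based calls would keep working.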