jmackenzieGA opened 7 months ago
@jmackenzieGA thank you for this! As of right now, there is no support for enumerating all items in an organization. However, I've prototyped a solution that you can try! I believe @elipousson has a solution to this in their esri2sf fork that we'd like to migrate over here. Additionally @mmachir has emphasized the importance of this as well.
library(arcgisutils)
# Approach 1: using a URL of an organization
# might be a way to get this info from your auth token in the future
arc_server_content <- function(url, token = arc_token()) {
  req <- arc_base_req(url, token) |>
    httr2::req_url_path_append("ArcGIS", "rest", "services") |>
    httr2::req_url_query(f = "json")

  resp <- httr2::req_perform(req)
  res_list <- RcppSimdJson::fparse(httr2::resp_body_string(resp))

  # structure(res_list$services, class = c("tbl", "data.frame"))
  tibble::as_tibble(res_list$services)
}
arc_server_content("https://services.arcgis.com/v01gqwM5QqNysAAi/")
#> # A tibble: 1,204 × 3
#> name type url
#> <chr> <chr> <chr>
#> 1 040720_3DEP FeatureServer https://services.arcg…
#> 2 041320_3DEP FeatureServer https://services.arcg…
#> 3 0f00_444_gdb FeatureServer https://services.arcg…
#> 4 1_Percent_Annual_Exceedance_Probability_ FeatureServer https://services.arcg…
#> 5 10_deg_Isotherm_July_2023 FeatureServer https://services.arcg…
#> 6 10_F_Isotherm_July_2022 FeatureServer https://services.arcg…
#> 7 10_recent_results_May2021_gdb FeatureServer https://services.arcg…
#> 8 112TCA_10recent FeatureServer https://services.arcg…
#> 9 112TCA_Plume_Boundaries FeatureServer https://services.arcg…
#> 10 1868_Hayward_Earthquake FeatureServer https://services.arcg…
#> # ℹ 1,194 more rows
Yep! esri2sf supports the same endpoint but also parses the URL to add some helpful extra metadata. It also handles recursion to list feature servers, layers, tables, and (for Enterprise servers) folders all separately.
Previously, the recurse option caused this function to fail when accessing some URLs at services.arcgis.com, but I think I just caught and fixed the bug.
Code is here if you have ideas about how to adapt it, @JosiahParry: https://github.com/elipousson/esri2sf/blob/master/R/esriIndex.R
library(esri2sf)
# pak::pkg_install("elipousson/esri2sf")
index <- esriIndex(
  "https://services.arcgis.com/qZHw6MeShRaysmjj/ArcGIS/rest/services"
)

index_recurse <- esriIndex(
  "https://services.arcgis.com/qZHw6MeShRaysmjj/ArcGIS/rest/services",
  recurse = TRUE
)
dplyr::glimpse(index)
#> Rows: 30
#> Columns: 5
#> $ name <chr> "Afghanistan_Anomalies", "Afghanistan_Faults", "Afghanista…
#> $ type <chr> "FeatureServer", "FeatureServer", "FeatureServer", "Featur…
#> $ url <chr> "https://services.arcgis.com/qZHw6MeShRaysmjj/ArcGIS/rest/…
#> $ urlType <chr> "service", "service", "service", "service", "service", "se…
#> $ serviceType <chr> "FeatureServer", "FeatureServer", "FeatureServer", "Featur…
dplyr::glimpse(index_recurse)
#> Rows: 95
#> Columns: 12
#> $ name <chr> "Afghanistan_Anomalies", "Afghanistan_Faults", "Afgh…
#> $ type <chr> "FeatureServer", "FeatureServer", "FeatureServer", "…
#> $ url <chr> "https://services.arcgis.com/qZHw6MeShRaysmjj/ArcGIS…
#> $ urlType <chr> "service", "service", "service", "service", "service…
#> $ serviceName <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ serviceType <chr> "FeatureServer", "FeatureServer", "FeatureServer", "…
#> $ id <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ parentLayerId <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ defaultVisibility <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ minScale <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ maxScale <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ geometryType <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
Created on 2024-03-06 with reprex v2.1.0
@JosiahParry, @elipousson - thanks for both of your quick responses! It's super exciting to see you've built these tools to scrape/read/write hosted ArcGIS data within R :)
Happy to help! Getting a server index has been an enormously helpful workflow for me. The state of Maryland has a very stable, well-managed GIS server so I’ve built a whole set of helper functions that enable URL look-ups with keyword strings https://elipousson.github.io/mapmaryland/reference/get_imap_data.html
It pairs well with a meta-index of relevant servers in your geographical area or professional domain. I set this up for Maryland building on some existing references and I’ve found it to be a very handy reference https://docs.google.com/spreadsheets/u/0/d/1c829bZdNqvbpoizulBU_XE5jVeNNck2kHkS-smpQ52s/htmlview
@elipousson for recursive functionality, how opposed to a list column are you? I think that would be the cleanest solution so that you can always have the same number of rows
That seems reasonable! You could reference the options in tidyr (probably unnest_longer) that a user may want if they need a flat server-wide index. Still probably worth keeping recurse = FALSE as the default; I get the sense that my index function can be a little hard on Baltimore City's ArcGIS server infrastructure.
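For illustration, flattening a nested list-column index with tidyr might look roughly like this. The data and column names here are invented for the sketch; they are not the actual esriIndex output.

```r
library(tibble)
library(tidyr)

# Invented example: one row per service, with a list-column of layer names
index <- tibble(
  name = c("svc_a", "svc_b"),
  layers = list(c("roads", "parcels"), "hydrants")
)

# unnest_longer() gives one row per layer, repeating the service name
flat <- tidyr::unnest_longer(index, layers)
```

With real data the list-column would hold data frames rather than character vectors, but the same unnesting idea applies.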
@JosiahParry, @elipousson - thanks again for the help. Using your R functions, I can successfully index/read organisational REST feature services published to our ArcGIS server. Is it possible to also access AGOL hosted feature layers, or only REST services?
The example I shared for esri2sf::esriIndex() was actually using an AGOL hosted service. The REST services are organized at the account (not user) level, so you would find the base URL for an AGOL hosted FeatureLayer and try that.
Great news! Is there a standard URL suffix that should be added to 'https://*.maps.arcgis.com/' for accessing hosted feature layers? Browsing our AGOL site content, I tried several URL candidates (below), but they all return the same lexical error (further below).
index <- esriIndex("https://greening.maps.arcgis.com/")
index <- esriIndex("https://greening.maps.arcgis.com/home/content.html")
index <- esriIndex("https://greening.maps.arcgis.com/home/index.html")
index <- esriIndex("https://greening.maps.arcgis.com/home/search.html")
index <- esriIndex("https://greening.maps.arcgis.com/home/organization.html")
Error: lexical error: invalid char in json text. <!DOCTYPE html> <hea (right here) ------^
Aha. To clarify, the esriIndex() function requires a REST API URL, not an "item" URL. Not sure I got the right one, but this is an example of a REST URL for AGOL: https://services3.arcgis.com/4jSrju9pAdOGScAe/ArcGIS/rest/services
I also just updated esri2sf so esriIndex should error with that type of URL in the future. Sorry for the confusion!
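As a rough illustration of that kind of guard, a minimal check could key off the "/rest/services" path segment that ArcGIS REST endpoints share but AGOL content pages lack. This is just a sketch, not the actual esri2sf implementation.

```r
# Sketch: ArcGIS REST endpoints contain "/rest/services" in their path,
# while AGOL content pages (like the ones tried above) do not.
is_rest_url <- function(url) {
  grepl("/rest/services", url, ignore.case = TRUE)
}

is_rest_url("https://services3.arcgis.com/4jSrju9pAdOGScAe/ArcGIS/rest/services")
#> [1] TRUE
is_rest_url("https://greening.maps.arcgis.com/home/content.html")
#> [1] FALSE
```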
For the time being, we can now query the root level using arc_open(), though there is no recursive functionality yet. If anyone here wants to take that on, that would be awesome!
Spent a few minutes exploring the recursive functionality. That would be quite slow. If this is really desired, rather than lapply() through the results of arc_open(), it would be a bit more performant to httr2::req_perform_parallel() through the URLs.
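Roughly, that parallel approach might look like the sketch below; it is untested against a live server, and `urls` stands in for the `url` column of a service index.

```r
library(httr2)

# Sketch: build one request per service URL, then fetch them all in parallel
# instead of calling arc_open() in a loop.
fetch_service_json <- function(urls) {
  reqs <- lapply(urls, \(u) {
    httr2::req_url_query(httr2::request(u), f = "json")
  })
  # on_error = "continue" keeps going when an individual service errors;
  # failed requests come back as condition objects rather than responses
  resps <- httr2::req_perform_parallel(reqs, on_error = "continue")
  lapply(resps, \(r) {
    if (inherits(r, "httr2_response")) httr2::resp_body_json(r) else NULL
  })
}
```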
The slowness of the response to a recursive query is really dependent on the size of the server (in terms of number of services, folders, layers, etc.).
I just put together a reprex showing a potential implementation that uses a nested list column to return the recursive index. There are several relatively big issues with this implementation, so it is mainly presented for consideration and discussion. If this seems like a potential direction, I can fix the issues and open a PR.
library(arcgislayers)
library(arcgisutils)
#>
#> Attaching package: 'arcgisutils'
#> The following object is masked from 'package:base':
#>
#> %||%
resp_as_service_index <- function(
  x,
  ...,
  parent_type = "services"
) {
  has_folders <- rlang::has_name(x, "folders") && (length(x[["folders"]]) > 0)
  has_layers <- rlang::has_name(x, "layers") && (nrow(x[["layers"]]) > 0)
  has_services <- rlang::has_name(x, "services") && (length(x[["services"]]) > 0)

  if (!any(c(has_folders, has_layers, has_services))) {
    return(data.frame())
  }

  if (has_folders) {
    # folders are a character vector of names
    n_folders <- length(x[["folders"]])
    x[["folders"]] <- data.frame(
      name = x[["folders"]],
      # TODO: Should folder use the service type column?
      # type = rep_len("Folder", length.out = n_folders),
      type = rep_len(NA_character_, length.out = n_folders),
      # FIXME: This convention does not work for nested folders or services
      # in folders
      url = paste0(x[["url"]], "/", x[["folders"]]),
      url_type = rep_len("folder", length.out = n_folders)
    )
  }

  if (has_services) {
    # services are a data frame with name and type
    n_services <- nrow(x[["services"]])
    # TODO: Double-check if the services data frame ever includes a URL column
    x[["services"]] <- cbind(
      x[["services"]],
      data.frame(
        url = paste0(
          x[["url"]], "/",
          x[["services"]][["name"]], "/",
          x[["services"]][["type"]]
        ),
        url_type = rep_len("service", length.out = n_services)
      )
    )
  }

  if (has_layers) {
    n_layers <- nrow(x[["layers"]])
    x[["layers"]] <- cbind(
      x[["layers"]],
      url = paste0(x[["url"]], "/", x[["layers"]][["id"]]),
      url_type = rep_len("layer", length.out = n_layers)
    )
  }

  # rbind_results fails here unless layers, services, and folders all have the
  # same number of columns
  service_index <- dplyr::bind_rows(
    list(
      layers = x[["layers"]],
      services = x[["services"]],
      folders = x[["folders"]]
    )
  )

  if (nrow(service_index) > 0) {
    service_index[["parent_url"]] <- rep_len(
      x[["url"]],
      length.out = nrow(service_index)
    )
    service_index[["parent_type"]] <- rep_len(
      parent_type,
      length.out = nrow(service_index)
    )
  }

  service_index
}
arc_services <- function(..., token = arc_token(), recurse = FALSE) {
  x <- arc_open(..., token = token)
  service_index <- resp_as_service_index(x)

  if (!recurse) {
    return(service_index)
  }

  service_index[["index"]] <- lapply(
    service_index[["url"]],
    \(x) {
      resp_obj <- rlang::try_fetch(
        arc_open(x, token = token),
        error = \(cnd) NULL
      )

      if (is.null(resp_obj)) {
        return(data.frame())
      }

      resp_as_service_index(resp_obj, recurse = recurse)
    }
  )

  service_index
}
alleganygis_index <- arc_services(
  "https://alleganygis.allconet.org/allcogis/rest/services"
)

dplyr::glimpse(alleganygis_index)
#> Rows: 29
#> Columns: 6
#> $ name <chr> "AddressPoints", "Allegany_Addressing", "AlleganyGeocoder"…
#> $ type <chr> "MapServer", "GeocodeServer", "GeocodeServer", "MapServer"…
#> $ url <chr> "https://alleganygis.allconet.org/allcogis/rest/services/A…
#> $ url_type <chr> "service", "service", "service", "service", "service", "se…
#> $ parent_url <chr> "https://alleganygis.allconet.org/allcogis/rest/services",…
#> $ parent_type <chr> "services", "services", "services", "services", "services"…
alleganygis_index_recurse <- arc_services(
  "https://alleganygis.allconet.org/allcogis/rest/services",
  recurse = TRUE
)

dplyr::glimpse(alleganygis_index_recurse)
#> Rows: 29
#> Columns: 7
#> $ name <chr> "AddressPoints", "Allegany_Addressing", "AlleganyGeocoder"…
#> $ type <chr> "MapServer", "GeocodeServer", "GeocodeServer", "MapServer"…
#> $ url <chr> "https://alleganygis.allconet.org/allcogis/rest/services/A…
#> $ url_type <chr> "service", "service", "service", "service", "service", "se…
#> $ parent_url <chr> "https://alleganygis.allconet.org/allcogis/rest/services",…
#> $ parent_type <chr> "services", "services", "services", "services", "services"…
#> $ index <list> [<data.frame[1 x 11]>], [<data.frame[0 x 0]>], [<data.fra…
Created on 2024-07-28 with reprex v2.1.0
I am trying to access/list AGOL content for my organisation. I can successfully reproduce R-ArcGIS demos to read/plot SpatRasters from an ArcGIS ImageServer, as well as to list/read hosted feature services. I am unable to connect to my organisation's AGOL site, but it's not clear whether my URL is missing relevant subdirectories etc.? Sample code and error message pasted below. Any help much appreciated :)