R-ArcGIS / arcgislayers

ArcGIS Location Services
http://r.esri.com/arcgislayers/
Apache License 2.0

access/list organisation's AGOL content? #161

Open jmackenzieGA opened 7 months ago

jmackenzieGA commented 7 months ago

I am trying to access and list AGOL content for my organisation. I can successfully reproduce the R-ArcGIS demos to read/plot SpatRasters from an ArcGIS ImageServer, and to list/read hosted feature services. However, I am unable to connect to my organisation's AGOL site, and it is not clear whether my URL is missing relevant subdirectories. Sample code and error message are pasted below. Any help much appreciated.

library(arcgis)

Attaching core arcgis packages:

→ arcgisutils v0.2.0

→ arcgislayers v0.2.0

token <- auth_code()
Enter code: xxx

 ######## demo to create spatRaster from arcgis imageserver works 

img_url <- "https://landsat2.arcgis.com/arcgis/rest/services/Landsat/MS/ImageServer"
landsat <- arc_open(img_url)
res <- arc_raster(landsat, xmin = -71, ymin = 43, xmax = -67, ymax = 47.5, bbox_crs = 4326, width = 500, height = 500)
terra::plotRGB(res, 4, 3, 2, scale = max(landsat[["maxValues"]]))

 ######## demo to read hosted feature service works

furl <- "https://services3.arcgis.com/ZvidGQkLaDJxRSJ2/arcgis/rest/services/PLACES_LocalData_for_BetterHealth/FeatureServer"
fsrv <- arc_open(furl)
fsrv

<FeatureServer <5 layers, 0 tables>>

CRS: 3785

Capabilities: Query,Extract

0: PlacePoints (esriGeometryPoint)

1: PlaceBoundaries (esriGeometryPolygon)

2: Counties (esriGeometryPolygon)

3: Tracts (esriGeometryPolygon)

4: ZCTAs (esriGeometryPolygon)

get_layer(fsrv, id = 2)

<FeatureLayer>

Name: Counties

Geometry Type: esriGeometryPolygon

CRS: 3785

Capabilities: Query,Extract

 ######## trial to list AGOL content for organisation unsuccessful    

ga_agol <- "https://greening.maps.arcgis.com/"
l.agol <- arc_open(ga_agol)

Error in .deserialize_json(json = json, query = query, empty_array = empty_array, :

TAPE_ERROR: The JSON document has an improper structure: missing or superfluous commas, braces, missing keys, etc.

sessionInfo()

R version 4.3.2 (2023-10-31 ucrt)

Platform: x86_64-w64-mingw32/x64 (64-bit)

Running under: Windows 11 x64 (build 22631)

other attached packages:

arcgislayers_0.2.0 arcgisutils_0.2.0 arcgis_0.1.0

JosiahParry commented 7 months ago

@jmackenzieGA thank you for this! As of right now, there is no support for enumerating all items in an organization. However, I've prototyped a solution that you can try! I believe @elipousson has a solution to this in their esri2sf fork that we'd like to migrate over here. Additionally @mmachir has emphasized the importance of this as well.

library(arcgisutils)

# Approach 1: using a URL of an organization
# might be a way to get this info from your auth token in the future
arc_server_content <- function(url, token = arc_token()) {
  req <- arc_base_req(url, token) |> 
    httr2::req_url_path_append("ArcGIS", "rest", "services") |> 
    httr2::req_url_query(f = "json")

  resp <- httr2::req_perform(req)
  res_list <- RcppSimdJson::fparse(httr2::resp_body_string(resp))

  # structure(res_list$services, class = c("tbl", "data.frame"))
  tibble::as_tibble(res_list$services)
}

arc_server_content("https://services.arcgis.com/v01gqwM5QqNysAAi/")
#> # A tibble: 1,204 × 3
#>    name                                     type          url                   
#>    <chr>                                    <chr>         <chr>                 
#>  1 040720_3DEP                              FeatureServer https://services.arcg…
#>  2 041320_3DEP                              FeatureServer https://services.arcg…
#>  3 0f00_444_gdb                             FeatureServer https://services.arcg…
#>  4 1_Percent_Annual_Exceedance_Probability_ FeatureServer https://services.arcg…
#>  5 10_deg_Isotherm_July_2023                FeatureServer https://services.arcg…
#>  6 10_F_Isotherm_July_2022                  FeatureServer https://services.arcg…
#>  7 10_recent_results_May2021_gdb            FeatureServer https://services.arcg…
#>  8 112TCA_10recent                          FeatureServer https://services.arcg…
#>  9 112TCA_Plume_Boundaries                  FeatureServer https://services.arcg…
#> 10 1868_Hayward_Earthquake                  FeatureServer https://services.arcg…
#> # ℹ 1,194 more rows

elipousson commented 7 months ago

Yep! esri2sf supports the same endpoint but also parses the URL to add some helpful extra metadata. It also handles recursion to list feature servers, layers, tables, and (for Enterprise servers) folders all separately.

Previously, the recurse option caused this function to fail when accessing some URLs at services.arcgis.com but I think I just caught and fixed the bug.

Code is here if you have ideas about how to adapt it, @JosiahParry: https://github.com/elipousson/esri2sf/blob/master/R/esriIndex.R

library(esri2sf)

# pak::pkg_install("elipousson/esri2sf")

index <- esriIndex(
  "https://services.arcgis.com/qZHw6MeShRaysmjj/ArcGIS/rest/services"
)

index_recurse <- esriIndex(
  "https://services.arcgis.com/qZHw6MeShRaysmjj/ArcGIS/rest/services",
  recurse = TRUE
)

dplyr::glimpse(index)
#> Rows: 30
#> Columns: 5
#> $ name        <chr> "Afghanistan_Anomalies", "Afghanistan_Faults", "Afghanista…
#> $ type        <chr> "FeatureServer", "FeatureServer", "FeatureServer", "Featur…
#> $ url         <chr> "https://services.arcgis.com/qZHw6MeShRaysmjj/ArcGIS/rest/…
#> $ urlType     <chr> "service", "service", "service", "service", "service", "se…
#> $ serviceType <chr> "FeatureServer", "FeatureServer", "FeatureServer", "Featur…

dplyr::glimpse(index_recurse)
#> Rows: 95
#> Columns: 12
#> $ name              <chr> "Afghanistan_Anomalies", "Afghanistan_Faults", "Afgh…
#> $ type              <chr> "FeatureServer", "FeatureServer", "FeatureServer", "…
#> $ url               <chr> "https://services.arcgis.com/qZHw6MeShRaysmjj/ArcGIS…
#> $ urlType           <chr> "service", "service", "service", "service", "service…
#> $ serviceName       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ serviceType       <chr> "FeatureServer", "FeatureServer", "FeatureServer", "…
#> $ id                <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ parentLayerId     <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ defaultVisibility <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ minScale          <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ maxScale          <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ geometryType      <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …

Created on 2024-03-06 with reprex v2.1.0

jmackenzieGA commented 7 months ago

@JosiahParry, @elipousson - thanks to you both for the quick responses! It's super exciting to see you've built these tools to scrape/read/write hosted ArcGIS data within R.

elipousson commented 7 months ago

Happy to help! Getting a server index has been an enormously helpful workflow for me. The state of Maryland has a very stable, well-managed GIS server so I’ve built a whole set of helper functions that enable URL look-ups with keyword strings https://elipousson.github.io/mapmaryland/reference/get_imap_data.html

It pairs well with a meta-index of relevant servers in your geographical area or professional domain. I set this up for Maryland building on some existing references and I’ve found it to be a very handy reference https://docs.google.com/spreadsheets/u/0/d/1c829bZdNqvbpoizulBU_XE5jVeNNck2kHkS-smpQ52s/htmlview

JosiahParry commented 6 months ago

@elipousson for recursive functionality, how opposed to a list column are you? I think that would be the cleanest solution so that you can always have the same number of rows

elipousson commented 6 months ago

That seems reasonable! You could reference the options in tidyr (probably unnest_longer) that a user may want to use if they need a flat server-wide index. Still probably worth keeping recurse = FALSE as the default — I get the sense that my index function can be a little hard on Baltimore City's ArcGIS server infrastructure.
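A minimal sketch of flattening a nested index with tidyr, for anyone who wants a server-wide table. The toy `index` tibble below is made up for illustration and only mimics the shape of a list-column result. It uses `tidyr::unnest()` rather than `unnest_longer()` because the list elements are data frames; this is one possible approach, not the package's behaviour.

```r
library(tibble)
library(tidyr)

# Hypothetical nested index: one row per service, with a list column
# holding that service's layers (an empty data frame when there are none).
index <- tibble(
  name = c("Roads", "Geocoder"),
  url_type = c("service", "service"),
  index = list(
    data.frame(id = 0:1, name = c("Centerlines", "Bridges")),
    data.frame()
  )
)

# names_sep avoids clashes between outer and nested column names;
# keep_empty = TRUE keeps services that have no nested rows.
flat <- unnest(index, cols = index, names_sep = "_", keep_empty = TRUE)
flat
```

The nested form keeps one row per service, while the flat form repeats the parent columns once per layer.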

jmackenzieGA commented 6 months ago

@JosiahParry, @elipousson - thanks again for the help. Using your R functions, I can successfully index/read organisational REST feature services published to our ArcGIS server. Is it possible to also access AGOL hosted feature layers, or only REST services?

elipousson commented 6 months ago

The example I shared for esri2sf::esriIndex() was actually using an AGOL hosted service. The REST services are organized at the account (not user) level, so you would find the base URL for an AGOL hosted FeatureLayer and try that.
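One way to get from a hosted layer's URL to the account-level base URL is to trim everything after `/rest/services`. The sketch below is base R only; `services_root()` is a hypothetical helper name, not part of esri2sf or arcgislayers, and it assumes the URL carries no query string.

```r
# Hypothetical helper: trim a service or layer URL back to the
# ".../rest/services" root that lists all services for the account.
services_root <- function(url) {
  pattern <- "(/rest/services)(/.*)?$"
  is_rest <- grepl(pattern, url, ignore.case = TRUE)
  if (!all(is_rest)) {
    stop(
      "Not an ArcGIS REST services URL: ",
      paste(url[!is_rest], collapse = ", ")
    )
  }
  sub(pattern, "\\1", url, ignore.case = TRUE)
}

# URL taken from the feature service demo earlier in this thread
furl <- "https://services3.arcgis.com/ZvidGQkLaDJxRSJ2/arcgis/rest/services/PLACES_LocalData_for_BetterHealth/FeatureServer"
root <- services_root(furl)
root
```

The resulting root is the kind of URL that `esriIndex()` expects.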

jmackenzieGA commented 6 months ago

Great news! Is there a standard URL suffix that should be added to 'https://*.maps.arcgis.com/' for accessing hosted feature layers?

Browsing our AGOL site content, I tried several URL candidates (below), but they all return the same lexical error (further below).

index <- esriIndex("https://greening.maps.arcgis.com/")
index <- esriIndex("https://greening.maps.arcgis.com/home/content.html")
index <- esriIndex("https://greening.maps.arcgis.com/home/index.html")
index <- esriIndex("https://greening.maps.arcgis.com/home/search.html")
index <- esriIndex("https://greening.maps.arcgis.com/home/organization.html")

Error: lexical error: invalid char in json text.
    <!DOCTYPE html> <hea
    (right here) ------^

elipousson commented 6 months ago

Aha. To clarify, the esriIndex() function requires a REST API URL, not an "item" URL. Not sure I got the right one, but this is an example of a REST URL for AGOL: https://services3.arcgis.com/4jSrju9pAdOGScAe/ArcGIS/rest/services

I also just updated esri2sf so esriIndex should error with that type of URL in the future. Sorry for the confusion!
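The lexical error above is what a JSON parser reports when it is handed an HTML page, which is what the `*.maps.arcgis.com/home/...` pages return. Here is a small sketch of a guard that produces a clearer message. `parse_rest_body()` is a hypothetical helper and the two bodies are made-up strings; this is not how esri2sf implements its check.

```r
library(jsonlite)

# Hypothetical helper: refuse to parse a body that starts with "<",
# since that is almost certainly HTML rather than a REST JSON response.
parse_rest_body <- function(body) {
  if (grepl("^\\s*<", body)) {
    stop(
      "Response looks like HTML, not JSON. ",
      "Is this a REST services URL (.../rest/services)?",
      call. = FALSE
    )
  }
  fromJSON(body)
}

# Made-up bodies standing in for a web page and a REST response
html_body <- "<!DOCTYPE html> <head><title>Sign in</title></head>"
json_body <- '{"currentVersion": 11.2, "services": []}'

msg <- tryCatch(parse_rest_body(html_body), error = function(e) conditionMessage(e))
parsed <- parse_rest_body(json_body)
```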

JosiahParry commented 2 months ago

For the time being, we can now query the root level using arc_open(), though there is no recursive functionality.

Though, if anyone in here wants to take that on, that would be awesome!

Example repro

``` r
res <- arcgislayers::arc_open("https://services.arcgis.com/v01gqwM5QqNysAAi/ArcGIS/rest/services")
str(res)
#> List of 3
#>  $ currentVersion: num 11.2
#>  $ services      :'data.frame': 1242 obs. of 3 variables:
#>   ..$ name: chr [1:1242] "040720_3DEP" "041320_3DEP" "0f00_444_gdb" "1_Percent_Annual_Exceedance_Probability_" ...
#>   ..$ type: chr [1:1242] "FeatureServer" "FeatureServer" "FeatureServer" "FeatureServer" ...
#>   ..$ url : chr [1:1242] "https://services.arcgis.com/v01gqwM5QqNysAAi/ArcGIS/rest/services/040720_3DEP/FeatureServer" "https://services.arcgis.com/v01gqwM5QqNysAAi/ArcGIS/rest/services/041320_3DEP/FeatureServer" "https://services.arcgis.com/v01gqwM5QqNysAAi/ArcGIS/rest/services/0f00_444_gdb/FeatureServer" "https://services.arcgis.com/v01gqwM5QqNysAAi/ArcGIS/rest/services/1_Percent_Annual_Exceedance_Probability_/FeatureServer" ...
#>  $ url           : chr "https://services.arcgis.com/v01gqwM5QqNysAAi/ArcGIS/rest/services"
```

Created on 2024-07-05 with [reprex v2.1.0](https://reprex.tidyverse.org)

JosiahParry commented 2 months ago

Spent a few minutes exploring the recursive functionality. That would be quite slow. If this is really desired, rather than using lapply() over the results of arc_open(), it would be a bit more performant to use httr2::req_perform_parallel() over the URLs.
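A rough sketch of what the parallel approach could look like with httr2 (version 1.0.0 or later). The helper names are hypothetical, it assumes public services (no token is attached), and it is not how arcgislayers implements anything.

```r
library(httr2)

# Hypothetical helper: build one JSON metadata request per service URL.
build_index_reqs <- function(urls) {
  lapply(urls, function(u) {
    request(u) |> req_url_query(f = "json")
  })
}

# Hypothetical helper: perform the requests concurrently and parse the
# responses that succeeded. Failed requests are skipped, not raised.
fetch_index_parallel <- function(urls) {
  reqs <- build_index_reqs(urls)
  resps <- req_perform_parallel(reqs, on_error = "continue")
  ok <- resps_successes(resps)
  # check_type = FALSE in case the server labels its JSON as text/plain
  lapply(ok, resp_body_json, check_type = FALSE)
}

# Building the requests does not touch the network
reqs <- build_index_reqs(
  "https://services.arcgis.com/v01gqwM5QqNysAAi/ArcGIS/rest/services/040720_3DEP/FeatureServer"
)
reqs[[1]]$url
```

Calling `fetch_index_parallel(res$services$url)` on the result of the `arc_open()` call above would then fetch every service's metadata. Note that ArcGIS servers can report errors inside a successful HTTP response, so the parsed bodies still need checking.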

elipousson commented 2 months ago

The slowness of the response to a recursive query is really dependent on the size of the server (in terms of number of services, folders, layers, etc.).

I just put together a reprex showing a potential implementation that uses a nested list column to return the recursive index. There are several relatively big issues with this implementation, so it is mainly presented for consideration and discussion. If this seems like a potential direction, I can fix the issues and open a PR.

library(arcgislayers)
library(arcgisutils)
#> 
#> Attaching package: 'arcgisutils'
#> The following object is masked from 'package:base':
#> 
#>     %||%

resp_as_service_index <- function(
    x,
    ...,
    parent_type = "services"
) {

  has_folders <- rlang::has_name(x, "folders") && (length(x[["folders"]]) > 0)
  has_layers <- rlang::has_name(x, "layers") && (nrow(x[["layers"]]) > 0)
  has_services <- rlang::has_name(x, "services") && (length(x[["services"]]) > 0)

  if (!any(c(
    has_folders,
    has_layers,
    has_services
  ))) {
    return(data.frame())
  }

  if (has_folders) {
    # folders are a character vector of names
    n_folders <- length(x[["folders"]])

    x[["folders"]] <- data.frame(
        name = x[["folders"]],
        # TODO: Should folder use the service type column?
        # type = rep_len("Folder", length.out = n_folders),
        type = rep_len(NA_character_, length.out = n_folders),
        # FIXME: This convention does not work for nested folders or services in
        # folders
        url = paste0(x[["url"]], "/", x[["folders"]]),
        url_type = rep_len("folder", length.out = n_folders) 
      )
  }

  if (has_services) {
    # services are a data frame with name and type
    n_services <- nrow(x[["services"]])

  # TODO: Double-check if the services data frame ever includes a URL column
    x[["services"]] <- cbind(
      x[["services"]],
      data.frame(
        url = paste0(x[["url"]], "/", x[["services"]][["name"]], "/", x[["services"]][["type"]]),
        url_type = rep_len("service", length.out = n_services)
      )
    )
  }

  if (has_layers) {
    n_layers <- nrow(x[["layers"]])

    x[["layers"]] <- cbind(
      x[["layers"]],
      url = paste0(x[["url"]], "/", x[["layers"]][["id"]]),
      url_type = rep_len("layer", length.out = n_layers)
    )
  }

  # rbind_results fails here unless layers, services, and folders all have the
  # same number of columns
  service_index <- dplyr::bind_rows(
    list(
      layers = x[["layers"]],
      services = x[["services"]],
      folders = x[["folders"]]
    )
  )

  # Nothing to annotate when the index is empty
  if (nrow(service_index) == 0) {
    return(service_index)
  }

  service_index[["parent_url"]] <- rep_len(
    x[["url"]],
    length.out = nrow(service_index)
  )

  service_index[["parent_type"]] <- rep_len(
    parent_type,
    length.out = nrow(service_index)
  )

  service_index
}

arc_services <- function(..., token = arc_token(), recurse = FALSE) {

  x <- arc_open(..., token = token)

  service_index <- resp_as_service_index(x)

  if (!recurse) {
    return(service_index)
  }

  service_index[["index"]] <- lapply(
    service_index[["url"]],
    \(x) {
      resp_obj <- rlang::try_fetch(
        arc_open(x, token = token),
        error = \(cnd) {
          NULL
        }
      )

      if (is.null(resp_obj)) {
        return(data.frame())
      }

      resp_as_service_index(resp_obj, recurse = recurse)
    }
  )

  service_index
}

alleganygis_index <- arc_services(
  'https://alleganygis.allconet.org/allcogis/rest/services'
)

dplyr::glimpse(
  alleganygis_index
)
#> Rows: 29
#> Columns: 6
#> $ name        <chr> "AddressPoints", "Allegany_Addressing", "AlleganyGeocoder"…
#> $ type        <chr> "MapServer", "GeocodeServer", "GeocodeServer", "MapServer"…
#> $ url         <chr> "https://alleganygis.allconet.org/allcogis/rest/services/A…
#> $ url_type    <chr> "service", "service", "service", "service", "service", "se…
#> $ parent_url  <chr> "https://alleganygis.allconet.org/allcogis/rest/services",…
#> $ parent_type <chr> "services", "services", "services", "services", "services"…

alleganygis_index_recurse <- arc_services(
  'https://alleganygis.allconet.org/allcogis/rest/services',
  recurse = TRUE
)

dplyr::glimpse(
  alleganygis_index_recurse
)
#> Rows: 29
#> Columns: 7
#> $ name        <chr> "AddressPoints", "Allegany_Addressing", "AlleganyGeocoder"…
#> $ type        <chr> "MapServer", "GeocodeServer", "GeocodeServer", "MapServer"…
#> $ url         <chr> "https://alleganygis.allconet.org/allcogis/rest/services/A…
#> $ url_type    <chr> "service", "service", "service", "service", "service", "se…
#> $ parent_url  <chr> "https://alleganygis.allconet.org/allcogis/rest/services",…
#> $ parent_type <chr> "services", "services", "services", "services", "services"…
#> $ index       <list> [<data.frame[1 x 11]>], [<data.frame[0 x 0]>], [<data.fra…

Created on 2024-07-28 with reprex v2.1.0