R-ArcGIS / arcgislayers

ArcGIS Location Services
http://r.esri.com/arcgislayers/
Apache License 2.0

ArcGIS Hub Feed #117

Open JosiahParry opened 7 months ago

JosiahParry commented 7 months ago

Is your feature request related to a problem? Please describe.

Inspired by esri2sf::esrihub()

https://github.com/elipousson/esri2sf/blob/master/R/esrihub.R

Describe the solution you'd like

A function to return ArcGIS Hub feeds.

What's needed?

This function will return feeds of every type except OGC, for which we still need an example to test against. We will not take the liberty of processing XML; that is left to the user who decides to use an RSS feed (only masochists dabble in XML).

# https://doc.arcgis.com/en/hub/content/federate-data-with-external-catalogs.htm
# url <- "https://data.baltimorecity.gov"

arc_hub_feed <- function(url, feed_type = c("dcat-us", "dcat-ap", "rss", "ogc")) {
  feed_type <- match.arg(feed_type)

  switch(
    feed_type,
    "dcat-us" = hub_dcat_us(url),
    "dcat-ap" = hub_dcat_ap(url),
    "rss" = hub_rss(url),
    # i can't find an example that uses OGC so cannot test
    "ogc" = todo()
  )

}

hub_dcat_us <- function(url) {
  req <- httr2::req_url_path_append(
    httr2::request(url),
    "api/feed/dcat-us/1.1.json"
  )
  resp <- httr2::req_perform(req)
  resp_str <- httr2::resp_body_string(resp)
  RcppSimdJson::fparse(resp_str)
}

hub_dcat_ap <- function(url) {
  req <- httr2::req_url_path_append(
    httr2::request(url),
    "api/feed/dcat-ap/2.0.1.json"
  )
  resp <- httr2::req_perform(req)
  resp_str <- httr2::resp_body_string(resp)
  RcppSimdJson::fparse(resp_str)
}

hub_rss <- function(url) {
  req <- httr2::req_url_path_append(
    httr2::request(url),
    "api/feed/rss/2.0"
  )
  resp <- httr2::req_perform(req)
  httr2::resp_body_xml(resp)
}

todo <- function() {
  cli::cli_abort(
    "TODO! not implemented",
    call = rlang::caller_env()
  )
}
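
The three helpers above differ only in the path they append, so one possible consolidation is a lookup table of paths. This is only a sketch: `hub_feed_paths` and `hub_feed_request()` are hypothetical names, not part of arcgislayers, and the request is built but deliberately not performed.

```r
# Hypothetical lookup of feed type -> path, copied from the helpers above.
hub_feed_paths <- c(
  "dcat-us" = "api/feed/dcat-us/1.1.json",
  "dcat-ap" = "api/feed/dcat-ap/2.0.1.json",
  "rss" = "api/feed/rss/2.0"
)

# Build (but do not send) the request for a given feed type.
hub_feed_request <- function(url, feed_type = c("dcat-us", "dcat-ap", "rss")) {
  feed_type <- match.arg(feed_type)
  httr2::req_url_path_append(
    httr2::request(url),
    hub_feed_paths[[feed_type]]
  )
}

req <- hub_feed_request("https://data.baltimorecity.gov", "rss")
req$url
```

The JSON feeds could then go through `httr2::req_perform()` and `RcppSimdJson::fparse()` as before, with the RSS response still returned as unparsed XML.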

CC @elipousson

JosiahParry commented 7 months ago

Relatedly, there could be a supplemental arc_hub_datasets() that permits querying via the following v3 API endpoint and parameters, which appear to be documented only at the /api/v3 path itself:

{
    "message": "Welcome to the V3 API. It is currently under construction",
    "resources": {
        "datasets": {
            "routes": {
                "collection": "https://data.baltimorecity.gov/api/v3/datasets",
                "object": "https://data.baltimorecity.gov/api/v3/datasets{/:id}"
            },
            "parameters": {
                "query": "search term",
                "filter": "filter applied to search. Example: 'filter[tags]=airports'",
                "page[size]": "Number of resources per page. Example: 'page[size]=25' ",
                "page[number]": "The page number for the resources. Example: 'page[number]=2'"
            }
        },
        "jobs": {
            "routes": {
                "collection": "https://data.baltimorecity.gov/api/v3/jobs",
                "object": "https://data.baltimorecity.gov/api/v3/jobs{/:id}"
            }
        }
    }
}
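
The parameters listed above could be wired into a request builder roughly like this. It is a sketch under assumptions: `arc_hub_datasets_req()` and its arguments are hypothetical, the parameter names are taken verbatim from the JSON (untested against the live API), and `filter` is assumed to be a named list such as `list(tags = "airports")`.

```r
# Hypothetical request builder for the v3 datasets collection.
# Nothing is sent; this only assembles the URL.
arc_hub_datasets_req <- function(url,
                                 query = NULL,
                                 filter = NULL,
                                 page_size = 10,
                                 page_number = 1) {
  params <- list(
    query = query,
    "page[size]" = page_size,
    "page[number]" = page_number
  )

  # list(tags = "airports") becomes filter[tags]=airports
  if (!is.null(filter)) {
    names(filter) <- paste0("filter[", names(filter), "]")
    params <- c(params, filter)
  }

  # Drop anything the caller did not supply
  params <- Filter(Negate(is.null), params)

  req <- httr2::req_url_path_append(httr2::request(url), "api/v3/datasets")
  httr2::req_url_query(req, !!!params)
}

req <- arc_hub_datasets_req(
  "https://data.baltimorecity.gov",
  filter = list(tags = "airports"),
  page_size = 25
)
utils::URLdecode(req$url)
```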

Pagination would start with the first request, followed by a check of the first, next, and last URLs in its links:

url <- "https://data.baltimorecity.gov"
req <- httr2::req_url_path_append(
  httr2::request(url),
  "api/v3/datasets"
)

resp <- httr2::req_perform(req)
res <- RcppSimdJson::fparse(httr2::resp_body_string(resp))
res$links
#> $first
#> [1] "https://data.baltimorecity.gov/api/v3/datasets?page%5Bnumber%5D=1&page%5Bsize%5D=10"
#> 
#> $`next`
#> [1] "https://data.baltimorecity.gov/api/v3/datasets?page%5Bnumber%5D=2&page%5Bsize%5D=10"
#> 
#> $last
#> [1] "https://data.baltimorecity.gov/api/v3/datasets?page%5Bnumber%5D=1313317&page%5Bsize%5D=10"

Then craft the URLs for the pages between next and last and pass the resulting requests to httr2::req_perform_parallel().
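
A rough sketch of that step, assuming the last page number can be read straight out of the `last` link. `hub_last_page()` and `hub_page_requests()` are hypothetical helpers, and `max_pages` is an added safety cap, since the `last` link above points at page 1313317.

```r
# Pull the page number out of a links URL such as res$links$last.
hub_last_page <- function(last_url) {
  decoded <- utils::URLdecode(last_url)
  m <- regmatches(
    decoded,
    regexpr("(?<=page\\[number\\]=)[0-9]+", decoded, perl = TRUE)
  )
  as.integer(m)
}

# Build one request per remaining page (page 1 was already fetched).
hub_page_requests <- function(url, last_url, page_size = 10, max_pages = 5) {
  n_last <- hub_last_page(last_url)
  pages <- seq_len(min(n_last, max_pages))[-1]
  base <- httr2::req_url_path_append(httr2::request(url), "api/v3/datasets")
  lapply(pages, function(i) {
    httr2::req_url_query(base, `page[number]` = i, `page[size]` = page_size)
  })
}

last_url <- "https://data.baltimorecity.gov/api/v3/datasets?page%5Bnumber%5D=1313317&page%5Bsize%5D=10"
reqs <- hub_page_requests("https://data.baltimorecity.gov", last_url)
length(reqs)

# Not run here, since it needs network access:
# resps <- httr2::req_perform_parallel(reqs)
```

Each response would then be parsed the same way as the first page, with `RcppSimdJson::fparse(httr2::resp_body_string(resp))`.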