R-ArcGIS / arcgislayers

Access ArcGIS Data and Location Services
http://r.esri.com/arcgislayers/
Apache License 2.0

ArcGIS Hub Feed #117

Open JosiahParry opened 11 months ago

JosiahParry commented 11 months ago

Is your feature request related to a problem? Please describe.

Inspired by esri2sf::esrihub()

https://github.com/elipousson/esri2sf/blob/master/R/esrihub.R

Describe the solution you'd like

Function to return ArcGIS Hub feeds

What's needed?

This function will return feeds of every type except OGC, which still needs an example to test against. It does not take the liberty of processing XML. That is left to the user who decides to use an RSS feed (only masochists dabble in XML).

# https://doc.arcgis.com/en/hub/content/federate-data-with-external-catalogs.htm
# url <- "https://data.baltimorecity.gov"

arc_hub_feed <- function(url, feed_type = c("dcat-us", "dcat-ap", "rss", "ogc")) {
  feed_type <- match.arg(feed_type)

  switch(
    feed_type,
    "dcat-us" = hub_dcat_us(url),
    "dcat-ap" = hub_dcat_ap(url),
    "rss" = hub_rss(url),
    # i can't find an example that uses OGC so cannot test
    "ogc" = todo()
  )

}

hub_dcat_us <- function(url) {
  req <- httr2::req_url_path_append(
    httr2::request(url),
    "api/feed/dcat-us/1.1.json"
  )
  resp <- httr2::req_perform(req)
  resp_str <- httr2::resp_body_string(resp)
  RcppSimdJson::fparse(resp_str)
}

hub_dcat_ap <- function(url) {
  req <- httr2::req_url_path_append(
    httr2::request(url),
    "api/feed/dcat-ap/2.0.1.json"
  )
  resp <- httr2::req_perform(req)
  resp_str <- httr2::resp_body_string(resp)
  RcppSimdJson::fparse(resp_str)
}

hub_rss <- function(url) {
  req <- httr2::req_url_path_append(
    httr2::request(url),
    "api/feed/rss/2.0"
  )
  resp <- httr2::req_perform(req)
  httr2::resp_body_xml(resp)
}

todo <- function() {
  cli::cli_abort(
    "TODO! not implemented",
    call = rlang::caller_env()
  )
}
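The three helpers above differ only in the path they append, so the lookup could be kept in one place. A minimal sketch, assuming the same paths as above; hub_feed_path() and hub_feed_request() are hypothetical names, not part of arcgislayers:

```r
# Hypothetical helper: map a feed type to the path used by the helpers above.
# OGC is left out because there is no example to test against yet.
hub_feed_path <- function(feed_type = c("dcat-us", "dcat-ap", "rss")) {
  feed_type <- match.arg(feed_type)
  switch(
    feed_type,
    "dcat-us" = "api/feed/dcat-us/1.1.json",
    "dcat-ap" = "api/feed/dcat-ap/2.0.1.json",
    "rss" = "api/feed/rss/2.0"
  )
}

# Hypothetical helper: build (but do not perform) the request for a feed.
hub_feed_request <- function(url, feed_type = "dcat-us") {
  httr2::req_url_path_append(httr2::request(url), hub_feed_path(feed_type))
}
```

Each hub_*() function would then only decide how to read the response body (JSON or XML).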

CC @elipousson

JosiahParry commented 11 months ago

Relatedly, a supplemental arc_hub_datasets() could permit querying the v3 API. Its endpoints and parameters are documented, apparently, only in the response returned by the /api/v3 path itself:

{
    "message": "Welcome to the V3 API. It is currently under construction",
    "resources": {
        "datasets": {
            "routes": {
                "collection": "https://data.baltimorecity.gov/api/v3/datasets",
                "object": "https://data.baltimorecity.gov/api/v3/datasets{/:id}"
            },
            "parameters": {
                "query": "search term",
                "filter": "filter applied to search. Example: 'filter[tags]=airports'",
                "page[size]": "Number of resources per page. Example: 'page[size]=25' ",
                "page[number]": "The page number for the resources. Example: 'page[number]=2'"
            }
        },
        "jobs": {
            "routes": {
                "collection": "https://data.baltimorecity.gov/api/v3/jobs",
                "object": "https://data.baltimorecity.gov/api/v3/jobs{/:id}"
            }
        }
    }
}
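A rough sketch of what arc_hub_datasets() might look like, using only the four parameters listed in the response above. The argument names (query, tags, page_size, page_number), the helper hub_dataset_params(), and the defaults are assumptions for illustration; only filter[tags] is shown because it is the one filter the API message gives as an example.

```r
# Hypothetical helper: collect the documented query parameters into a named
# list, dropping any that were not supplied.
hub_dataset_params <- function(query = NULL, tags = NULL,
                               page_size = 10, page_number = 1) {
  params <- list(
    "query" = query,
    "filter[tags]" = tags,
    "page[size]" = page_size,
    "page[number]" = page_number
  )
  params[!vapply(params, is.null, logical(1))]
}

# Sketch only: request one page of datasets from a hub site.
arc_hub_datasets <- function(url, ...) {
  req <- httr2::req_url_path_append(httr2::request(url), "api/v3/datasets")
  # req_url_query() accepts dynamic dots, so the list can be spliced in
  req <- httr2::req_url_query(req, !!!hub_dataset_params(...))
  resp <- httr2::req_perform(req)
  RcppSimdJson::fparse(httr2::resp_body_string(resp))
}
```

For example, arc_hub_datasets("https://data.baltimorecity.gov", tags = "airports") would send filter[tags]=airports along with the default page size and number.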

Pagination would start with a first request, followed by a check of the first, next, and last URLs in the links element of the response:

url <- "https://data.baltimorecity.gov"
req <- httr2::req_url_path_append(
  httr2::request(url),
  "api/v3/datasets"
)

resp <- httr2::req_perform(req)
res <- RcppSimdJson::fparse(httr2::resp_body_string(resp))
res$links
#> $first
#> [1] "https://data.baltimorecity.gov/api/v3/datasets?page%5Bnumber%5D=1&page%5Bsize%5D=10"
#> 
#> $`next`
#> [1] "https://data.baltimorecity.gov/api/v3/datasets?page%5Bnumber%5D=2&page%5Bsize%5D=10"
#> 
#> $last
#> [1] "https://data.baltimorecity.gov/api/v3/datasets?page%5Bnumber%5D=1313317&page%5Bsize%5D=10"

Then craft the URLs for the pages between next and last and pass the resulting requests to httr2::req_perform_parallel().
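The pagination idea can be sketched as follows, assuming the links keep the page%5Bnumber%5D=N form shown in the output above. hub_page_urls(), fetch_hub_pages(), and the max_pages argument are hypothetical; the cap is there because the last link above points at page 1313317, which is probably more than anyone wants to request in one go.

```r
# Hypothetical helper: build the page URLs from `next` through `last`.
hub_page_urls <- function(links, max_pages = NULL) {
  next_link <- links[["next"]]
  if (is.null(next_link)) {
    return(character())  # single page of results, nothing left to fetch
  }

  # Links come back percent-encoded, so decode before reading the page number.
  page_number <- function(link) {
    decoded <- utils::URLdecode(link)
    m <- regmatches(decoded, regexec("page\\[number\\]=([0-9]+)", decoded))[[1]]
    as.integer(m[2])
  }

  first_page <- page_number(next_link)
  last_page <- page_number(links[["last"]])
  if (is.na(first_page) || is.na(last_page) || first_page > last_page) {
    return(character())
  }
  if (!is.null(max_pages)) {
    last_page <- min(last_page, first_page + as.integer(max_pages) - 1L)
  }

  # Split the `next` link around its page number and paste each page back in.
  hit <- regexpr("number%5D=[0-9]+", next_link, ignore.case = TRUE)
  parts <- regmatches(next_link, hit, invert = TRUE)[[1]]
  paste0(parts[1], "number%5D=", seq(first_page, last_page), parts[2])
}

# Sketch only: perform the page requests in parallel and parse each body.
fetch_hub_pages <- function(urls) {
  reqs <- lapply(urls, httr2::request)
  resps <- httr2::req_perform_parallel(reqs)
  lapply(resps, function(resp) {
    RcppSimdJson::fparse(httr2::resp_body_string(resp))
  })
}
```

With the res object from above, this would be used as fetch_hub_pages(hub_page_urls(res$links, max_pages = 20)).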