bcgov / bcdata

An R package for searching & retrieving data from the B.C. Data Catalogue
https://bcgov.github.io/bcdata
Apache License 2.0

arcgis_rest file formats? #257

Closed ashjolly closed 7 months ago

ashjolly commented 3 years ago

Hi team,

In using the bcdata package for downloading spatial data from the Data Catalogue, I've run into multiple instances where data is present as an 'arcgis_rest' file format. My impression is that this happens when the Data Catalogue is scraping data from Map Hub resources.

For example, the Snow Basin Indices item within the Data Catalogue represents snow data present for polygons within the RFC's Snow Map (I believe...) :

RFC Snow Map Link https://governmentofbc.maps.arcgis.com/apps/webappviewer/index.html?id=b57800e08e46468bab506f9b9f0cbad6

The resultant Data Catalogue entry is in an "arcgis_rest" format: https://catalogue.data.gov.bc.ca/dataset/snow-basin-indices

library(bcdata)
bcdata::bcdc_get_record("712d39f3-de6f-4ddf-a5e5-2066be5e4482")
bcdata::bcdc_query_geodata("712d39f3-de6f-4ddf-a5e5-2066be5e4482") %>% dplyr::collect()
bcdata::bcdc_get_data("712d39f3-de6f-4ddf-a5e5-2066be5e4482") %>% dplyr::collect()

I notice that the 'bcdata_available' column is FALSE for all of the resources within this link.
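
One way to see which resources bcdata cannot read is to inspect the resource table directly. The sketch below assumes the data frame returned by `bcdc_tidy_resources()` has `name`, `format`, `url` and `bcdata_available` columns; `unreadable_resources()` is a hypothetical helper, not part of bcdata.

```r
# Hypothetical helper (not part of bcdata): keep only the rows of a
# resource table that bcdata does not report as readable.
unreadable_resources <- function(resources) {
  keep <- !(resources$bcdata_available %in% TRUE)
  resources[keep, c("name", "format", "url"), drop = FALSE]
}

# Usage against the live catalogue (requires bcdata and network access):
# res <- bcdata::bcdc_tidy_resources("712d39f3-de6f-4ddf-a5e5-2066be5e4482")
# unreadable_resources(res)
```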

We are going to update the polygons within the Snow Map, which will hopefully be represented within the BC Data Catalogue entry. I'm hoping to be able to point my R script to this entry in the BC Data Catalogue rather than rely on a local copy of the spatial data. Additionally, this is also the same situation for the Drought Polygons.

Thanks for your two cents about any suggestions on how to deal with this situation, and thanks for developing such a useful package!

-Ashlee

stephhazlitt commented 3 years ago

Thanks for this Issue @ashjolly.

Currently {bcdata} searches and returns all the metadata records available in the B.C. Data Catalogue. It can only pull data from a catalogue record where the data resource is stored in the B.C. Data Catalogue itself (bcdc_get_data()) or the data resource is stored in the B.C. Geographic Warehouse (bcdc_query_geodata()), with a few exceptions around file types (e.g. {bcdata} does not work when a data resource is a set of multiple files in a zipped folder).

As you point out above, some of the catalogue metadata records are for Web Apps/User Tools, such as the Snow Map---where the record provides a licence and metadata for the App itself. Rather than scraping data from a Web App, I think a more direct path would be to add the data layer itself (in a non-proprietary or open or common format 😉) as a resource within the record.

ashjolly commented 3 years ago

Thanks Stephanie! Excellent points and clarification - I really appreciate it (and apologies for the delay - I was on leave). I know that GeoBC makes the Web App map from spatial data, so I could see a pathway forward where this initial layer is added to the catalogue, along with the Web App metadata you describe. I'll check in about this regarding the RFC-related resources. Thanks again!

bevingtona commented 7 months ago

It might be worth revisiting this idea of bcdata having functionality for ArcGIS REST formats...

For example:

# remotes::install_github("yonghah/esri2sf")
snow_basins_url <- "https://services6.arcgis.com/ubm4tcTYICKBpist/ArcGIS/rest/services/Snow_Basins_Indices_View/FeatureServer/0"
esri2sf::esri2sf(snow_basins_url) |> mapview::mapview(zcol = "Snow_Basin_Index")

There are 59 publicly published datasets in the BC Data Catalogue that are stored as arcgis_rest.

boshek commented 7 months ago

My take is that while this is in scope for bcdata, the real limitation is that esri2sf is not on CRAN and therefore would at best require a dodgy workaround to CRAN policies and at worst impose a fairly sizable maintenance cost.

ateucher commented 7 months ago

Yeah, I came here to say the same thing. Would love to do it but without esri2sf on CRAN it's not really doable... One possible way would be to return the url when it's an esri endpoint, and add some documentation about how to use esri2sf?

boshek commented 7 months ago

I mean @bevingtona could also write a custom parser too. I assume it is "just" some json that the esri2sf is handling. 😜

stephhazlitt commented 7 months ago

+1 for breadcrumbs leading users to esri2sf. It might be worthwhile to see if the authors of esri2sf have a path to CRAN?

I'll reiterate my philosophical objection to supporting spatial formats in the BC Data Catalogue—the open data portal—that are not in an open format. I still think this is akin to the getting a horse off a balcony situation, where the horse should not be there in the first place 🐴.

bevingtona commented 7 months ago

For sure, great comments @stephhazlitt @ateucher @boshek .. I think the reality is that so many are hosting data in this format.

I'll look into an in-house parser ..

bevingtona commented 7 months ago

This one is on CRAN ... arcpullr

# install.packages("arcpullr")
library(arcpullr)
snow_basins_url <- "https://services6.arcgis.com/ubm4tcTYICKBpist/ArcGIS/rest/services/Snow_Basins_Indices_View/FeatureServer/0"
arcpullr::get_spatial_layer(snow_basins_url) |> mapview::mapview(zcol = "Snow_Basin_Index")

ateucher commented 7 months ago

That does potentially change things, let's reopen this

stephhazlitt commented 7 months ago

Agreed @ateucher & @bevingtona. If there is a CRAN package we can import to parse these spatial files and there is bandwidth to author a PR, I am +1 for adding this enhancement.

ateucher commented 7 months ago

@bevingtona if you have time and inclination to do a PR, that would probably expedite this. I can probably get to it some time, but I can't say when.

ateucher commented 7 months ago

I haven't been in that bit of the package in a while, but it might be mostly a matter of editing this function/table: https://github.com/bcgov/bcdata/blob/7204f137298d24171c71a8f4147a37c3c4ef5f6e/R/get_data.R#L261

and then getting the dependencies in order... and adding tests of course :)
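
As a rough illustration of what editing the function table could involve, here is a sketch of a format-to-reader lookup with an `arcgis_rest` row added. The table layout, the `read_functions` and `reader_for()` names, and the rows shown are assumptions for illustration; the real table is in the `get_data.R` file linked above and may be structured differently.

```r
# Hypothetical format -> reader lookup; bcdata's real table may differ.
read_functions <- data.frame(
  format  = c("csv", "json", "arcgis_rest"),
  package = c("readr", "jsonlite", "arcpullr"),
  fun     = c("read_csv", "read_json", "get_spatial_layer"),
  stringsAsFactors = FALSE
)

# Look up the reader for a resource format, failing clearly if unsupported.
reader_for <- function(format, table = read_functions) {
  row <- table[table$format == format, , drop = FALSE]
  if (nrow(row) == 0) {
    stop("No reader registered for format: ", format, call. = FALSE)
  }
  paste0(row$package[1], "::", row$fun[1])
}

reader_for("arcgis_rest")
#> [1] "arcpullr::get_spatial_layer"
```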

ateucher commented 7 months ago

Related: https://github.com/bcgov/bcdata/issues/325

boshek commented 7 months ago

Even the testing should be pretty straightforward, as really all we want to do is make sure it actually works like here:

https://github.com/bcgov/bcdata/blob/7204f137298d24171c71a8f4147a37c3c4ef5f6e/tests/testthat/test-get-data.R#L61-L66

Testing should probably also involve a SQL query passed into arcpullr::get_spatial_layer via ... (e.g. where = "WATERBODY_ROW_NAME = 'Wisconsin River'")

FWIW, the arcpullr package looks pretty full-featured. For convenience it is definitely helpful to have a bcdc_get_data "method" to access data like this. But if you were working a ton with a data source like this, you may be better off just getting the relevant URL with bcdata and then using the arcpullr package directly.

stephhazlitt commented 7 months ago

I think jsonlite::read_json() was the last new data format reader we added, here is the PR that provides a reasonable recipe to follow (edit function table, add test, update NEWS, add import etc.).

bevingtona commented 7 months ago

> FWIW, the arcpullr package looks pretty full-featured. For convenience it is definitely helpful to have a bcdc_get_data "method" to access data like this. But if you were working a ton with a data source like this, you may be better off just getting the relevant URL with bcdata and then using the arcpullr package directly.

So... maybe just a message when it's a REST format to use arcpullr with a syntax example? Or is the use case strong enough to build a few functions?
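
The "just a message" option might look roughly like this. `arcgis_rest_hint()` is a hypothetical helper and the wording is only a placeholder; the one real call it points to is `arcpullr::get_spatial_layer()`, used earlier in this thread.

```r
# Hypothetical helper: when a resource is an ArcGIS REST endpoint, tell the
# user how to read it with arcpullr and return the URL invisibly.
arcgis_rest_hint <- function(format, url) {
  if (!identical(format, "arcgis_rest")) {
    return(invisible(NULL))
  }
  message(
    "This resource is an ArcGIS REST endpoint, which bcdata does not read.\n",
    "Try: arcpullr::get_spatial_layer(\"", url, "\")"
  )
  invisible(url)
}
```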

ateucher commented 7 months ago

I think we can justify adding the ability to get arcgis data with bcdc_get_data(). But I think adding full query functionality would take quite a bit more work (i.e. enabling bcdc_query_geodata() to query an arcgis endpoint in addition to wfs). So I propose just the simple method, where the ... could take the SQL argument and pass it to arcpullr.
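
A minimal sketch of that simple method: a reader that forwards `...` (for example a `where` clause) to `arcpullr::get_spatial_layer()`. The function name `read_arcgis_rest()` and the `.reader` argument are hypothetical; `.reader` is there only so the forwarding can be exercised without a network call.

```r
# Hypothetical reader: pass the endpoint URL and any extra arguments
# straight through to arcpullr (or to a stand-in supplied via .reader).
read_arcgis_rest <- function(url, ..., .reader = arcpullr::get_spatial_layer) {
  stopifnot(is.character(url), length(url) == 1)
  .reader(url, ...)
}

# Usage (requires arcpullr and network access; the where clause is
# illustrative and assumes Snow_Basin_Index is a numeric field):
# read_arcgis_rest(snow_basins_url, where = "Snow_Basin_Index > 100")
```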

boshek commented 7 months ago

So the catalogue only returns URLs to the ArcGIS UI. For example, the record for the snow basin index:

R> rec <- bcdc_tidy_resources('712d39f3-de6f-4ddf-a5e5-2066be5e4482')
R> rec$url
[1] "https://governmentofbc.maps.arcgis.com/home/item.html?id=f842bd03020241ed9512746a83137a1f"
[2] "https://governmentofbc.maps.arcgis.com/home/item.html?id=637a958538e44b928fda568784cbb8eb"

For this to work we'd need a field in the record to have the associated API url: https://services6.arcgis.com/ubm4tcTYICKBpist/arcgis/rest/services/Snow_Basins_Indices_View/FeatureServer

There may be some way to construct that link above but that seems brittle. Instead if one can get the API url included in the record, the rest is pretty easy.
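
For what it's worth, one way the link could be constructed is through the ArcGIS sharing REST API, which serves item metadata at `/sharing/rest/content/items/<id>?f=json`. The sketch below only builds that request URL from a catalogue resource URL. The helper names are hypothetical, and whether the returned item JSON carries a usable service URL for these particular records is an assumption that would need checking, which is part of why this route is brittle.

```r
# Hypothetical helper: pull the 32-character item id out of an AGOL
# "item.html?id=..." link like the ones the catalogue returns.
agol_item_id <- function(item_url) {
  stopifnot(is.character(item_url), length(item_url) == 1)
  m <- regmatches(
    item_url,
    regexpr("(?<=[?&]id=)[0-9a-f]{32}", item_url, perl = TRUE)
  )
  if (length(m) == 0) NA_character_ else m
}

# Hypothetical helper: build the item-metadata request URL on the same host.
agol_item_json_url <- function(item_url) {
  id <- agol_item_id(item_url)
  if (is.na(id)) stop("No item id found in: ", item_url, call. = FALSE)
  host <- sub("^(https?://[^/]+)/.*$", "\\1", item_url)
  paste0(host, "/sharing/rest/content/items/", id, "?f=json")
}

agol_item_json_url(
  "https://governmentofbc.maps.arcgis.com/home/item.html?id=f842bd03020241ed9512746a83137a1f"
)
# The item JSON could then be read with e.g. jsonlite::fromJSON(); it is not
# guaranteed to expose the FeatureServer URL for a given record.
```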

ateucher commented 7 months ago

It doesn't address the issue @boshek identified, but this looks like an alternative package: https://r.esri.com/arcgislayers/index.html. It looks like it's actually developed by ESRI so may be the most reliable for long-term maintenance... maybe?

Edit: It's not on CRAN yet but I think that is the intention - it's very new

boshek commented 7 months ago

I'm going to close this again. Lots of good info in here about what needs to happen for bcdata to access the ArcGIS REST API directly, but ultimately none of it is something bcdata can fix at the moment.

stephhazlitt commented 7 months ago

@bevingtona @ashjolly FWIW, if either of you are an editor of a record with an arcgis_rest resource and want to add in the associated API url to the record then we would at least have an example to work with to get the plumbing working/tested in bcdata. With a proof-of-concept in-hand, maybe other arcgis_rest catalogue record editors would follow suit.

bevingtona commented 7 months ago

Maybe @jongoetz can make this happen?

stephhazlitt commented 5 months ago

https://www.esri.com/arcgis-blog/products/developers/announcements/announcing-arcgis-r-package/

jongoetz commented 5 months ago

Amazing! They have just the solution we need. Thanks Steph!

library(arcgis)
arc_open("https://services6.arcgis.com/ubm4tcTYICKBpist/ArcGIS/rest/services/Snow_Basins_Indices_View/FeatureServer/0") |> arc_select() |> mapview::mapview(zcol = "Snow_Basin_Index")

ateucher commented 5 months ago

Almost! We still need the catalogue records to publish the REST API endpoint rather than (or in addition to) the AGOL GUI (https://github.com/bcgov/bcdata/issues/257#issuecomment-1918299632)

boshek commented 5 months ago

If folks are really interested in this, the implementation on the bcdata side is pretty easy, so even just getting one record that has the ArcGIS endpoint would enable this as a proof of concept.

stephhazlitt commented 5 months ago

Agreed. There is a solution here for bcdata and arcgis_rest resources; we just need a data provider to include the endpoint in the BC Data Catalogue metadata record.

boshek commented 5 months ago

cough @jongoetz or @bevingtona cough ;)

bevingtona commented 5 months ago

Looks like there is a .json file that maintains the REST links: https://services6.arcgis.com/ubm4tcTYICKBpist/ArcGIS/rest/services?f=pjson but there are hundreds.. not sure what they all are.
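
To get a feel for what is in that listing, the services directory JSON can be flattened into a table. This sketch assumes the response has a top-level `services` array whose entries carry `name`, `type` and `url`, which is the usual shape of an ArcGIS REST services directory. `list_services()` is a hypothetical helper and the sample JSON is made up for illustration.

```r
library(jsonlite)

# Hypothetical helper: parse a services directory (JSON string, file or URL)
# and keep the name and URL of each FeatureServer entry.
list_services <- function(json) {
  services <- fromJSON(json)$services
  services[services$type == "FeatureServer", c("name", "url"), drop = FALSE]
}

# Made-up sample in the assumed shape, so the helper can be tried offline.
sample_json <- '{
  "services": [
    {"name": "Layer_A", "type": "FeatureServer",
     "url": "https://example.com/Layer_A/FeatureServer"},
    {"name": "Layer_B", "type": "MapServer",
     "url": "https://example.com/Layer_B/MapServer"}
  ]
}'
list_services(sample_json)

# Against the live directory (requires network access):
# list_services("https://services6.arcgis.com/ubm4tcTYICKBpist/ArcGIS/rest/services?f=pjson")
```

Matching each entry to its catalogue record would still be a manual step.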

So we'd need to connect each one to its BC Data Catalogue counterpart? ugh

stephhazlitt commented 5 months ago

@bevingtona I think data providers who have made the effort to document and make their data findable through the BC Data Catalogue could be convinced to add one more field to records (existing and new) to make the data quickly usable through R. I suggest a start with one record where you or @jongoetz (or someone on your teams) are an editor and we can get a proof of concept in place.