Closed: @ashjolly closed this issue 9 months ago
Thanks for this Issue @ashjolly.
Currently {bcdata} searches and returns all the metadata records available in the B.C. Data Catalogue. It can only pull data from a catalogue record where the data resource is stored in the B.C. Data Catalogue itself (`bcdc_get_data()`) or the data resource is stored in the B.C. Geographic Warehouse (`bcdc_query_geodata()`), with a few exceptions around file types (e.g. {bcdata} does not work when a data resource is a set of multiple files in a zipped folder).
As you point out above, some of the catalogue metadata records are for Web Apps/User Tools, such as the Snow Map, where the record provides a licence and metadata for the App itself. Rather than scraping data from a Web App, I think a more direct path would be to add the data layer itself (in a non-proprietary, open, or common format) as a resource within the record.
Thanks Stephanie! Excellent points and clarification - I really appreciate it (and apologies for the delay - I was on leave). I know that GeoBC makes the Web App map from spatial data, so I could see a pathway forward where this initial layer is added to the catalogue, along with the Web App metadata you describe. I'll check in with them regarding the RFC-related resources. Thanks again!
It might be worth revisiting this idea of `bcdata` having functionality for ArcGIS REST formats...
For example:

```r
# remotes::install_github("yonghah/esri2sf")
snow_basins_url <- "https://services6.arcgis.com/ubm4tcTYICKBpist/ArcGIS/rest/services/Snow_Basins_Indices_View/FeatureServer/0"

esri2sf::esri2sf(snow_basins_url) |>
  mapview::mapview(zcol = "Snow_Basin_Index")
```
There are 59 publicly published datasets in the BC Data Catalogue that are stored as `arcgis_rest`.
My take is that while this is in scope for bcdata, the real limitation is that esri2sf is not on CRAN: at best that would require a dodgy workaround of CRAN policies, and at worst it would impose a fairly sizable maintenance cost.
Yeah, I came here to say the same thing. I would love to do it, but without esri2sf on CRAN it's not really doable... One possible approach would be to return the url when it's an ESRI endpoint, and add some documentation about how to use esri2sf?
I mean, @bevingtona could also write a custom parser. I assume it is "just" some JSON that esri2sf is handling.
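For context, a minimal sketch of what such a custom parser could look like, assuming the FeatureServer layer supports `f=geojson` output (many AGOL-hosted layers do, but that is an assumption about this particular service):

```r
# A FeatureServer layer's /query endpoint can often return GeoJSON
# directly, which sf can read without any ESRI-specific package.
library(sf)

snow_basins_url <- "https://services6.arcgis.com/ubm4tcTYICKBpist/ArcGIS/rest/services/Snow_Basins_Indices_View/FeatureServer/0"

# where=1=1 (url-encoded) selects all features; outFields=* keeps all columns
query_url <- paste0(snow_basins_url, "/query?where=1%3D1&outFields=*&f=geojson")

snow_basins <- sf::st_read(query_url)
```

A production parser would also need to handle the server's paging limits (`resultOffset`/`resultRecordCount`), which is much of what esri2sf does.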
+1 for breadcrumbs leading users to `esri2sf`. It might be worthwhile to see if the authors of `esri2sf` have a path to CRAN?
I'll reiterate my philosophical objection to supporting spatial formats in the BC Data Catalogue (the open data portal) that are not in an open format. I still think this is akin to the getting-a-horse-off-a-balcony situation, where the horse should not be there in the first place 🐴.
For sure, great comments @stephhazlitt @ateucher @boshek. I think the reality is that so many providers are hosting data in this format.
I'll look into an in-house parser...
This one is on CRAN: `arcpullr`

```r
# install.packages("arcpullr")
library(arcpullr)

snow_basins_url <- "https://services6.arcgis.com/ubm4tcTYICKBpist/ArcGIS/rest/services/Snow_Basins_Indices_View/FeatureServer/0"

arcpullr::get_spatial_layer(snow_basins_url) |>
  mapview::mapview(zcol = "Snow_Basin_Index")
```
That does potentially change things; let's reopen this.
Agreed @ateucher & @bevingtona. If there is a CRAN package we can import to parse these spatial files and there is bandwidth to author a PR, I am +1 for adding this enhancement.
@bevingtona if you have time and inclination to do a PR, that would probably expedite this. I can probably get to it some time, but I can't say when.
I haven't been in that bit of the package in a while, but it might be mostly a matter of editing this function/table: https://github.com/bcgov/bcdata/blob/7204f137298d24171c71a8f4147a37c3c4ef5f6e/R/get_data.R#L261 and then getting the dependencies in order... and adding tests, of course :)
Even the testing should be pretty straightforward, as really all we want to do is make sure it actually works, like here:

Testing also should probably involve a SQL query passed into `arcpullr::get_spatial_layer()` via `...` (i.e. `where = "WATERBODY_ROW_NAME = 'Wisconsin River'"`).
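A sketch of what such a test call could exercise, reusing the snow basins layer from earlier in the thread (`Snow_Basin_Index` is the column used with mapview above; the threshold value is made up for illustration):

```r
library(arcpullr)

layer_url <- "https://services6.arcgis.com/ubm4tcTYICKBpist/ArcGIS/rest/services/Snow_Basins_Indices_View/FeatureServer/0"

# Server-side filter via the SQL where clause: only matching
# features are downloaded, which is the behaviour a test would check
arcpullr::get_spatial_layer(layer_url, where = "Snow_Basin_Index > 100")
```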
FWIW, the arcpullr package looks pretty full featured. For convenience it is definitely helpful to have a `bcdc_get_data()` "method" to access data like this. But if you were working a ton with a data source like this, you may be better off just getting the relevant url with bcdata and then using the arcpullr package directly.
I think `jsonlite::read_json()` was the last new data format reader we added; here is the PR that provides a reasonable recipe to follow (edit function table, add test, update NEWS, add import, etc.).
So... maybe just a message when it's a REST format suggesting `arcpullr` with a syntax example? Or is the use case strong enough to build a few functions?
I think we can justify adding the ability to get arcgis data with `bcdc_get_data()`. But I think adding full query functionality would take quite a bit more work (i.e. enabling `bcdc_query_geodata()` to query an arcgis endpoint in addition to WFS). So I propose just the simple method, where the `...` could take the SQL argument and pass it to arcpullr.
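To make the proposal concrete, a hypothetical sketch of what that could look like once implemented. The record id is the snow basin one from this thread, but the `where` pass-through and the predicate are assumptions about the proposed API, not current bcdata behaviour:

```r
library(bcdata)

# Hypothetical: bcdc_get_data() would detect the arcgis_rest resource and
# forward extra arguments through ... to arcpullr::get_spatial_layer().
# Snow_Basin_Index is the column mapped earlier; the threshold is made up.
snow <- bcdc_get_data(
  "712d39f3-de6f-4ddf-a5e5-2066be5e4482",
  where = "Snow_Basin_Index > 100"
)
```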
So the catalogue only returns urls to the ArcGIS UI. For example, the record for the snow basin index:

```r
rec <- bcdc_tidy_resources('712d39f3-de6f-4ddf-a5e5-2066be5e4482')
rec$url
#> [1] "https://governmentofbc.maps.arcgis.com/home/item.html?id=f842bd03020241ed9512746a83137a1f"
#> [2] "https://governmentofbc.maps.arcgis.com/home/item.html?id=637a958538e44b928fda568784cbb8eb"
```
For this to work we'd need a field in the record to have the associated API url: https://services6.arcgis.com/ubm4tcTYICKBpist/arcgis/rest/services/Snow_Basins_Indices_View/FeatureServer
There may be some way to construct that link above, but that seems brittle. Instead, if the API url can be included in the record, the rest is pretty easy.
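For what it's worth, one (admittedly brittle) way to look up the service url from an AGOL item id is the ArcGIS sharing API: the item's JSON metadata usually carries a `url` field pointing at the FeatureServer. A hedged sketch, with the caveat that whether every record's item is public and exposes that field is an assumption:

```r
library(jsonlite)

item_id <- "f842bd03020241ed9512746a83137a1f"  # first url from the record above

# The /sharing/rest/content/items endpoint is standard AGOL; the org
# portal (governmentofbc.maps.arcgis.com) may need to be used instead
# of arcgis.com for some items.
item_meta <- jsonlite::fromJSON(
  paste0("https://www.arcgis.com/sharing/rest/content/items/", item_id, "?f=json")
)

# For hosted feature layers this is typically the FeatureServer endpoint
item_meta$url
```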
It doesn't address the issue @boshek identified, but this looks like an alternative package: https://r.esri.com/arcgislayers/index.html. It looks like it's actually developed by ESRI, so it may be the most reliable for long-term maintenance... maybe?
Edit: It's not on CRAN yet, but I think that is the intention; it's very new.
I'm going to close this again. There is lots of good info here about what needs to happen for bcdata to access the arc REST API directly, but ultimately none of it is something bcdata can fix at the moment.
@bevingtona @ashjolly FWIW, if either of you is an editor of a record with an `arcgis_rest` resource and wants to add the associated API url to the record, then we would at least have an example to work with to get the plumbing working/tested in bcdata. With a proof-of-concept in hand, maybe other `arcgis_rest` catalogue record editors would follow suit.
Maybe @jongoetz can make this happen?
Amazing! They have just the solution we need. Thanks Steph!
```r
library(arcgis)

arc_open("https://services6.arcgis.com/ubm4tcTYICKBpist/ArcGIS/rest/services/Snow_Basins_Indices_View/FeatureServer/0") |>
  arc_select() |>
  mapview::mapview(zcol = "Snow_Basin_Index")
```
Almost! We still need the catalogue records to publish the REST API endpoint rather than (or in addition to) the AGOL GUI (https://github.com/bcgov/bcdata/issues/257#issuecomment-1918299632).
If folks are really interested in this, the implementation on the bcdata side is pretty easy, so even just getting one record that has the arcgis endpoint would enable this as a proof of concept.
Agreed. There is a solution here for bcdata and `arcgis_rest` files; we just need a data provider to include the endpoint in the BC Data Catalogue metadata record.
cough @jongoetz or @bevingtona cough ;)
Looks like there is a .json file that maintains the REST links: https://services6.arcgis.com/ubm4tcTYICKBpist/ArcGIS/rest/services?f=pjson, but there are hundreds... not sure what they all are.
So we'd need to connect each one to its BC Data Catalogue counterpart? Ugh.
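As a quick sketch of what that directory listing contains (assuming the server returns the standard ArcGIS services-directory JSON; the `services`, `name`, `type`, and `url` fields are part of that format, though their presence for this particular server is an assumption):

```r
library(jsonlite)

# Read the ArcGIS server's services directory linked above
dir_url <- "https://services6.arcgis.com/ubm4tcTYICKBpist/ArcGIS/rest/services?f=pjson"
listing <- jsonlite::fromJSON(dir_url)

# One row per hosted service; matching these to catalogue records would
# still have to be done manually or by name heuristics.
head(listing$services[, c("name", "type", "url")])
```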
@bevingtona I think data providers who have made the effort to document and make their data findable through the BC Data Catalogue could be convinced to add one more field to records (existing and new) to make the data quickly usable through R. I suggest a start with one record where you or @jongoetz (or someone on your teams) are an editor and we can get a proof of concept in place.
Hi team,
In using the bcdata package for downloading spatial data from the Data Catalogue, I've run into multiple instances where data is present in an 'arcgis_rest' file format. My impression is that this happens when the Data Catalogue is scraping data from Map Hub resources.
For example, the Snow Basin Indices item within the Data Catalogue represents snow data present for polygons within the RFC's Snow Map (I believe...):
RFC Snow Map Link https://governmentofbc.maps.arcgis.com/apps/webappviewer/index.html?id=b57800e08e46468bab506f9b9f0cbad6
The resultant Data Catalogue entry is in an "arcgis_rest" format: https://catalogue.data.gov.bc.ca/dataset/snow-basin-indices
I notice that the 'bcdata_available' column is FALSE for all of the resources within this link.
We are going to update the polygons within the Snow Map, which will hopefully be represented within the BC Data Catalogue entry. I'm hoping to be able to point my R script to this entry in the BC Data Catalogue rather than rely on a local copy of the spatial data. This is also the situation for the Drought Polygons.
Thanks in advance for your two cents on how to deal with this situation, and thanks for developing such a useful package!
-Ashlee