UNHCR-Guatemala / A2SIT

Admin2 Severity Index Tool
https://unhcr-guatemala.github.io/A2SIT/
Other
2 stars 1 forks source link

As user, I would like to select the country for which my severity index needs to be build so that I can get the right map #11

Closed bluefoxr closed 11 months ago

bluefoxr commented 1 year ago

Hi @Edouard-Legoupil thanks for sharing the API links to the map server in our meetingr recently. I still have a few questions if you have a minute as this is something I have less experience with.

What I want to do is to:

(a) Access admin2 shape files for a specified country (e.g. by ISO3 code) in a format that can be fed into Leaflet. (b) Possibly, access admin2 shape files for the whole world in one go

My problem is that I don't understand the API very well. I believe that the correct maps are here:

https://gis.unhcr.org/arcgis/rest/services/core_v2/wrl_polbnd_adm2_a_unhcr/MapServer

But how to I extend this query to get one or either of the things I am interested in above? How did you generate the queries that you sent me?

Thanks for any clarifications on this, or if you could direct me to the right documentation - I have looked but somehow not found what I was looking for.

Edouard-Legoupil commented 1 year ago

Hello -

I have asked our HQ Gis colleagues to provide more details.

Our webservices are here: https://data.unhcr.org/en/geoservices/

An example of a call as shared before is

proj = '+proj=robin +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs'
url_bg <- "[https://gis.unhcr.org/arcgis/rest/services/core_v2/wrl_polbnd_int_15m_a_unhcr/FeatureServer/0/query?where=iso3+%3C%3E+%27ATA%27&objectIds=&time=&geometry=&geometryType=esriGeometryEnvelope&inSR=&spatialRel=esriSpatialRelIntersects&distance=&units=esriSRUnit_Foot&relationParam=&outFields=*&returnGeometry=true&maxAllowableOffset=&geometryPrecision=2&outSR=4326&havingClause=&gdbVersion=&historicMoment=&returnDistinctValues=false&returnIdsOnly=false&returnCountOnly=false&returnExtentOnly=false&orderByFields=&groupByFieldsForStatistics=&outStatistics=&returnZ=false&returnM=false&multipatchOption=xyFootprint&resultOffset=&resultRecordCount=&returnTrueCurves=false&returnExceededLimitFeatures=false&quantizationParameters=&returnCentroid=false&sqlFormat=none&resultType=&featureEncoding=esriDefault&datumTransformation=&f=geojson"](https://gis.unhcr.org/arcgis/rest/services/core_v2/wrl_polbnd_int_15m_a_unhcr/FeatureServer/0/query?where=iso3+%3C%3E+%27ATA%27&objectIds=&time=&geometry=&geometryType=esriGeometryEnvelope&inSR=&spatialRel=esriSpatialRelIntersects&distance=&units=esriSRUnit_Foot&relationParam=&outFields=*&returnGeometry=true&maxAllowableOffset=&geometryPrecision=2&outSR=4326&havingClause=&gdbVersion=&historicMoment=&returnDistinctValues=false&returnIdsOnly=false&returnCountOnly=false&returnExtentOnly=false&orderByFields=&groupByFieldsForStatistics=&outStatistics=&returnZ=false&returnM=false&multipatchOption=xyFootprint&resultOffset=&resultRecordCount=&returnTrueCurves=false&returnExceededLimitFeatures=false&quantizationParameters=&returnCentroid=false&sqlFormat=none&resultType=&featureEncoding=esriDefault&datumTransformation=&f=geojson%22)
sf_bg <- geojsonsf::geojson_sf(url_bg) %>% sf::st_transform(., proj) 
#head(sf_bg)
plot(sf_bg$geometry)

 

url_borders <- "[https://gis.unhcr.org/arcgis/rest/services/core_v2/wrl_polbnd_int_15m_l_unhcr/FeatureServer/0/query?where=1%3D1&objectIds=&time=&geometry=&geometryType=esriGeometryEnvelope&inSR=&spatialRel=esriSpatialRelIntersects&distance=&units=esriSRUnit_Foot&relationParam=&outFields=type_unhcr&returnGeometry=true&maxAllowableOffset=&geometryPrecision=2&outSR=4326&havingClause=&gdbVersion=&historicMoment=&returnDistinctValues=false&returnIdsOnly=false&returnCountOnly=false&returnExtentOnly=false&orderByFields=&groupByFieldsForStatistics=&outStatistics=&returnZ=false&returnM=false&multipatchOption=xyFootprint&resultOffset=&resultRecordCount=&returnTrueCurves=false&returnExceededLimitFeatures=false&quantizationParameters=&returnCentroid=false&sqlFormat=none&resultType=&featureEncoding=esriDefault&datumTransformation=&f=geojson"](https://gis.unhcr.org/arcgis/rest/services/core_v2/wrl_polbnd_int_15m_l_unhcr/FeatureServer/0/query?where=1%3D1&objectIds=&time=&geometry=&geometryType=esriGeometryEnvelope&inSR=&spatialRel=esriSpatialRelIntersects&distance=&units=esriSRUnit_Foot&relationParam=&outFields=type_unhcr&returnGeometry=true&maxAllowableOffset=&geometryPrecision=2&outSR=4326&havingClause=&gdbVersion=&historicMoment=&returnDistinctValues=false&returnIdsOnly=false&returnCountOnly=false&returnExtentOnly=false&orderByFields=&groupByFieldsForStatistics=&outStatistics=&returnZ=false&returnM=false&multipatchOption=xyFootprint&resultOffset=&resultRecordCount=&returnTrueCurves=false&returnExceededLimitFeatures=false&quantizationParameters=&returnCentroid=false&sqlFormat=none&resultType=&featureEncoding=esriDefault&datumTransformation=&f=geojson%22)
sf_borders <- geojsonsf::geojson_sf(url_borders) %>% sf::st_transform(., proj) 
head(sf_borders)
plot(sf_borders$geometry)

Would be probably good to first simplify the geometry before bringing them in the app in order to avoid memory issue..

See: https://github.com/martinepunhcr/MVI_Guatemala/blob/main/dev/severity_index.Rmd#L191:L200

pvernier commented 1 year ago

Hi @bluefoxr,

My name is Pierre, I work at UNHCR, in the GIS unit. The best way to get familiar with the ArcGIS REST API is to use the form: https://gis.unhcr.org/arcgis/rest/services/core_v2/wrl_polbnd_adm2_a_unhcr/MapServer/0/query

The important fields are:

Then click on the button Query (GET). You can copy/paste the output in http://geojson.io/ to see if it's a valid GeoJSON. Also, have a look at the URL of the output to see the link between the form and the URL and teh query parameters.

You also check this tutorial: https://www.youtube.com/watch?v=jDHMKbfm4WY

It's not possible to get all the features of our admin2 layer in one go, this would kill our server. The maximum number of features that you can fetch in one go is: 10000. see MaxRecordCount in: https://gis.unhcr.org/arcgis/rest/services/core_v2/wrl_polbnd_adm2_a_unhcr/MapServer

So, to get the next 10000 features, you need to add the query parameter: resultOffset=10000 to your GET request. And so on to get the rest. But this is not ideal if you are using Leaflet.

In Leaflet do you need the layer as a vector format with its attribute data that you can query or a raster basemap would be enough?

bluefoxr commented 1 year ago

Hi @Edouard-Legoupil and @pvernier thanks very much for the helpful answers. I had found the query form but was missing the syntax of what to write in the "where" field, so now after your pointers it makes much more sense.

@pvernier what we are doing is generating choropleth maps using leaflet, through R, at the admin 2 level. So I basically need the borders of the admin 2 regions. The maps we are generating look something like this (here I manually downloaded the shape files for Guatemala):

image

We are developing a tool where users put in their data at admin2 level (for a single country of their choice) and one of the outputs will be a map like this. So in my code I need to be able to get, for any given country, the borders (I guess as a vector format) plus the identifier code for each region so I can match it with user data.

Hopefully that makes sense. I think using the query form as you show is a good approach, and perhaps I can narrow down the fields to just the ones we need to speed things up. Substituting the ISO code in the string is no problem so I can hopefully generate queries in my code based on that.

However if you have any further suggestions please let me know. Thanks again for your help.

bluefoxr commented 1 year ago

p.s. @Edouard-Legoupil I take note of the idea to simplify the geometry, thanks for that.

Edouard-Legoupil commented 1 year ago

Hello - I changed the name of the ticket - following up on this - if we want to get colleagues to test the app - we need to offer the map background for different countries... Why not - at least to start with a pre-built data file in the package with all the admin 2 geometries (simplified and maybe only in the Americas to start with.. !) and admin2 id (so that we can verify a good match when people upload the data file) - and then offer this within an initial dropdown -

bluefoxr commented 1 year ago

Hi @Edouard-Legoupil I can get to work on this next. I think I would try to go directly for the API though, as it doesn't look too complicated now I understand the API a bit better. In either case once I start working on it a bit more closely I think it will be clear what the best approach is.

Edouard-Legoupil commented 1 year ago

@bluefoxr @pvernier -- What about setting up a function that will create some built-in package data - (to be included in the data folder - with one file per country - that would pull the right data from the ArcGIS server - then only keep the relevant variables - aka adminunit2, adminunitname & geometry - perform some simplification to get a light data - and save it within the package...)

Then the interface would start with a "select your country" so that the right map background could be loaded in order to map the data uploaded from the excel file..

Thinking out loud..

martinepunhcr commented 1 year ago

If the files are not too heavy, I believe this is a very reasonable solution. I would start first with countries in the Americas region only. Then we can explore if we need to do the same for the rest of the world (keeping light shapefiles within the package) or if there is a way of pulling the data directly from UNHCR's Geoservices API.

bluefoxr commented 1 year ago

The API has the possibility to directly select which fields to return. The idea of storing the shape files is definitely worth exploring but depends on the size and also whether we get much speed gain from it. Potentially, when a uploads data from a certain country, the shape files could be downloaded from the API and then saved so that next time they are available without using the API. So we only download the shape files that people are actually using.

This raises a couple of questions @Edouard-Legoupil :

bluefoxr commented 1 year ago

After some experiments with the map server I think we will definitely need to store the simplified maps. Depending on the source the maps can be very heavy, plus the server is prone to crash or hang. After simplification the shape files are quite light, so this approach seems viable. We can discuss in the meeting.

bluefoxr commented 1 year ago

Hi @Edouard-Legoupil @martinepunhcr @pvernier just an update on the mapping and some questions. Apologies in advance for the slightly long message!

So far

As you recall the objective here was to download geojson files for Admin2 regions for various countries. Due to the size of the files and our use-case, we decided to simplify the geometry which results in a much smaller file, which we can store on the server with our Shiny app, rather than having to query the API each time a user uploads their data.

To do this I wrote a function which composes the query for a given ISO3 code - see: https://github.com/UNHCR-Guatemala/A2SIT/blob/55a915c1b10d1bf0bc231a2b119a690051464730/R/f_results.R#L342

It then downloads the geojson file, and uses sf::st_simplify() to simplify the geometry. Then saves the file.

This has worked for most countries and we end up with small and manageable geometry files which is great. However for some countries I have run into problems and would need some advice.

Problem 1: can't access some countries

Of the list of countries you passed to me @Edouard-Legoupil there are four that I can't access the maps from. These are "BRA" "CHL" "COL" "MEX". I get an error that the file doesn't seem to exist. Also, when I try using the query form at https://gis.unhcr.org/arcgis/rest/services/core_v2/wrl_polbnd_adm2_a_unhcr/MapServer I get this error:

image

Strangely, the records do seem to exist because it I query to return just the IDs then it works. So, how can I access the geometry for these countries?

Problem 2: can't simplify for some countries

For some countries, e.g. "PAN", "PER", I can download the geometry but can't simplify. Running sf::st_simplify() returns this error:

Error in wk_handle.wk_wkb(wkb, s2_geography_writer(oriented = oriented,  : 
  Loop 0 is not valid: Edge 2 has duplicate vertex with edge 3041

If we don't simplify these files then they are around 2-15 MB, which is not huge but just slows down the rendering.

Problem 3: which ID to use

As mentioned yesterday the format of the codes for each region can be a bit inconsistent. The fields I am currently returning are "pcode", "adm2_source_code", "gis_name" and "geometry" . I notice that "pcode" seems to have the same format in all files. The "adm2_source_code" changes: it can be numerical, like "1207" or numerical with what looks like and alpha-2 code e.g. "PE0101". For some countries like Uruguay the codes change length, like "1", "10" "100", etc.

Also with Uruguay the regions are missing names - here is a sample:

image

As far as I can tell, the pcode column is consistently available throughout, but the fact that names are missing is a concern because user would probably like to know what the names of the regions are.

Questions/requests

So in summary the questions are:

  1. How can we get the geometry for countries where the server returns an error?
  2. How can we simplify geometry which throws an error?
  3. Which ID is the best to use for each region - it should be one that is recognised by users and consistent across all the geojson files pulled from the API.
  4. How to deal with missing names?

@Edouard-Legoupil I wonder if you have anyone available with more expertise in GIS who could help with these issues so that I can make progress in other areas of the app. In the meantime, at least we have around 10 countries with simplified geometry which can be used for testing (see https://github.com/UNHCR-Guatemala/A2SIT/tree/11-as-user-i-would-like-to-select-the-country-for-which-my-severity-index-needs-to-be-build-so-that-i-can-get-the-right-map/inst/geom). FYI I still need to do some more work on this branch to hook up the maps properly. I have also altered the template.

Thanks very much and sorry again for the lengthy message!

bluefoxr commented 1 year ago

@Edouard-Legoupil some additions: please see the latest commits on this branch which should now enable the most of the functionality to select different countries. Probably the highest priority (to get this to a point where we can distribute for testing) is to decide the format of the ID code for each region so I can build that into the code (i.e. problem 3 from above). Thanks.

pvernier commented 1 year ago

Hi @bluefoxr,

To answer your questions:

  1. Can you send me the requests that return an error for me to have a closer look?
  2. We certainly have geometry issues for those cases which might be the cause of your errors. What do you use to simplify the geometry?
  3. Use the column pcode. It's a unique ID for each record. But note that our admin1 and admin2 layers are versioned so they also contain polygons that are not valid anymore. To get only the valid ones, filter on the column gis_status. Value '14' is for valid, '13' for invalid. The first 2 digits of the pcode indicate the version number (e.g 20, 21, etc...).
  4. I was not aware that some records had missing names. In that case we will have to correct that on our side.
bluefoxr commented 1 year ago

Hi @pvernier thanks for your answers.

An example of queries that I run that I can't get a result for are:

The query URL from that is as follows:

https://gis.unhcr.org/arcgis/rest/services/core_v2/wrl_polbnd_adm2_a_unhcr/MapServer/0/query?where=ISO3+%3D+%27BRA%27&text=&objectIds=&time=&geometry=&geometryType=esriGeometryEnvelope&inSR=&spatialRel=esriSpatialRelIntersects&distance=&units=esriSRUnit_Foot&relationParam=&outFields=*&returnGeometry=true&returnTrueCurves=false&maxAllowableOffset=&geometryPrecision=&outSR=&havingClause=&returnIdsOnly=false&returnCountOnly=false&orderByFields=&groupByFieldsForStatistics=&outStatistics=&returnZ=false&returnM=false&gdbVersion=&historicMoment=&returnDistinctValues=false&resultOffset=&resultRecordCount=&returnExtentOnly=false&datumTransformation=&parameterValues=&rangeValues=&quantizationParameters=&featureEncoding=esriDefault&f=geojson

From my testing, this happens for "BRA" "CHL" "COL" "MEX", but not for the other ISO3 codes I tested. E.g. If I take the query URL above and just change BRA to GTM, it works fine. So it seems to be something to do with the country?

For the second point on geometry, I am doing this in R using the "sf" package (the sf::st_simplify() function). Tbh I am not sure how this works as I am not very experienced in GIS, so it is hard to understand what the solution could be (if any).

Thanks for your hint on the use of pcode and the versioning. @Edouard-Legoupil please let me know what the most recognisable format to pass to users is (i.e. the most familiar code format for admin 2 statistics, if there is one).

Edouard-Legoupil commented 1 year ago

@pvernier - sorry I realize that I closed the issue while it is not fixed - can you kindly help

Edouard-Legoupil commented 1 year ago

@pvernier - sorry I realize that I closed the issue while it is not fixed - can you kindly help

So the 3 issue remains-

  1. some countries not available
  2. pcode sometimes not unique
  3. geometry has issues that prevent the simplification

@pvernier - what are the fallback options while you are fixing the issues in the admin 2 layers? -

I guess the alternative would be to simply use the default GADM extract here https://geodata.ucdavis.edu/gadm/gadm4.1/json/ --

Please let us know! Thanks Edouard

pvernier commented 1 year ago

@Edouard-Legoupil , @bluefoxr

  1. Regarding the first issue, with some countries ("BRA","CHL","COL","MEX") that can't be fetched via the API, I managed to get the data except for Brazil (but the requests sometimes timeout). For Brazil, as it is, I don't think there is a way to get the data in one request (there are too many features). A workaround could be to do it with multiple requests and then concatenate all the files into one. But finding the filters for the different requests might not be that easy.

  2. "pcode sometimes not unique". @bluefoxr mentioned that in Uruguay the names were missing but I was not aware of duplicate pcodes. Can you tell me where so that I can have a look.

  3. That will be complicated for us to fix in the short term. An alternative would be for you to check/correct the geometry before simplifying it but that might be complicated to do. Or instead of simplifying the geometry convert the GeoJSON to TopoJSON which greatly reduces the file size.

Edouard-Legoupil commented 1 year ago

@pvernier --- Maybe it would be easier to provide a second geojson web-service for admin 2 polygons that would be simplified directly from the ArgGIS server ? Would that work?

Happy to write a small data quality script to check the countries that have issues - either on geometry or pcode - will revert soon!

Edouard-Legoupil commented 1 year ago

@pvernier here's the function https://edouard-legoupil.github.io/admin2check/reference/f_get_admin2_boundaries.html

Following up from this - would be to good to get the correct API call so that:

I can update the function with the corrected API call - asap

Broader than that we could per default a series of additional indicator to be compiled automatically from public source available at sub-national scale - as already explored here: https://unhcr-americas.github.io/Area_Based_Approach/ -- Maybe something we could collaborate on?

@bluefoxr here are the files https://github.com/Edouard-Legoupil/admin2check/tree/master/inst/geom If we can get the additional default indicator, this could prefill the template automatically allowing for a default severity index exploration for all countries...

pvernier commented 1 year ago

@Edouard-Legoupil

Edouard-Legoupil commented 1 year ago

@pvernier -

I have now pulled all amin level 1 to join the admin1 name -

For the simplification, it does not seems to work - see - the issue remains for Brazil - https://edouard-legoupil.github.io/admin2check/reference/f_get_admin2_boundaries.html

BRA2 <- f_get_admin2_boundaries( ISO3 = "BRA", level = 2, simple = 1, simplify = TRUE, dTolerance = 500)

I can get level 1 but not level 2 - even when applying simplification on the server side..

There are also some countries whee the sf simplify is blocked because of quality issue in the geometry - see - ECU, PER, DOM, PAN, CUB...

I fear there's no other solution than fixing it on the ArcGIS side..

Thanks!