OCHA-DAP / hdx-signals

HDX Signals
https://un-ocha-centre-for-humanitarian.gitbook.io/hdx-signals/
GNU General Public License v3.0

intentionally downloading and merging lowest level admin to create admin 0? #114

Closed zackarno closed 1 month ago

zackarno commented 3 months ago

The download_adm0_sf() function does not actually download admin 0 when layer = NULL (the default). Instead, it downloads all the admin files available and just selects the first one. For example, if you run

download_adm0_sf(iso = "BGD", file = "adm0")

it actually returns the admin 4 level shapefile, with more than 5,000 rows:

https://github.com/OCHA-DAP/hdx-signals/blob/ab74f4a9181b71121fd01f3a5305f7dc9eb0188e/src-static/update_iso3_sf.R#L151-L187

Later, in the implementation of update_adm_sf(), layer is not specified either, so it downloads all the admin levels:

https://github.com/OCHA-DAP/hdx-signals/blob/ab74f4a9181b71121fd01f3a5305f7dc9eb0188e/src-static/update_iso3_sf.R#L41-L43

The update_adm_sf() function does sort of convert them back to adm0 with this step:

https://github.com/OCHA-DAP/hdx-signals/blob/ab74f4a9181b71121fd01f3a5305f7dc9eb0188e/src-static/update_iso3_sf.R#L45-L48

but it seems like it would be cleaner to just download the admin 0 file, as the download_adm0_sf() function name indicates. Is it possible that this unioning step is also producing some of the weirder-looking shapes?

Is the issue that the admin layer names are not standard enough? They look pretty standard from testing a few with fieldmaps, so you could string match the layer name on "adm\d$" or supply it dynamically with glue("{tolower(iso3)}_adm0"). If they are not standard enough, you could get the adm0 by grabbing the layer with the fewest features, like this:

sf$st_layers(fn)$name[sf$st_layers(fn)$features == min(sf$st_layers(fn)$features)]
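A minimal sketch of that fallback, written in base R so it runs without a downloaded file. It assumes the layer table exposes the name and features fields that sf's st_layers() reports; the layers data frame holds made-up counts, and pick_adm0_layer() is a hypothetical helper, not something in the repo:

```r
# Hypothetical stand-in for the name/features info from sf::st_layers(fn).
# The feature counts are invented for illustration only.
layers <- data.frame(
  name = c("bgd_adm4", "bgd_adm3", "bgd_adm2", "bgd_adm1", "bgd_adm0"),
  features = c(5160, 544, 64, 8, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the name of the layer with the fewest features,
# which should be the country-level (adm0) layer.
pick_adm0_layer <- function(layer_names, feature_counts) {
  stopifnot(
    length(layer_names) == length(feature_counts),
    length(layer_names) > 0
  )
  layer_names[which.min(feature_counts)]
}

adm0_lyr_name <- pick_adm0_layer(layers$name, layers$features)
```

One design difference from the one-liner above: which.min() always returns a single name (the first one on a tie), whereas comparing against min() returns every tied layer.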

caldwellst commented 3 months ago

Nice. I like this a lot, please implement, and yes, best to grab that adm0 layer!

zackarno commented 3 months ago

Do you happen to know if all FieldMap layer names are standardized to glue("{tolower(iso3)}_adm\\d$")? If that's the case, it's easy to supply as layer to st_read() inside download_shapefile(). If they are not all standardized like this, we can't really use the download_shapefile() function here with the approach below, which chooses the layer with the fewest features (adm0), as the code would make the function less generic and only good for admin 0:

adm0_lyr_name <- sf$st_layers(fn)$name[sf$st_layers(fn)$features == min(sf$st_layers(fn)$features)]

It's kind of annoying to check because you need to download every gpkg.zip first, unzip it, check the layers, etc., so I was wondering if you happened to know from previous work.

update :

caldwellst commented 3 months ago

Sorry, I don't actually know if they are standardized, but glad to hear that they seemingly are in your testing. Let's see how we get on and how many time out. Could be something caught in the tryCatch, so we can see what's happening. Will check it out when I review the final code, either here or in a PR!
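As a rough illustration of surfacing what the tryCatch swallows, here is a base R sketch that records the error message per country instead of dropping it. Both fake_download() and try_download() are hypothetical names; the stub simulates a timeout and does not reflect how the repo's download code is actually written:

```r
# Hypothetical downloader stub: stands in for the real download step and
# simulates a timeout for one code so the error path can be exercised.
fake_download <- function(iso3) {
  if (iso3 == "XXX") stop("simulated timeout")
  paste0(tolower(iso3), "_adm0")
}

# Run one download and capture the error message instead of aborting.
try_download <- function(iso3, download_fn) {
  tryCatch(
    list(iso3 = iso3, ok = TRUE, result = download_fn(iso3),
         error = NA_character_),
    error = function(e) {
      list(iso3 = iso3, ok = FALSE, result = NA_character_,
           error = conditionMessage(e))
    }
  )
}

results <- lapply(c("BGD", "XXX"), try_download, download_fn = fake_download)

# ISO3 codes whose download failed, for later inspection.
failed <- vapply(
  Filter(function(r) !r$ok, results),
  function(r) r$iso3,
  character(1)
)
```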

zackarno commented 3 months ago

Yup, they are standard, which makes it easy to simply request them through the layer argument with glue("{tolower(iso3)}_adm0").

I think the simplest thing will be to combine this functionality, pulling adm0 layers only (rather than all subnational ones), with the data source attribution PR #115. That way we will end up with all our admin 0 boundaries attributed to a data source. Once this is done, we can compare the field map attributed files to the HTML table on the field map website to see which, if any, timed out.
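For reference, the naming convention can be sketched in base R without glue. adm0_layer_name() and layers_are_standard() are hypothetical helpers, and the sample layer names are invented; only the "{tolower(iso3)}_adm0" pattern comes from the discussion above:

```r
# Hypothetical helper: build the expected adm0 layer name for an ISO3 code,
# mirroring glue("{tolower(iso3)}_adm0") with base R only.
adm0_layer_name <- function(iso3) {
  paste0(tolower(iso3), "_adm0")
}

# Hypothetical helper: check that every layer name follows the
# "<iso3>_adm<digit>" convention.
layers_are_standard <- function(layer_names, iso3) {
  pattern <- paste0("^", tolower(iso3), "_adm[0-9]$")
  all(grepl(pattern, layer_names))
}

adm0_layer_name("BGD")
layers_are_standard(c("bgd_adm0", "bgd_adm1"), "BGD")
layers_are_standard(c("bgd_adm0", "Bangladesh_level1"), "BGD")
```

The resulting string is what would be passed as the layer argument to st_read(), assuming the file really contains a layer with that name.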
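The comparison itself could be as simple as a set difference on ISO3 codes. This is only a sketch: both vectors are invented placeholders, and how the codes would be extracted from the website table or the attributed files is left out:

```r
# Hypothetical inputs: ISO3 codes listed in the website table versus codes
# that ended up with an attributed adm0 file. Values are invented.
listed_iso3 <- c("AFG", "BGD", "COD", "ETH")
attributed_iso3 <- c("AFG", "BGD", "ETH")

# Codes listed as available but missing locally are the timeout candidates.
missing_iso3 <- setdiff(listed_iso3, attributed_iso3)
```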

caldwellst commented 2 months ago

@zackarno Can we close this?

zackarno commented 2 months ago

Yes, I believe so. We could run the actual file update command first to make sure all is good.