dickoa / rhdx

R package to interact with the Humanitarian Data Exchange portal - http://dickoa.gitlab.io/rhdx/

Download admin area COD (common operational dataset) for any country #5

Open: andysouth opened this issue 4 years ago

andysouth commented 4 years ago

I want to write some code allowing users to download the admin area COD for any country and admin level.

I'm coming up against a few inconsistencies in HDX tags and formats that make this tricky.

I've put all this in one issue for now, in case there is a better way that you can point me to. I can break these up into individual issues if that helps.

What I'm trying to do:

  1. write a query that returns a single dataset for the admin area COD
  2. identify the shapefile resource (assuming that is the most common format)
  3. get list of layers
  4. identify and download layer for a specified admin level
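
Step 4 depends on picking the layer whose name contains `adm<level>`. A plain `grep(paste0("adm", level), ...)` can match more than intended (for example `adm1` also matches inside a name containing `adm10`). Below is a minimal sketch of a stricter match, assuming layer names carry the level as `adm` followed by a single digit. The helper `pick_admin_layer` and the example layer names are hypothetical:

```r
# Hypothetical helper: return the layer name(s) for one admin level.
# Assumes names contain "adm" + level digit followed by a non-digit or the
# end of the string (e.g. "..._adm2_..."); matching ignores case.
pick_admin_layer <- function(layer_names, level) {
  pattern <- paste0("adm", level, "([^0-9]|$)")
  layer_names[grepl(pattern, layer_names, ignore.case = TRUE)]
}

# Made-up layer names, for illustration only
layers <- c("xyz_admbnda_adm0_src", "xyz_admbnda_adm1_src",
            "xyz_admbnda_adm2_src", "xyz_admbndl_admALL_src")
pick_admin_layer(layers, 2)
```

If no layer matches, the helper returns `character(0)`. Checking for that before calling a reader gives a clearer failure than an error from sf.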

Current issues (see the code below for examples):

  1. I'm struggling to get a query that reliably returns just the one dataset
  2. shapefiles are sometimes tagged as 'zipped shapefile' and sometimes as 'zipped shapefiles'
  3. sometimes the zipfile contains a subfolder that prevents sf from opening it
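
For issue 2, one option on the client side is to normalise the format string before comparing it, so that singular/plural and upper/lower case variants all match. This is a minimal sketch. The helper name `is_zipped_shapefile` is hypothetical, and it assumes the formats come back as a character vector, as `rhdx::get_formats(ds)` is used in the code in this issue:

```r
# Hypothetical helper: TRUE where a resource format string denotes a
# zipped shapefile, tolerating "zipped shapefile" vs "zipped shapefiles"
# and differences in case or surrounding whitespace.
is_zipped_shapefile <- function(formats) {
  grepl("^zipped shapefiles?$", trimws(tolower(formats)))
}

formats <- c("PDF", "XLSX", "ZIPPED SHAPEFILES", "zipped shapefile", "zipped kml")
which(is_zipped_shapefile(formats))
```

With rhdx this would look like `which(is_zipped_shapefile(rhdx::get_formats(ds)))`.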

Thanks.

    iso3clow <- 'nga'
    #iso3clow <- 'mli'
    level <- 2

    #nigeria does return single result
    #mali returns two datasets first one is population
    querytext <- paste0('vocab_Topics:("common operational dataset - cod" AND "gazetteer" NOT "baseline population") AND groups:', iso3clow)

    rhdx::set_rhdx_config()
    datasets_list <- rhdx::search_datasets(fq = querytext)

    #query needs to return a single dataset (with multiple resources)
    ds <- datasets_list[[1]]

    #get list of resources
    list_of_rs <- rhdx::get_resources(ds)
    list_of_rs

    #selecting resource
    #nigeria "zipped shapefiles"
    #mali "zipped shapefile"
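    #tolower() guards against upper-case labels such as "ZIPPED SHAPEFILES"
    rs_id <- which(tolower(rhdx::get_formats(ds)) %in% c("zipped shapefiles", "zipped shapefile"))

    #take the first matching resource in case there are several
    rs <- rhdx::get_resource(ds, rs_id[1])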

    # find which layers in file
    mlayers <- rhdx::get_resource_layers(rs, download_folder=getwd())

    #error for nigeria
    #<HDX Resource> aa69f07b-ed8e-456a-9233-b20674730be6
    #Name: nga_adm_osgof_20190417_SHP.zip
    #Format: ZIPPED SHAPEFILES
    #Error: This (spatial) data format is not yet supported
    #in hdx resources.r
    # supported_geo_format <- c("geojson", "zipped shapefile", "zipped geodatabase",
    #                           "zipped geopackage", "kmz", "zipped kml")
    #added "zipped shapefiles" option to supported_geo_format in my local branch of rhdx
    #now I get
    #Cannot open data source /vsizip/C:/rsprojects/afriadmin/nga_adm_osgof_20190417_shp.zip
    #Error in CPL_get_layers(dsn, options, do_count) : Open failed.
    #can I open a layer from the downloaded file directly ?
    #using default should open the first layer
    sflayer <- rhdx::read_resource(rs, download_folder=getwd())
    plot(sf::st_geometry(sflayer))
    #no this also fails
    #seemingly because there is a subfolder within the zip
    #aha, nigeria is in a folder within the zip and mali isn't so nigeria fails and mali works
    #is there a way of detecting and dealing with this ?

    # later read layer using layername
    # this relies on all country layers having adm* in their names
    layername <- mlayers$name[ grep(paste0("adm",level),mlayers$name) ]

    sflayer <- rhdx::read_resource(rs, layer=layername, download_folder=getwd())

    #test plotting
    plot(sf::st_geometry(sflayer)) 
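
On the question of detecting and handling a subfolder inside the zip: one approach is to list the archive contents, find the folder holding the `.shp` files, and point GDAL's `/vsizip/` virtual file system at that folder rather than at the zip root. This is a sketch under assumptions. The helper `vsizip_dsn` is hypothetical, and it assumes GDAL's shapefile driver will open a folder path inside `/vsizip/`, which is worth verifying against your GDAL version:

```r
# Hypothetical helper: build a GDAL /vsizip/ data source name pointing at the
# folder inside the zip that holds the .shp files, so that a subfolder layout
# (the Nigeria case) and a root layout (the Mali case) can both be opened.
# `zip_entries` is the vector of paths inside the archive, e.g.
# utils::unzip(zipfile, list = TRUE)$Name.
vsizip_dsn <- function(zipfile, zip_entries) {
  shp <- zip_entries[grepl("\\.shp$", zip_entries, ignore.case = TRUE)]
  if (length(shp) == 0) stop("no .shp file found in ", zipfile)
  dirs <- unique(dirname(shp))
  if (length(dirs) > 1) {
    warning("shapefiles found in several folders; using ", dirs[1])
  }
  dsn <- paste0("/vsizip/", zipfile)
  # dirname() returns "." for files at the zip root
  if (dirs[1] != ".") dsn <- paste0(dsn, "/", dirs[1])
  dsn
}

# Made-up entry names mimicking a zip with a subfolder
vsizip_dsn("nga.zip", c("nga_shp/", "nga_shp/a.shp", "nga_shp/a.dbf"))

# With a real download (not run here):
# entries <- utils::unzip(zipfile, list = TRUE)$Name
# sf::st_layers(vsizip_dsn(zipfile, entries))
```
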
dickoa commented 4 years ago

Hi Andy,

I also think that having a way to quickly get the COD-AB (administrative boundaries COD) of each country would be awesome. I'm no longer working with OCHA, but I can still ask them to fix some issues, such as wrong file format names and zip files with sub-folders, and to tell us how to get a list of all CODs. I think they maintain a list of all COD-ABs.

I will come back to you ASAP. Thanks again.

andysouth commented 4 years ago

Thank you, Ahmadou. I'm also happy to correspond with OCHA people if that helps.

dickoa commented 4 years ago

I will also explore other options using CKAN facets in parallel to solve this issue. You can contact the OCHA FIS team at ocha-fis-data at un dot org (you'll probably get Tom Haythornthwaite, who's in charge of vetting all COD-ABs). Let me know how it goes. Thanks
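
HDX runs on CKAN, so one way to explore facets is to call CKAN's `package_search` action with `facet.field` and `rows = 0`. That should return only facet counts, for example how many datasets per group match a filter. This is a minimal sketch that only builds the request URL. The endpoint `https://data.humdata.org/api/3/action/package_search` and the use of `groups` and `vocab_Topics` as facet fields are assumptions, and `build_facet_url` is a hypothetical name:

```r
# Hypothetical helper: build a CKAN package_search URL that asks for facet
# counts only (rows = 0). Parameter names follow the CKAN action API; the
# HDX endpoint and the facet fields are assumptions.
build_facet_url <- function(fq, facet_fields = c("groups", "vocab_Topics"),
                            rows = 0) {
  base <- "https://data.humdata.org/api/3/action/package_search"
  # CKAN expects facet.field as a JSON list of field names
  ff <- paste0('["', paste(facet_fields, collapse = '","'), '"]')
  params <- c(fq = fq, facet = "true", facet.field = ff,
              rows = as.character(rows))
  encoded <- vapply(params, utils::URLencode, character(1), reserved = TRUE)
  paste0(base, "?", paste(names(params), encoded, sep = "=", collapse = "&"))
}

url <- build_facet_url('vocab_Topics:"common operational dataset - cod"')
# The JSON response could then be read with e.g. jsonlite::fromJSON(url)
```
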

hayttom commented 4 years ago

Andy, I think I now understand your approach a bit better than I did when I emailed you about the live services. I see now that you want to read from HDX systematically, and that needn't involve the live services. Sorry.

Regarding the situation you discovered, where the geodatabases are not arranged uniformly: that is down to me, as I prepare them, and I suppose I should standardize them. Which arrangement would you prefer? Could your systems report which ones do not conform?

SimonbJohnson commented 4 years ago

@AndySouth - does the source have to be HDX? There is a smaller subset (about 60 countries) on the COD ITOS service.

My example Python code to download them as GeoJSON: https://github.com/simonbjohnson/cod_topo

I also convert them to TopoJSON and standardise the attribute names.
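
To illustrate what standardising attribute names can look like on the R side, here is a minimal sketch. It is not the scheme used in the linked Python scripts. It assumes COD-style column names such as `ADM1_EN` and `ADM1_PCODE`, and the target names (`adm1_name`, `adm1_pcode`) and the helper `standardise_names` are hypothetical choices:

```r
# Hypothetical helper: lower-case all column names and rename the English
# name columns "admN_en" to "admN_name". Columns such as "ADM1_PCODE"
# become "adm1_pcode" through tolower() alone.
standardise_names <- function(nms) {
  out <- tolower(nms)
  sub("^(adm[0-9])_en$", "\\1_name", out)
}

standardise_names(c("ADM0_EN", "ADM1_EN", "ADM1_PCODE", "Shape_Area"))
```

Applied to a data frame of attributes this would be `names(x) <- standardise_names(names(x))`.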

andysouth commented 4 years ago

Thanks @hayttom! To consume from R at step 3 above, it is easier if the shapefiles are not in a subfolder in the zip (i.e. following the current Mali layout rather than Nigeria's). For step 1, it would be good to have a tag (or tags) that occurs only once for each country to indicate the COD admin boundaries (it is close currently, but some countries, e.g. Mali, return more than one record). It may be that the query above could be improved to avoid that. Yes, I can run some code to report on which countries conform.
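
A conformance report of the kind offered here could check, for each country's zip, whether the `.shp` files sit at the root of the archive (the Mali layout) or in a subfolder (the Nigeria layout). This is a minimal sketch. `report_zip_layout` is a hypothetical name, and it assumes the zip listings have already been collected, for example with `utils::unzip(zipfile, list = TRUE)$Name`:

```r
# Hypothetical helper: for each country, TRUE if the zip contains .shp files
# and all of them are at the archive root; FALSE if any sit in a subfolder
# or if no .shp file is present at all.
report_zip_layout <- function(listings) {
  at_root <- vapply(listings, function(entries) {
    shp <- entries[grepl("\\.shp$", entries, ignore.case = TRUE)]
    length(shp) > 0 && all(dirname(shp) == ".")
  }, logical(1))
  data.frame(iso3 = names(listings), shp_at_root = unname(at_root))
}

# Made-up listings for illustration only
listings <- list(
  mli = c("mli_admbnda_adm1.shp", "mli_admbnda_adm1.dbf"),
  nga = c("nga_shp/", "nga_shp/nga_admbnda_adm1.shp")
)
report_zip_layout(listings)
```
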

andysouth commented 4 years ago

Thanks @SimonbJohnson. The source doesn't have to be HDX, but we are aiming to cover as many countries as possible, which is why I started with HDX. I'll have a look at your live service code.

hayttom commented 4 years ago

Hi Andy,

I guess we have inconsistencies in the HDX zip file arrangement for shapefiles and also for geodatabases - gulp. Fortunately this is something I can work on without disrupting the dataset URLs or even the HDX resources - just by fixing the content. I'm adding it to the checklist I'm following for my own internal audit.

I'm not sure I follow your hope for a unique COD admin boundary instance per country: the COD tag is not just for admin boundaries, so Mali legitimately has eight COD datasets. Are we on the same page? Notwithstanding this, we do encounter some cases of a country having more than one admin boundaries COD, but it's not supposed to happen. The problem happens when some of my colleagues in country or regional offices get too enthusiastic.

andysouth commented 4 years ago

Hi Tom, Good to hear the structure is fixable without too much disruption.

For the tags, I'd like to be able to query for just the admin boundaries.

So far, with my query below:

    #nigeria does return single result
    #mali returns two datasets first one is population
    querytext <- paste0('vocab_Topics:("common operational dataset - cod" AND "gazetteer" NOT "baseline population") AND groups:', iso3clow)

Is there anything I can add to the Mali query to exclude the first record below? Alternatively, if that record had a 'baseline population' tag, the existing query would exclude it.

    [[1]]
    ce21c7db-d8f0-40f8-adc2-452d2d2d105c
    Title: Mali administrative level 0-3 population statistics
    Name: population-projection-2018-of-mali-admin-levels-3-disaggregated-by-sex
    Date: 03/07/2018
    Tags (up to 5): common operational dataset - cod, gazetteer
    Locations (up to 5): mli
    Resources (up to 5): Mali_Population_communes_sexe_2018.xls, mli_pop_adm0.csv, mli_pop_adm1.csv, mli_pop_adm2.csv, mli_pop_adm3.csv

    [[2]]
    d2ec62bb-5a93-436d-8297-88b3ee9b6818
    Title: Mali administrative level 0-3 boundaries
    Name: administrative-boundaries-cod-mli
    Date: 06/01/2015
    Tags (up to 5): common operational dataset - cod, gazetteer, geodata
    Locations (up to 5): mli
    Resources (up to 5): MLI COD-AB 2019_08_07.pdf, MLI_AdminBoundaries_TabularData.xlsx, mli_adm_1m_dnct_2019_SHP.zip, mli_adm_1m_dnct_2019_EMF.zip, mli_adm_1m_dnct_2019_KMZ.zip
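
Until the tagging is consistent, a client-side workaround is to filter the returned datasets by title after the search. This is a heuristic sketch only. It assumes you can extract a title for each element of `datasets_list` (how to do that depends on the rhdx version), and `pick_boundary_titles` is a hypothetical name. In the output above, only the boundaries dataset also carries the `geodata` tag, which might be another discriminator worth testing:

```r
# Hypothetical helper: indices of titles that look like admin boundaries
# rather than population statistics. A title-based heuristic, not an HDX
# tag, so it may need adjusting for other countries.
pick_boundary_titles <- function(titles) {
  is_bnd <- grepl("boundar", titles, ignore.case = TRUE)
  is_pop <- grepl("population", titles, ignore.case = TRUE)
  which(is_bnd & !is_pop)
}

titles <- c("Mali administrative level 0-3 population statistics",
            "Mali administrative level 0-3 boundaries")
pick_boundary_titles(titles)   # 2
```
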
hayttom commented 4 years ago

Hi Andy,

Regarding "To consume from R at step 3 above it is easier if the shapefiles are not in a subfolder in the zip (i.e. following current Mali rather than Nigeria)": I have rearranged Nigeria, updated our SOP, and will work through the rest of our CODs, starting with Africa.

Sincerely, Tom

dickoa commented 4 years ago

Thanks a lot @hayttom for this.

andysouth commented 4 years ago

Many thanks @hayttom,

I just tried downloading the Nigeria shapefile via rhdx and directly from the HDX website, and it doesn't seem to be working yet. This was the error message when I clicked the download button on the HDX website:

[Screenshot: error message from the HDX download button]

SimonbJohnson commented 4 years ago

@AndySouth - I do recommend seeing whether the ITOS service matches your requirements, as it will save you a lot of time.

Full list of supported files: https://github.com/SimonbJohnson/cod_topo/blob/master/itos_service.csv

Example script to download all countries (Python): https://github.com/SimonbJohnson/cod_topo/blob/master/download.py

Script I used to standardise attributes (it also converts to the format I need): https://github.com/SimonbJohnson/cod_topo/blob/master/convert.py

Resulting GeoJSON library: https://github.com/SimonbJohnson/cod_topo/tree/master/geoms/geojson

andysouth commented 4 years ago

Many thanks @SimonbJohnson, that does sound useful. I'll look into it over the next couple of days.

dickoa commented 4 years ago

I think @SimonbJohnson's code can be packaged into a nice R data package to serve COD-AB. I can also have a quick look this weekend and start something. Thanks a lot @SimonbJohnson for this. @AndySouth, I pushed a minor change to support "zipped shapefiles" directly; you can now read the Nigeria zipped shapefiles with it.

hayttom commented 4 years ago

@All, now that Ahmadou has made that fix, and given my other tasks, I won't continue making the COD shapefile zipfile arrangements uniform, except in new cases or when other ad hoc adjustments are necessary. It's been a good learning lesson, but our support must focus on the live services. Unfortunately, all our 50 strategic countries (except Jordan) are now fulfilled, so we do not expect to be expanding COD coverage in Africa.

andysouth commented 4 years ago

Thank you @hayttom, that's understandable.

@dickoa, want to collaborate on creating the new package? I'll have a look at it over the weekend too. :-)

dickoa commented 4 years ago

@AndySouth I would love to collaborate on this. Thanks

andysouth commented 4 years ago

I started experimenting using the standardised Geojson boundaries created by @SimonbJohnson.

R code is temporarily here https://github.com/afrimapr/afriadmin/blob/master/R/hdxlive.r

Here is an atlas comparing the hdxlive boundaries for Africa to GADM (the former are likely to be more recent): https://rpubs.com/southmapr/579418