Open geohacker opened 2 years ago
The issue with FEWS is that it doesn't contain local names. E.g. in the case of Ukraine, no Cyrillic names are available, only the English transliterations. The other problem is coverage: it has good coverage in Africa, but not on other continents. A list of admin2 layers per country is shared here
At the GO Sprint in Kathmandu we decided to go ahead with the OCHA CODs for admin2 that are published on geoboundaries.org. Relying on CODs will allow us to import progressively, without making drastic changes to the data all at once. We decided to start with an inspection of the data and how it lines up with the existing admin0 and admin1 data in GO. We also decided to consider importing countries in the Caribbean to start with.
I started looking at OCHA COD admin2 data for importing to GO. Here are some findings for Haiti and Kosovo:
The admin boundaries almost never line up well. This means we'll have to import admin1 and admin2 from the same data source.
Some small areas are missing from the OCHA admin1 polygons compared to what’s already in GO from ICRC
For Kosovo, looking at admin0, there are some shifts in the boundary
Looks like Kosovo admin2 is actually what we use as admin1 in the GO database. But then this is not part of the OCHA COD.
The issues illustrated above are not particularly surprising, but they are something we needed to look at with good examples. I think we should work towards getting reliable admin2 boundaries into the database, without removing the admin1 and admin0 data that came from ICRC. Some thoughts:
For the GO API and Risk Module use cases, I think we can do the following:
cc @batpad @tovari @LukeCaley @justinginnetti
Thanks for the productive discussion today @tovari @LukeCaley @justinginnetti @batpad. We are in agreement to move forward with the above approach: we won't replace all admin1s, but only in cases where it's absolutely necessary due to reasons like:
In terms of next steps:
Over the next couple days, I'll update this ticket with progress.
I'm continuing this work in #1557 PR.
This lines up pretty well with admin1 data that's already in GO. So we don't need to replace that
Now to get the admin1 ID from GO into the Haiti admin2 shapefile, this is my workflow:
Step 1: Open the Haiti admin2 shapefile from geoboundaries (COD via OCHA) in QGIS
Step 2: Open the admin1 layer from GO by connecting the GO database locally with QGIS (could also be done with a remote staging database)
Step 3: Create a centroid layer of the polygon layer using Vector > Geometry Tools > Centroids
Step 4: Use the attribute join functionality to join the admin1 ID to the centroid layer. This adds district_id as a new attribute.
There are CODs available for Colombia. This is the workflow I used. The goal is to have an admin2 shapefile for Colombia that has the following attributes: shapeName, pcode, admin1_id (which needs to be derived like above from the GO admin1 data).
Looks all good in terms of territories but some minor issues likely due to different geometry simplifications. So we don't need to change the admin1 data.
Create centroids
Centroids won't work well for this matching due to geometries like the one below
For this admin2 polygon, the centroid is actually outside the geometry. One could use geometric center instead of centroid but it might be better to prepare random points inside the geometry for the matching.
Create random points inside polygon
Set number of points as 1 in the dialog and create a new temporary layer.
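To illustrate why a random interior point is safer than a centroid, here is a minimal pure-Python sketch. It is not what QGIS does internally; the C-shaped test polygon and the helper names (`polygon_centroid`, `point_in_polygon`, `random_point_inside`) are made up for this example, and only a single outer ring without holes is handled.

```python
import random

def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...]."""
    area2 = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3 * area2), cy / (3 * area2)

def point_in_polygon(pt, ring):
    """Ray-casting test; points exactly on the boundary get no special care."""
    x, y = pt
    inside = False
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside

def random_point_inside(ring, rng, max_tries=10_000):
    """Rejection sampling: draw from the bounding box until a point lands inside."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    for _ in range(max_tries):
        pt = (rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys)))
        if point_in_polygon(pt, ring):
            return pt
    raise RuntimeError("no interior point found")

# Hypothetical C-shaped polygon: the notch on the right contains the centroid.
c_shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 3), (4, 3), (4, 4), (0, 4)]
centroid = polygon_centroid(c_shape)
inner = random_point_inside(c_shape, random.Random(0))
print(point_in_polygon(centroid, c_shape), point_in_polygon(inner, c_shape))
```

The centroid lands in the notch, so a point-based join would miss the polygon entirely, while the sampled point is inside by construction.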
Join the random points layer with admin1 layer to add district_id
Follow the steps outlined previously by using the Join Attributes by Location option. In the new joined random points layer, inspect the attribute table.
Check if there are any NULL values by clicking the district_id column to sort it. In this case we can see there are two NULLs, meaning that for two admin2s we couldn't find an admin1 match. To inspect why that is, select the row and then click on 'Zoom map to selected rows'
Now we can see that the point wasn't able to get a match because it's sitting outside the admin1 boundary due to the minor geometry issue. In this case, it's easier to look up the admin1 geom and then edit the id column manually.
The ID is 642. To update, follow the steps below.
Now join this random points layer with the admin2 polygon layer using the Join Attributes by Location tool. In the end, it's important to make sure all the joined layers have the same feature count.
Finally, rename district_id to admin1_id and save as shapefile.
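The join, the NULL check and the feature count check above can be sketched outside QGIS as follows. This is only an illustration of the logic, not the QGIS tool or the GO import script: the geometries, pcodes and district IDs are invented, and `join_admin1_id` and `unmatched` are hypothetical helpers.

```python
def point_in_polygon(pt, ring):
    """Ray-casting point-in-polygon test for a single outer ring."""
    x, y = pt
    inside = False
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        if (y0 > y) != (y1 > y):
            if x < x0 + (y - y0) * (x1 - x0) / (y1 - y0):
                inside = not inside
    return inside

def join_admin1_id(points, admin1_polygons):
    """Map each admin2 pcode to the district_id of the admin1 containing its point.

    points: {admin2_pcode: (x, y)}, admin1_polygons: {district_id: ring}.
    Unmatched points get None, which corresponds to NULL in the attribute table.
    """
    return {
        pcode: next(
            (did for did, ring in admin1_polygons.items()
             if point_in_polygon(pt, ring)),
            None,
        )
        for pcode, pt in points.items()
    }

def unmatched(joined):
    """Pcodes that need manual inspection."""
    return sorted(p for p, did in joined.items() if did is None)

# Invented data: two square admin1s and three admin2 points.
admin1 = {
    10: [(0, 0), (2, 0), (2, 2), (0, 2)],
    20: [(2, 0), (4, 0), (4, 2), (2, 2)],
}
points = {"A01": (1, 1), "A02": (3, 1), "A03": (4.1, 1)}  # A03 is just outside

joined = join_admin1_id(points, admin1)
print(unmatched(joined))      # pcodes to inspect by hand
joined["A03"] = 20            # manual fix after looking at the map
assert len(joined) == len(points)  # feature counts must still agree
```

Keeping the unmatched list explicit makes the manual fix auditable instead of relying on sorting a column by eye.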
I thought I'd look at the Ukraine admin2 data, which is getting a lot of movement on the HDX page https://data.humdata.org/dataset/cod-ab-ukr. The data I'm looking at was updated on October 11, 2022: ukr_adm_sspe_20221005.zip (SHP)
All good. Some minor polygon simplification issues but we can stick to our existing admin1 data.
admin2 also looks good.
The column names are different so we have to make sure to rename.
I followed the same steps as above
Checking an admin2 in the GO Admin
Same workflow as above for Venezuela
@geohacker, would you mind listing the mandatory fields, with types, of the admin2 geo files? Should it be a shp, geojson, or something else?
@tovari sure! Currently we support only shapefiles with these mandatory fields: name (the name of the admin2, or shapeName as in CODs), code (the pcode, or pcode as in CODs), and admin1_id (the admin1 ID from the GO database).
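As a rough illustration of the field requirements, the sketch below renames the COD column names to the expected ones and checks that the mandatory fields are present. It is not the GO import script; `normalize_record`, the alias table and the sample values are assumptions made for this example.

```python
# Mandatory fields named in this thread.
REQUIRED_FIELDS = ("name", "code", "admin1_id")

# COD column names mapped to the names expected on import.
COD_ALIASES = {"shapeName": "name", "pcode": "code"}

def normalize_record(props):
    """Rename COD-style columns and verify the mandatory fields are filled in."""
    record = {COD_ALIASES.get(key, key): value for key, value in props.items()}
    missing = [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]
    if missing:
        raise ValueError(f"missing mandatory fields: {', '.join(missing)}")
    return record

# Sample attribute row with made-up values.
row = {"shapeName": "Example admin2", "pcode": "XX0101", "admin1_id": 7}
print(normalize_record(row))
```

Failing loudly on a missing admin1_id is a deliberate choice here: an admin2 without its parent link would be hard to spot later.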
Thanks @geohacker! What optional fields can be added? I'm thinking about e.g. local_name and LN_lang_code, alternate_name and AN_lang_code.
I'm not sure if it makes sense to add an option for a local admin ID and for population data.
At the moment, we don't have any other fields (https://github.com/IFRCGo/go-api/blob/develop/api/models.py#L265-L272), but we can certainly add some to account for names in other languages. We should be consistent, though, with how we are doing languages for admin1 and regions, with columns called name, name_es, name_ar, name_fr, name_en.
I think we should not store population data in the admin2 table, because it probably needs to be updated more regularly. Ideally that data should live in a different table with a pcode mapping, so we don't have to worry about updating the geometries when we need to update population data. And only if there's an immediate use case.
Ok, agree on not including the population data.
I think names in the local language are important at lower admin levels, as mostly there won't be en, es, fr, ar versions of the names. There might be transliterations to Latin from other alphabets, but I think we should still preserve the local names written in the local alphabet. Alternate name and alphabet might be relevant as well in multi-language countries. Thus we will have an option to store 2 versions of the names in 2 languages.
name is the name transliterated to Latin in this case, I assume.
@tovari ok makes sense! I've just added local_name, local_name_code, alternate_name and alternate_name_code as optional fields. The import script will also look for the presence of these columns in the shapefile and import accordingly.
Just to note that the OCHA COD shapefiles we are importing do not have local name fields, so currently all of them only have the default name field.
The PR #1557 is now ready for review. So far, we have prepared and imported (locally):
Once the PR is merged, we can import these on staging to test. cc @batpad
A workflow to import admin2 is now merged to develop. This also includes methods to create, update and publish mapbox tilesets. At the moment, there's a sample mapbox map style with some admin2.
The process is documented in the README
I did the admin1-2 matching a bit differently to make sure we link admin2s to the correct admin1 even when there are significant deviations between OCHA and ICRC admin1s:
Check:
Create random points inside admin2 polygons
Spatially join those points with admin1 to add a second district_id to the random points
Check whether that second district_id and the district_id from the transition process match. If they don't, the point inside the admin2 falls outside the admin1 that should cover the admin2.
List the admin2s with significant discrepancies and inspect the polygon borders
Export admin2 to shapefile
The check method may not find all discrepancies, but it finds them with a good chance when a good part of the admin2 is out of ICRC admin1.
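The comparison step of this check can be sketched in a few lines. This is only an illustration of the idea, not the actual procedure used here; `find_discrepancies` and the sample IDs are hypothetical, and the two inputs are assumed to be already keyed by admin2 pcode.

```python
def find_discrepancies(transition_ids, spatial_ids):
    """Return (pcode, transition_id, spatial_id) where the two admin1 IDs disagree.

    transition_ids: {admin2_pcode: district_id} from the transition process.
    spatial_ids:    {admin2_pcode: district_id or None} from the point join.
    """
    return sorted(
        (pcode, did, spatial_ids.get(pcode))
        for pcode, did in transition_ids.items()
        if spatial_ids.get(pcode) != did
    )

# Made-up IDs: B02 joins to a different admin1, B03 found no admin1 at all.
transition = {"B01": 1, "B02": 1, "B03": 2}
spatial = {"B01": 1, "B02": 2, "B03": None}
for pcode, expected, found in find_discrepancies(transition, spatial):
    print(f"{pcode}: transition says {expected}, spatial join says {found}")
```

Since only one random point per admin2 is tested, an admin2 that is partly outside its admin1 can still pass, which matches the caveat above.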
One sample of the detected discrepancy: Admin1 update should follow in such cases.
cc: @geohacker, David, @jhenshall
May 2024 - in "Ticket time" doc @davidmuchatiza advises this is still very much relevant and in progress
After importing admin1 data and building a workflow to update geometries and attributes, and update Mapbox Vector tilesets we will now look at doing pretty much the same for admin2. https://github.com/IFRCGo/go-api/issues/470
Workflow
The workflow will remain largely the same. We'll write management commands that can read a shapefile to create new admin2 geometries or update existing ones based on a new dataset. This ensures there's an easy way to fix incorrect or disputed geoms. The geometries will be stored in a separate table (similar to admin1) and not in the districts table. This means that it won't impact the performance of existing GO API endpoints.
We'll also add a query param for the API to fetch geometries and also write a script to update Mapbox tiles when needed.
Base data source
We think that the admin boundaries from FEWS are a good baseline. FEWS is a good dataset that combines the best of FAO-GAUL, GADM, and the Humanitarian Data Exchange. HDX uses the UN OCHA datasets. FEWS also incorporates standard names from the GEONet Names database.
It's not perfect but with the workflow to be able to update easily, we should be able to fix issues as they are reported. We have had some good experience using FEWS in a few different projects.
@tovari In our dev catchup call the other day, you mentioned a few cases where FEWS wasn't reliable. Do you mind outlining them here? We can probably catch these early on and look for alternatives for those countries.
cc @batpad @LukeCaley @frozenhelium @szabozoltan69