UNDP-Data / geohub

GeoHub Frontend Application
https://geohub.data.undp.org
BSD 3-Clause "New" or "Revised" License

Replace zonal stats for new datasets #3785

Closed JinIgarashi closed 2 weeks ago

JinIgarashi commented 1 month ago

Follow the steps under "Generate admin level zonal stats" at https://github.com/UNDP-Data/geo-cellular-automata/blob/main/hreaibm.md

Upload PMTiles to the hrea container of the blob storage.

https://github.com/UNDP-Data/geohub/blob/743d8716107ec82b6f3e03e9cf172f65f817bfb7/sites/geohub/src/routes/(map)/dashboards/electricity/utils/adminLayer.ts#L128-L151

The dashboard currently uses static pbf tiles. Switching from static pbf to PMTiles may require changing quite a lot of code. We also need to check whether any column names have changed.

In the line chart component, some code uses data from the admin layer. This code may need to be modified.

https://github.com/UNDP-Data/geohub/blob/b3edab4cc0f9b5c91a5fee201b4b39bb455f1648/sites/geohub/src/routes/(map)/dashboards/electricity/components/Charts.svelte#L146-L180

JinIgarashi commented 1 month ago

Let me update you on where we ended up with the zonal stats work since you left. Joseph was trying to generate zonal stats using the approach you described in the GitHub repository (https://github.com/UNDP-Data/geo-cellular-automata/blob/main/hreaibm.md).

This approach tries to estimate the electricity access rate from the number of pixels that have more than 80% electricity access: the percentage is the number of pixels with electricity divided by the total number of pixels. However, we discovered that this approach does not give us better figures. We think pixel-level population data is needed to estimate a better access rate, which is how the old admin data calculated the electricity access rate from population. But we don't have population data for the forecast period (2021-2030), so we need to find a different method to estimate future population and the percentage electrified.
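To make the difference between the two approaches concrete, here is a minimal sketch in plain Python. The 80% threshold comes from the comment above; the population-weighted formula (sum of population times access likelihood, divided by total population) is an assumption about what "calculate from population" means, and the sample numbers are made up.

```python
def pixel_share_rate(hrea, threshold=0.8):
    """Share of pixels whose access likelihood exceeds the threshold."""
    if not hrea:
        return 0.0
    electrified = sum(1 for value in hrea if value > threshold)
    return electrified / len(hrea)


def population_weighted_rate(hrea, population):
    """Access rate weighted by the population in each pixel (assumed formula)."""
    total = sum(population)
    if total == 0:
        return 0.0
    return sum(h * p for h, p in zip(hrea, population)) / total


# Hypothetical admin area: two dense electrified pixels, two sparse dark ones.
hrea = [0.9, 0.9, 0.1, 0.1]
population = [1000, 1000, 10, 10]

print(pixel_share_rate(hrea))                      # counts pixels only
print(population_weighted_rate(hrea, population))  # counts people
```

With these made-up numbers the pixel share is one half, while the population-weighted rate is much higher because most people live in the electrified pixels.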

In conclusion, it is not feasible to include the new zonal stats by this week (or even by the end of next week). We can deploy the new electricity dashboard tomorrow without the new zonal stats (keeping the existing zonal stats as they are). In the future, we will find time to create zonal stats for 2021-2030.

Additionally, I want to let you know that there is a new population dataset managed by OCHA. It is different from WorldPop. It has global vector population data from 2020 to 2023. As you can see in the preview at the URL below, this data has hexagon polygons wherever there is a settlement, and each polygon carries its population count. We may be able to intersect this population data with our 1km aggregated raster to calculate the population of each pixel. For the population after 2024, we may use this data to estimate population growth for each polygon (maybe using the 2022 and 2023 data), then create global population datasets from 2024 to 2030 to get the electricity access rate.

https://data.humdata.org/dataset/kontur-population-dataset?

JinIgarashi commented 1 month ago

Proposed steps to regenerate zonal stats for forecast electricity access (2021-2030) are:

1. Compute population with/without electricity access from 2021 to 2023

Kontur population data is available from 2021 to 2023.

Each year's population data has hexagon polygons. We compute the number of people with electricity access and the number of people without electricity access for each polygon, and add them as pop_hrea_2021 and pop_no_hrea_2021 to the 2021 population geopackage (repeating the same process through 2023).
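A minimal sketch of step 1 for a single year, using plain dictionaries in place of geopackage rows. The `population` and `hrea` keys are hypothetical names, and splitting the population by multiplying it with a per-hexagon access likelihood is an assumption; only the `pop_hrea_<year>` and `pop_no_hrea_<year>` column names come from the step above.

```python
def add_access_columns(hexagons, year):
    """Add pop_hrea_<year> and pop_no_hrea_<year> to each hexagon record."""
    for hexagon in hexagons:
        # Assumption: "hrea" is the access likelihood (0-1) sampled for the hexagon.
        with_access = hexagon["population"] * hexagon["hrea"]
        hexagon[f"pop_hrea_{year}"] = with_access
        hexagon[f"pop_no_hrea_{year}"] = hexagon["population"] - with_access
    return hexagons


# Hypothetical hexagons; real values would come from the Kontur geopackage.
hexagons = [
    {"population": 200, "hrea": 0.75},
    {"population": 80, "hrea": 0.0},
]
add_access_columns(hexagons, 2021)
print(hexagons[0]["pop_hrea_2021"], hexagons[0]["pop_no_hrea_2021"])
```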

2. Download current admin data from 0 to 4

Current admin data is stored at the following blob storage.

These geopackages store columns for zonal stats from 2012 to 2020 like below:

To minimise our work, we reuse this existing admin data with stats as much as possible. In addition, we may not need admin 3 and 4, since computing data at those levels may take much longer.

3. Merge 2021 - 2023 data to current admin data

In step 1, we added population data to the Kontur geopackage. Now we can use exactextract to compute the electricity access rate for each admin polygon from the step 1 result.

exactextract is described in hreaibm.md.

After this step, hrea_2021 to hrea_2023, pop_hrea_2021 to pop_hrea_2023 and pop_no_hrea_2021 to pop_no_hrea_2023 should have been added to the downloaded admin data (admin 0 to 4).
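The aggregation in step 3 can be sketched as follows. This is not exactextract's API; it only illustrates coverage-weighted sums under the assumption that each hexagon contributes in proportion to the fraction of it that falls inside the admin polygon, and that hrea_<year> is stored as a fraction between 0 and 1. The coverage fractions and counts are made up.

```python
def admin_access_stats(overlaps, year):
    """Aggregate hexagon counts to one admin polygon.

    overlaps: list of (hexagon, coverage_fraction) pairs, where
    coverage_fraction is the share of the hexagon inside the polygon.
    """
    pop_hrea = sum(h[f"pop_hrea_{year}"] * frac for h, frac in overlaps)
    pop_no_hrea = sum(h[f"pop_no_hrea_{year}"] * frac for h, frac in overlaps)
    total = pop_hrea + pop_no_hrea
    return {
        f"pop_hrea_{year}": pop_hrea,
        f"pop_no_hrea_{year}": pop_no_hrea,
        # Access rate as a 0-1 fraction (assumed convention).
        f"hrea_{year}": pop_hrea / total if total else 0.0,
    }


# One hexagon fully inside the polygon, one half inside (hypothetical values).
overlaps = [
    ({"pop_hrea_2021": 150.0, "pop_no_hrea_2021": 50.0}, 1.0),
    ({"pop_hrea_2021": 40.0, "pop_no_hrea_2021": 160.0}, 0.5),
]
stats = admin_access_stats(overlaps, 2021)
print(stats)
```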

4. Create zonal stats for future population (2024 - 2030)

There is no population data from 2024 onward. We need to do this in several steps:

4.1 Estimate future population by using Kontur population data

  1. Compare the 2022 and 2023 population data to estimate the population growth ratio for each hexagon polygon in the geopackage. Add a population growth column to the geopackage.

This method may be used for forecasting population.

https://www.researchgate.net/publication/227129930_Projecting_a_Gridded_Population_of_the_World_Using_Ratio_Methods_of_Trend_Extrapolation

  2. Use this population growth ratio to simply estimate future population from 2024 to 2030 for each grid cell (add future population columns pop_2024 to pop_2030 to the geopackage).
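The two sub-steps of 4.1 can be sketched as below. Compounding a single 2022 to 2023 ratio is only the simplest possible extrapolation and is an assumption here; it is not necessarily the ratio method described in the linked paper. Falling back to no growth when the 2022 population is zero is also an assumption.

```python
def project_population(pop_2022, pop_2023, years=range(2024, 2031)):
    """Project one hexagon's population by compounding the 2022-2023 ratio."""
    # Assumption: treat a hexagon with no 2022 population as having no growth.
    growth = pop_2023 / pop_2022 if pop_2022 > 0 else 1.0
    return {f"pop_{year}": pop_2023 * growth ** (year - 2023) for year in years}


# Hypothetical hexagon that grew by 2% between 2022 and 2023.
projected = project_population(1000, 1020)
for column, value in projected.items():
    print(column, round(value, 1))
```

A constant ratio compounds quickly over seven years, so hexagons with very small 2022 counts may need a cap or smoothing before the result is used.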

4.2 Compute electricity access population from 2024 to 2030

By combining the following forecast data with the estimated population, compute the number of people with electricity access and the number of people without electricity access for each polygon.

4.3 Merge 2024 - 2030 data to current admin data

Repeat the process of step 3 to merge the 2024 to 2030 data into the current admin data.

5. Convert gpkg to fgb by GDAL/OGR

To be done for admin 0 to admin 4.
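A small sketch of how the conversion in step 5 could be scripted. `ogr2ogr -f FlatGeobuf <output> <input>` is standard GDAL/OGR usage; the `adm<level>.gpkg` and `adm<level>.fgb` file names are hypothetical placeholders, not the real file names in the container.

```python
import subprocess


def gpkg_to_fgb_command(gpkg_path, fgb_path):
    """Build the ogr2ogr call that writes a FlatGeobuf copy of a geopackage."""
    return ["ogr2ogr", "-f", "FlatGeobuf", fgb_path, gpkg_path]


def run_all(commands):
    """Run each command and stop on the first failure (requires GDAL installed)."""
    for command in commands:
        subprocess.run(command, check=True)


# Hypothetical file names for admin 0 to admin 4.
commands = [
    gpkg_to_fgb_command(f"adm{level}.gpkg", f"adm{level}.fgb")
    for level in range(5)
]
for command in commands:
    print(" ".join(command))
```

Calling `run_all(commands)` would execute the conversions; the example only prints them so it can be checked without GDAL.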

6. Convert fgb to PMTiles by using tippecanoe

To be done for admin 0 to admin 4.
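Step 6 could be scripted in the same way. The sketch assumes a tippecanoe build that writes PMTiles when the output name ends in `.pmtiles` and that reads FlatGeobuf input directly; the file names, the layer names and the choice of `-zg` (let tippecanoe guess the maximum zoom) are illustrative assumptions, not the settings used for the existing tiles.

```python
def fgb_to_pmtiles_command(fgb_path, pmtiles_path, layer):
    """Build a tippecanoe call that tiles one FlatGeobuf file into PMTiles."""
    return [
        "tippecanoe",
        "-o", pmtiles_path,  # output format follows the .pmtiles extension
        "-l", layer,         # layer name inside the tileset
        "-zg",               # let tippecanoe pick a maximum zoom
        fgb_path,
    ]


# Hypothetical file and layer names for admin 0 to admin 4.
tile_commands = [
    fgb_to_pmtiles_command(f"adm{level}.fgb", f"adm{level}.pmtiles", f"adm{level}")
    for level in range(5)
]
for command in tile_commands:
    print(" ".join(command))
```

If the frontend depends on a specific layer name or zoom range in the current static pbf tiles, those values should replace the placeholders here.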

7. Upload fgb, gpkg and pmtiles to blob storage

To be done for admin 0 to admin 4.

We can upload all files to the https://undpgeohub.blob.core.windows.net/hrea container.

8. Replace the admin data with the new data in SvelteKit

If all column names follow the existing admin data, we can minimize changes on the frontend.
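One way to check the column naming before touching the frontend is sketched below. The `hrea_`, `pop_hrea_` and `pop_no_hrea_` prefixes come from step 3; the helper names are hypothetical, and the list of actual columns would have to be read from the new admin files.

```python
def expected_columns(years):
    """Column names the new admin data should carry for the given years."""
    prefixes = ("hrea", "pop_hrea", "pop_no_hrea")
    return [f"{prefix}_{year}" for year in years for prefix in prefixes]


def missing_columns(actual_columns, years):
    """Return expected columns that are absent from the actual column list."""
    actual = set(actual_columns)
    return [name for name in expected_columns(years) if name not in actual]


# Hypothetical column list with one forecast column left out.
actual = ["hrea_2021", "pop_hrea_2021"]
print(missing_columns(actual, [2021]))
```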

JinIgarashi commented 1 month ago

Zonal stats for the Kontur population datasets (output of step 1 above) are:

The 2020 data below is for comparison with the existing 2020 data.

JinIgarashi commented 1 month ago

Zonal stats approach for vectors with GeoPandas:

https://gis.stackexchange.com/questions/436463/weighted-average-values-of-overlapping-polygons-in-python

JinIgarashi commented 1 month ago

Population forecast algorithms are described in these papers. Maybe one of them can be used.

JinIgarashi commented 1 month ago

Steps 1 - 3 are fixed by #3850

iferencik commented 2 weeks ago

I have uploaded new tiles with identical structure to https://undpgeohub.blob.core.windows.net/hrea/admin/forecast. Additionally, the fgb files have also been uploaded.