This code is designed to take data from the CLIMADA API for the 23 Humanitarian Response Plan countries on HDX, aggregate it over subnational regions (admin1) where appropriate, export it to CSV and then publish it to HDX.
The data are all available under the ETH Zürich - Weather and Climate Risks organization on HDX: https://data.humdata.org/organization/eth-zurich-weather-and-climate-risks
The source data in the CLIMADA API can be explored using this browser: https://climada.ethz.ch/datasets/
The datasets published are lit population, crop_production, earthquake, flood, wildfire, tropical_cyclone and storm_europe.
These datasets were generated from the CLIMADA API which comes with the following disclaimer:
In this API we provide datasets in a form that can readily be used in CLIMADA analyses.
Users should determine whether these datasets are suitable for a particular purpose or application, considering factors such as resolution (for example, a 4km grid is not suitable for modelling risk at the neighborhood level), the way that hazards are represented in the dataset (for example, specific events, event thresholds, probabilistic event sets, etc.), the way that exposure is represented, and other aspects.
Data provided with no warranty of any kind under CC BY 4.0.
See respective API metadata and referenced publications for details and limitations.
Original at: https://climada.ethz.ch/disclaimer/
Each dataset will have a "country" CSV format file for each country where data are available and a CSV format summary file. The country file contains gridded data of the indicator, typically on a 4km grid. The summary file contains data summarised over the "admin1" level, which corresponds to a province or state. Both types of file include HXL tags on the second row of the dataset. Both files have the same columns:
```
country_name,admin1_name,latitude,longitude,aggregation,indicator,value
#country,#adm1+name,#geo+lat,#geo+lon,,#indicator+name,#indicator+num
```
The exposures datasets (earthquake, flood, wildfire, tropical cyclone and storm europe) also have a CSV format timeseries summary file, which provides a summary value for each "event" in a country at admin1 level or better. These files have the following columns and HXL tags:
```
country_name,admin1_name,admin2_name,latitude,longitude,aggregation,indicator,event_date,value
#country,#adm1+name,#adm2+name,#geo+lat,#geo+lon,,#indicator+name,#date,#indicator+num
```
Where possible, timeseries data is provided at the admin2 aggregation level.

The country files have aggregation `none` and the summary files have an aggregation of either `sum` or `max`. The `indicator` column may be a compound value such as `crop_production.whe.noirr.USD`, where an indicator calculation takes multiple values, or it may be simple, such as `litpop`.

The summary file has a row per country per admin1 region per indicator, whilst the country file has a row per underlying latitude/longitude grid point per indicator. The `admin1_name` values are as per the private UN dataset unmap-international-boundaries-geojson.
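As an example, the summary files can be read with pandas by skipping the HXL tag row on the second line; the file name below is illustrative and the actual resource names on HDX may differ:

```python
import pandas as pd

# Read an admin1 summary file, skipping the HXL tag row (row index 1).
# The file name is illustrative - actual resource names on HDX may differ.
summary = pd.read_csv("admin1-summaries-earthquake.csv", skiprows=[1])

# One row per country, admin1 region and indicator.
print(summary[["country_name", "admin1_name", "indicator", "value"]].head())
```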
The data are updated using GitHub Actions on this repository, which run monthly on consecutive days at the beginning of each month. The datasets are only updated on HDX if the date range found in the data from the API changes - we anticipate that this will happen yearly.
The flood indicator cannot be processed in GitHub Actions because it exceeds memory/time constraints. A monthly job will be run to highlight the need to consider a manual update, which will require the code in this repository to be installed locally, as described below.
Create a virtual environment (assuming Windows for the `activate` command):

```
python -m venv venv
source venv/Scripts/activate
```
The CLIMADA Python library requires the GDAL library, whose installation can be challenging. On the Windows machine used for development the GDAL library was installed from a precompiled binary found here: https://www.lfd.uci.edu/~gohlke/pythonlibs/

It is then installed with `pip`:

```
pip install GDAL-3.4.3-cp310-cp310-win_amd64.whl
```
This repository can then be cloned and installed with:

```
pip install -e .
```
The `.from_shape_and_countries` method for `litpop` (at least) requires the following file to be downloaded to:

```
~\climada\data\gpw-v4-population-count-rev11_2020_30_sec_tif\gpw_v4_population_count_rev11_2020_30_sec.tif
```
This can be done using the `hdx-climada` commandline tool, described below. This requires an account to be created on https://urs.earthdata.nasa.gov/users/new for the download, with the username and password stored in the environment variables `NASA_EARTHDATA_USERNAME` and `NASA_EARTHDATA_PASSWORD` respectively. The command to download the data is then:

```
hdx-climada download --data_name="population"
```
In addition, UNMAP boundaries need to be downloaded from HDX. These are private datasets, not publicly available. An appropriate HDX API key needs to be provided in the environment variable `HDX_KEY` for this download, and for the upload of the completed datasets. The data can be downloaded using the `hdx-climada` commandline tool, described below. They can only be downloaded programmatically from the `prod` HDX site but can be downloaded manually from `stage` or elsewhere.

```
hdx-climada download --data_name="boundaries" --hdx_site="prod"
```
We use `nbstripout` to remove output cells from Jupyter Notebooks prior to committing; this needs to be installed per repository with:

```
nbstripout --install
```
Note that this is not compatible with GitKraken.
The `write_image` function in `plotly`, which is used to export figures to PNG format in the Jupyter Notebook functions, requires the `kaleido` library, which is rather difficult to install on Windows 10. Simply installing with `pip` leads to a hang.
The solution is to downgrade to kaleido 0.1.0 and patch the library (file: `Kaleido\scope\base.py`, line 70), changing `kaleido.cmd` to `kaleido`. The original reads:
```python
@classmethod
def executable_path(cls):
    vendored_executable_path = os.path.join(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))),
        'executable',
        'kaleido.cmd'
    )
```
Described here: https://github.com/plotly/Kaleido/issues/110#issuecomment-1021672450
The dataset metadata are compiled from the CLIMADA data-type endpoint into the file:

```
\src\hdx_scraper_climada\metadata\2024-01-11-data-type-metadata.csv
```

But they are picked up by `create_datasets` from:

```
\src\hdx_scraper_climada\metadata\attributes.csv
```
This repository implements a commandline interface using the `click` library. It is mainly concerned with the creation of the CLIMADA datasets for HDX but can also be used to download the datasets from HDX. It is accessed via the command `hdx-climada`, and help is provided by invoking `hdx-climada --help`:
```
Usage: hdx-climada [OPTIONS] COMMAND [ARGS]...

  Tools for the CLIMADA datasets in HDX

Options:
  --help  Show this message and exit.

Commands:
  create_dataset  Create a dataset in HDX with CSV files
  dataset_date    Show dataset date ranges
  download        Download data assets required to build the datasets
  info            Show the data_type info from the CLIMADA interface
```
A dataset can be generated manually with a commandline like:
```
hdx-climada create_dataset --indicator=$CLIMADA_INDICATOR --hdx_site=$HDX_SITE --live
```
The bulk properties of the datasets, which are built in GitHub Actions with the exception of flood, are as follows:
Indicator | Size/MB | Runtime | Date range |
---|---|---|---|
Lit population | 58 | 34 minutes | [2020-01-01T00:00:00 TO 2020-12-31T23:59:59] |
Crop production | 3.62 | 6 minutes | [2018-01-01T00:00:00 TO 2018-12-31T23:59:59] |
Earthquake | 58.5 | 2 hours 11 minutes | [1905-02-17T00:00:00 TO 2017-12-03T00:00:00] |
Flood | 239 | 12 hours | [2000-04-05T00:00:00 TO 2018-07-15T00:00:00] |
Wildfire | 44.9 | 31 minutes | [2001-01-01T00:00:00 TO 2020-01-01T00:00:00] |
Tropical-cyclone | 28.6 | 58 minutes | [1980-08-01T00:00:00 TO 2020-12-25T00:00:00] |
Storm-Europe | 11.5 | 2 hours 22 minutes | [1940-11-01T00:00:00 TO 2013-12-05T00:00:00] |
Runtime for crop-production is about 134 seconds and generates 3.54MB of CSV files. This is smaller than for litpop because, although it comprises 8 datasets, they are intrinsically lower resolution and do not form a complete grid.
Runtime for earthquake is about 10 minutes and generates 57MB of CSV files; adding the time series summary increases the time to generate the data to about 3 hours.
The underlying data is historic records of earthquakes between 1905 and 2017; there are a little over 41,000 records. For each earthquake there is a map of the world with the intensity of shaking produced by the earthquake above a threshold of 4.0 units on the MMI scale - this will only have non-zero values over a relatively small area. The CLIMADA API provides a special function for showing the maximum intensity over all the earthquakes, which is the data we present. This means there is a single value for each point on the map grid, and we summarise over admin1 areas by taking the maximum intensity over the grid points in that area.
Also included is a time series summary which shows the maximum intensity for each earthquake in each admin1 area, or admin2 where available.
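As an illustration, the admin1 "max" aggregation can be reproduced from a country detail file with a pandas groupby; the file name below is illustrative:

```python
import pandas as pd

# Gridded country detail file (illustrative file name), skipping the HXL tag row.
country_df = pd.read_csv("haiti-admin1-earthquake.csv", skiprows=[1])

# Maximum intensity over the grid points in each admin1 area, matching the
# "max" aggregation used in the earthquake summary file.
admin1_max = (
    country_df.groupby(["country_name", "admin1_name", "indicator"])["value"]
    .max()
    .reset_index()
)
print(admin1_max.head())
```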
Runtime for flood is about 12 hours and generates 239MB of data.
The underlying data is a binary mask (values either 0 or 1) on a 200m x 200m grid for each flood event. For the detail view this sparse grid is stripped of zero values, reducing the grid size from approximately 1 million points to O(10,000). For the summary views the number of non-zero grid points is summed to provide an aggregate value per admin1 or admin2 area (where available).
Admin2 geometries are only available for Ethiopia, Haiti and Somalia.
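A minimal sketch of the flood reduction described above, using a tiny made-up 0/1 mask in pandas (the real grids come from the CLIMADA API and are far larger; the admin names are purely illustrative):

```python
import pandas as pd

# Tiny illustrative stand-in for one flood event's 200m grid: a 0/1 flood mask
# with the admin1 area each grid point falls in.
flood_points = pd.DataFrame(
    {
        "admin1_name": ["Sool", "Sool", "Bari", "Bari"],
        "value": [1.0, 0.0, 1.0, 1.0],
    }
)

# Detail view: keep only the flooded (non-zero) grid points.
detail = flood_points[flood_points["value"] > 0]

# Summary view: count the flooded grid points in each admin1 area.
summary = detail.groupby("admin1_name").size().reset_index(name="flooded_points")
print(summary)
```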
Runtime for wildfire is about 30 minutes and generates 45MB of CSV files.
The underlying data is a fire intensity measured in Kelvin on a 4km grid; we retain this data for the country detail files, but for both summary files we calculate a "fire extent", which is a count of the grid points with non-zero intensity.
Runtime for storm-europe is about 3 hours, generating 11MB; it is only run for Ukraine.
The event dates are supplied as floats which need to be coerced to ints to convert to dates. Multiple events are recorded on each date, possibly representing hourly figures.
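A minimal sketch of that conversion, assuming the floats are proleptic Gregorian ordinals as used for CLIMADA hazard event dates:

```python
import datetime

# Event dates arrive as floats, e.g. 734503.0; coerce to int and treat as a
# proleptic Gregorian ordinal to recover the calendar date.
raw_event_date = 734503.0  # illustrative value
event_date = datetime.date.fromordinal(int(raw_event_date))
print(event_date)  # 2012-01-01
```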
The Jupyter Notebook `data_explorer_notebook.ipynb` is used to check datasets "manually". The Excel spreadsheet `2024-01-27-pivot-table-haiti-admin1-crop_production.xlsx` demonstrates the use of PivotTables to convert data from a "narrow" format, where a single indicator column potentially contains multiple indicator values for different attribute selections in the same set (e.g. `crop_production.whe.noirr.USD` and `crop_production.soy.noirr.USD`), into a "wide" format with one column per indicator.
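The same narrow-to-wide reshaping can be done in pandas with `pivot_table`; the file name below is illustrative:

```python
import pandas as pd

# Admin1 crop production summary in the "narrow" layout (illustrative file name),
# skipping the HXL tag row.
narrow = pd.read_csv("admin1-summaries-crop_production.csv", skiprows=[1])

# Pivot the single indicator column out into one column per compound indicator,
# e.g. crop_production.whe.noirr.USD and crop_production.soy.noirr.USD.
wide = narrow.pivot_table(
    index=["country_name", "admin1_name"],
    columns="indicator",
    values="value",
    aggfunc="sum",
)
print(wide.head())
```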
The tests are a bit flaky: `test_get_date_range_from_live_api` sometimes fails with a 503 error, and `test_calculate_indicator_for_admin1_litpop` and `test_export_indicator_data_to_csv_litpop` sometimes fail because the NASA Black Marble data does not all download correctly.
The tests are also slower than they need to be because each dataset is tested in both `test_climada_interface` and `test_create_csv_files`, which are essentially the same. The slowest calls are:

```
196.76s call  tests/test_climada_interface.py::test_get_date_range_from_live_api
112.79s call  tests/test_climada_interface.py::test_flood_shim
 79.38s call  tests/test_create_csv_files.py::test_export_indicator_data_to_csv_tropical_cyclone
 72.41s call  tests/test_create_csv_files.py::test_export_indicator_data_to_csv_river_flood
 60.60s call  tests/test_climada_interface.py::test_calculate_indicator_for_admin1_river_flood
 60.53s call  tests/test_climada_interface.py::test_calculate_indicator_for_admin1_tropical_cyclone
 36.37s call  tests/test_create_csv_files.py::test_export_indicator_data_to_csv_earthquake
 23.37s call  tests/test_climada_interface.py::test_calculate_indicator_for_admin1_crop_production
 18.21s call  tests/test_create_csv_files.py::test_export_indicator_data_to_csv_flood
 17.47s call  tests/test_create_csv_files.py::test_export_indicator_data_to_csv_wildfire
 13.33s call  tests/test_download_admin_geometry.py::test_get_admin1_shapes_from_hdx_no_data_case
 10.42s call  tests/test_climada_interface.py::test_calculate_indicator_for_admin1_earthquake
  8.73s setup tests/test_create_csv_files.py::test_create_dataframes
  8.56s call  tests/test_climada_interface.py::test_calculate_indicator_for_admin1_flood
  7.33s call  tests/test_climada_interface.py::test_calculate_indicator_for_admin1_wildfire
  6.39s call  tests/test_create_datasets.py::test_create_datasets_in_hdx
  2.33s call  tests/test_climada_interface.py::test_filter_dataframe_with_geometry
  1.60s call  tests/test_climada_interface.py::test_calculate_indicator_timeseries_admin
  1.27s call  tests/test_create_csv_files.py::test_export_indicator_data_to_csv_crop_production
```