A repository dedicated to developing a geospatial data science prototype (see issue: https://github.com/developmentseed/labs/issues/292).
This project explores the use of machine learning techniques on publicly available, open-source datasets to demonstrate the potential to predict cholera in endemic regions of the world. The goal is a proof of concept (PoC), based only on open-source data, that showcases ML capabilities in this space, could be developed further as part of a public health planning and decision-making tool for humanitarian organizations, and provides more context on cholera patterns than case counts alone.
In cholera-endemic countries, there is support for environmental signatures between seasonal outbreaks, which could be explored and used to develop a framework for an early warning system. See also *The seasonality of cholera in sub-Saharan Africa: a statistical modelling study* for supporting work in this area.
The study focuses on Sub-Saharan Africa, an area where cholera has been identified as a major issue and where subnational and sub-annual surveillance data are available. Data availability over the study period also allows us to take advantage of a number of remotely sensed variables captured over the same period.
data/outbreak_data.csv
Below is a list of potential indicator datasets for inclusion in the Cholera Lab study, based on literature support (Gwenzi & Sanganyado 2019; Lessler et al. 2018; Perez-Saez et al. 2022; Moore et al. 2017, and others outlined more specifically below).
Based on the indicators available for both the spatial and temporal extent of our area of interest (AOI), Sub-Saharan Africa from 2010 to 2019, we will extract the following environmental parameters for our investigation.
Variable | Temporal Resolution | Spatial Resolution | Data Availability | Data Source |
---|---|---|---|---|
Land Surface Temperature | monthly | 1.11 km | 1995-2020 | CEDA |
Precipitation | monthly | 5 km | 1981 to near present | CHIRPS, with multiple access points, including UCSB Storage and SERVIR GLOBAL |
Soil Moisture | daily | 0.25 degrees; approx 27-28 km | 1991-2021 | ESA Climate Data Dashboard |
Environmental factors alone won’t unravel this very complex relationship, but they can help identify spatio-temporal patterns that could assist in allocating resources and support.
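The soil moisture row in the table above quotes its spatial resolution both in degrees and in kilometres. As a rough check of that conversion, here is a minimal sketch that assumes a spherical Earth with a mean radius of 6371 km; the function name `degrees_to_km` is hypothetical and not part of this repository.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius, spherical approximation


def degrees_to_km(degrees: float, latitude_deg: float = 0.0) -> tuple[float, float]:
    """Return (north-south, east-west) ground distance in km for an angular step."""
    km_per_degree = 2 * math.pi * EARTH_RADIUS_KM / 360
    north_south = degrees * km_per_degree
    # East-west spacing shrinks with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


ns, ew = degrees_to_km(0.25)
print(f"0.25 degrees at the equator: {ns:.1f} km x {ew:.1f} km")

ns15, ew15 = degrees_to_km(0.25, latitude_deg=15.0)
print(f"0.25 degrees at 15 degrees latitude: {ns15:.1f} km x {ew15:.1f} km")
```

Away from the equator the east-west cell width is smaller than the north-south one, so a single kilometre figure for a grid defined in degrees is only approximate.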
If you are running macOS, consider installing Homebrew, if not already installed, as there are macOS-specific instructions below that make use of `homebrew` that can simplify the setup process.
This repository contains files larger than 50 MB, and thus requires the use of Git Large File Storage (LFS) for managing them. In order to obtain these large files during repository cloning, you must [install Git Large File Storage].
On macOS, the easiest way to install Git LFS is via Homebrew:
brew install git-lfs
Once installed, initialize it:
git lfs install
To track new types of large files (larger than 50 MB), you must tell Git LFS to track them, typically by extension. For example, to track all Shapefiles:
git lfs track "*.shp"
You can then add and commit such files like any other file in the repository.
Note that the `git lfs track` command will modify the `.gitattributes` file when given a new pattern to track. When this occurs, be sure to add `.gitattributes` to your commit, along with the newly tracked large files.
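For reference, tracking `*.shp` typically adds a line of the form `*.shp filter=lfs diff=lfs merge=lfs -text` to `.gitattributes`. The following is a minimal sketch for listing which patterns a `.gitattributes` file routes through Git LFS. The helper name `lfs_tracked_patterns` is hypothetical, and the parsing is simplified: it assumes patterns contain no spaces or quoting.

```python
def lfs_tracked_patterns(gitattributes_text: str) -> list[str]:
    """Return the patterns whose attributes include filter=lfs."""
    patterns = []
    for line in gitattributes_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Simplification: the first whitespace-separated token is the pattern.
        pattern, *attributes = line.split()
        if "filter=lfs" in attributes:
            patterns.append(pattern)
    return patterns


# Illustrative file content, not copied from this repository.
example = (
    "# Large geospatial files\n"
    "*.shp filter=lfs diff=lfs merge=lfs -text\n"
    "*.md text\n"
)
print(lfs_tracked_patterns(example))
```

In a clone of the repository, the same function could be given the contents of the real `.gitattributes` file to confirm that a newly tracked extension was recorded before committing.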
[Install Git Large File Storage]: https://docs.github.com/en/repositories/working-with-files/managing-large-files/installing-git-large-file-storage
Install `conda`. The recommended way to do this is by installing miniforge:
brew install miniforge
conda init
Then, close your terminal and open a new terminal session.
Once `conda` is installed, run the following commands in your terminal from the root of this repository to create the environment used for this repository:
conda env create
conda activate geo-ds-cholera
Whenever you modify the `environment.yml` file, run the following command to update your conda environment:
conda env update
If you haven't already done so, create a `.env` file at the root of this repository (ignored by `git`), which you can do by making a copy of `.env-example`, like so:
# This copies .env-example to .env, unless .env already exists
cp -n .env-example .env
Edit your `.env` file, setting values as appropriate for yourself. This file is not committed to git, and thus is not shared with others, because it is intended to contain sensitive, user-specific values. Some parts of the code in this repository load values from your `.env` file, and thus may either fail to run or skip certain parts of logic if your `.env` file does not contain properly configured values.
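To illustrate the kind of check this implies, here is a minimal, standard-library-only sketch that reads `KEY=value` pairs from a `.env` file and fails loudly when a required value is missing or empty. How this repository actually loads `.env` is not shown here; the helpers `read_env_file` and `require`, and the key `EXAMPLE_API_TOKEN`, are hypothetical.

```python
import tempfile
from pathlib import Path


def read_env_file(path) -> dict[str, str]:
    """Parse simple KEY=value lines, ignoring blanks and # comments."""
    values = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Simplification: strip surrounding quotes, no escape handling.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require(values: dict[str, str], key: str) -> str:
    """Return a configured value, or raise if it is missing or empty."""
    value = values.get(key, "")
    if not value:
        raise KeyError(f"{key} is not set in .env")
    return value


# Demonstration against a temporary file with hypothetical keys.
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text("# hypothetical keys\nEXAMPLE_API_TOKEN=abc123\nEXAMPLE_EMPTY=\n")
    settings = read_env_file(env_path)

print(require(settings, "EXAMPLE_API_TOKEN"))
```

Raising on a missing value makes a misconfigured `.env` visible immediately, rather than letting a later step silently skip part of its logic.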
In order to allow notebooks in this repository to import modules in this repository, you must perform a local, editable `pip` install:
pip install -e .
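One quick way to confirm that the editable install worked is to ask Python whether the package can be found on the import path. This sketch assumes the package is named `cholera`, inferred from the `src/cholera` path mentioned in this README; adjust the name if the project metadata says otherwise. The helper `is_importable` is hypothetical.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Assumption: the installed package is called "cholera".
if is_importable("cholera"):
    print("cholera is importable; notebooks should be able to import it")
else:
    print("cholera not found; run `pip install -e .` from the repository root")
```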
To aid development, this repository uses the `pre-commit` tool, which is installed into the conda environment created above. To install the pre-commit hooks defined in `.pre-commit-config.yaml`, you must run the following command from the root of your cloned repository working directory:
pre-commit install --install-hooks
If you wish to run the pre-commit hooks to check your changes before committing them to git, you can run the following command. Note that files that are untracked by git will be ignored by the pre-commit hooks. Therefore, if there are untracked files that you wish to check, you must at least use `git add` to stage them so that the pre-commit hooks check them:
pre-commit run -a
After setting up your local environment (see above), you may reproduce our results as follows:

1. Run `exploration/zonal-means.ipynb` to reproduce the individual zonal means CSV files under the `data` directory. The inputs to this notebook are the `outbreaks.csv` file and the shapefile found under the `src/cholera/resources` path.
2. Run `exploration/aggregate-zonal-means.ipynb` to reproduce the aggregate zonal means CSV file under the `data` directory. The inputs are the individual zonal means produced by the previous step.
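For readers unfamiliar with the term, a zonal mean is the average of a gridded variable over all cells that fall inside a zone, such as an administrative region. The sketch below illustrates the two steps conceptually on toy grids using only the standard library. It is not the notebooks' implementation, which is not shown here; the functions `zonal_means` and `aggregate`, the zone labels, and all values are hypothetical.

```python
from collections import defaultdict


def zonal_means(values, zones):
    """Average grid cell values per zone label, skipping missing cells."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if zone is None or value is None:
                continue  # cell lies outside every zone, or has no data
            totals[zone] += value
            counts[zone] += 1
    return {zone: totals[zone] / counts[zone] for zone in totals}


def aggregate(per_variable):
    """Combine per-variable zonal means into one record per zone."""
    combined = defaultdict(dict)
    for variable, means in per_variable.items():
        for zone, mean in means.items():
            combined[zone][variable] = mean
    return dict(combined)


# Toy 3x3 grids: zone labels stand in for a rasterized shapefile.
zones = [["A", "A", "B"], ["A", "B", "B"], [None, "B", "B"]]
temperature = [[30.0, 32.0, 25.0], [31.0, 26.0, None], [99.0, 24.0, 25.0]]
precipitation = [[10.0, 12.0, 40.0], [14.0, 44.0, 42.0], [0.0, 38.0, 36.0]]

# Step 1: individual zonal means, one result per variable.
individual = {
    "temperature": zonal_means(temperature, zones),
    "precipitation": zonal_means(precipitation, zones),
}
# Step 2: aggregate the individual results into a single table keyed by zone.
table = aggregate(individual)
print(table)
```

Keeping the two steps separate mirrors the notebook order above: the per-variable results can be recomputed independently, and the aggregate only depends on their outputs.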