Despite the proliferation of computer-based research on hydrology and water resources, such research is typically poorly reproducible. Published studies have low reproducibility because of both incomplete availability of the digital artifacts of research and a lack of documentation on workflow processes. This leads to a lack of transparency and efficiency because existing code can neither be checked nor re-used. Given the high-level commonalities between existing process-based hydrological models in terms of their input data and required pre-processing steps, more open sharing of code can lead to large efficiency gains for the modeling community.
Here we present a model configuration workflow that provides full reproducibility of the resulting model instantiation in a way that separates the model-agnostic preprocessing of specific datasets from the model-specific requirements that specific models impose on their input files. This workflow is applied to the Structure for Unifying Multiple Modeling Alternatives (SUMMA, Clark et al., 2015a,b) and mizuRoute (Mizukami et al., 2016), to create a model configuration that provides process-based hydrologic simulations and vector-based streamflow routing capabilities. The workflow uses open-source data with global coverage to determine model parameters and forcing, thus enabling transparent and efficient hydrologic science.
The code in this repository is the outcome of stepwise modification of an existing SUMMA instantiation developed by Andy Wood and colleagues at NCAR. This existing setup served as a testbed for consecutive changes to model input data, domain discretization and domain size, resulting in globally applicable model setup code that separates model-agnostic and model-specific configuration tasks.
Use of this workflow requires accounts with various data providers. Login details with these providers are stored as plain text in the user's home directory. It is therefore strongly recommended that you choose unique, new passwords for these accounts. Using the same passwords you use elsewhere poses a security risk.
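Because the login details are stored as plain text, it may also help to make sure that only your own user account can read them. The following is a minimal sketch for POSIX systems, not part of the workflow itself. The file name `.example_credentials` is a hypothetical placeholder; substitute the credential file(s) your data providers actually use.

```python
import os
import stat
from pathlib import Path


def is_private(path):
    """Return True if group and other users have no permissions on `path`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def make_private(path):
    """Restrict `path` to owner read/write only (equivalent to chmod 600)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


if __name__ == "__main__":
    # Hypothetical file name, for illustration only.
    credentials = Path.home() / ".example_credentials"
    if credentials.exists() and not is_private(credentials):
        make_private(credentials)
        print(f"Restricted permissions on {credentials}")
```

This does not replace the advice above: a unique password limits the damage if the file is read anyway.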
A basic SUMMA + mizuRoute setup requires:
This workflow requires the user to provide the catchment and river network shapefiles with certain required contents (see the relevant readmes for details). The scripts in the repository provide all the necessary code to download and pre-process forcing and parameter data, create SUMMA's and mizuRoute's required input files, and run hydrologic and routing simulations. This generates a basic SUMMA + mizuRoute setup upon which the user can improve by, for example, swapping global datasets for higher quality local ones or connecting the model setup to a calibration algorithm.
The workflow uses the following data sources:
The workflow can thus generate model setups with global coverage and for the past half century.
The workflow assumes the user can provide shapefiles that delineate the (sub-)catchments used by SUMMA and the river network used by mizuRoute. These shapefiles should include certain mandatory elements. The folder `0_example` contains example shapefiles that can be used to create a model setup for the Bow at Banff, Canada. This folder also contains a detailed description of shapefile requirements.
The workflow is organized around the idea that the code that generates data (i.e. the scripts that form this repo) is kept in a separate directory from the data that is downloaded and created. The connection between the repository scripts and the data directory is given in the `control_file` as control setting `root_path`. We strongly recommend not putting the data directory specified in `root_path` inside any of the repository folders, but using a dedicated and separate location for the data instead. Note that the size requirement of the data directory depends on the size of the domain and the length and number of simulations (see below).
A typical application would look as follows:

1. Navigate to `summaWorkflow_public/0_control_files`. Copy and rename `control_BowAtBanff.txt` to something more descriptive of your modeling domain.
2. Update the control setting `root_path`;
3. Navigate to `summaWorkflow_public/1_folderPrep` and run the notebook or Python code there to create the basic layout of your data directory.
4. Copy your catchment and river network shapefiles (`.shp`) into the newly created `your/data/path/domain_[yourDomain]/shapefiles` folder, placing the shapefiles in the `catchment` and `river_network` folders respectively.

To assist in understanding the process described above, example shapefiles and a control file for the Bow river at Banff, AB, Canada, are provided as part of this repository. Shapefiles can be found in the folder `0_example`. The control file can be found in `0_control_files`. We strongly recommend first using the provided shapefiles and control file to create your own setup for the Bow river at Banff. This domain is relatively small and the control file only specifies 1 year of data, which limits the download requirements. Instructions:
1. Change `root_path` in the file `control_BowAtBanff.txt` to point to your desired data directory location;
2. Run the code in `./1_folder_prep`. This creates a basic folder structure in your specified data directory.
3. Copy the shapefiles from the `./0_examples/shapefiles` folder in this repo into the newly generated basic folder structure in your data directory. The remaining scripts in the workflow will look for the shapefiles there.

The workflow uses a combination of Python and Bash. This section lists how to set up your system to use this workflow. We recommend you contact your system administrator if none of this makes sense. Note that this section is a work in progress.
The Python code requires various packages, which may be installed through either `pip` or `conda`. It is typically good practice to create a clean (virtual) environment and install the required packages through a package manager. The workflow was developed on Python 3.7.7 and successfully tested on Python 3.8.8.
Pip:
Package requirements specified in `requirements.txt`. Assumes a local install of the `GDAL` library is available. Scripts for topographic analysis are set up to interact with a stand-alone install of QGIS (see below). Basic instructions to create a new virtual environment:
cd /path/to/summaWorkflow_public
virtualenv summa-env
source summa-env/bin/activate
pip install -r requirements.txt
Conda:
Package requirements specified in `environment.yml`. Installs `GDAL` as a Conda package. Scripts for topographic analysis are set up to use the Conda `QGIS` package (see below). Basic instructions to create a new virtual environment:
cd /path/to/summaWorkflow_public
conda env create -f environment.yml
conda activate summa-env
If `summa-env` is not automatically added as a kernel, close the notebook, run the following from a conda terminal and restart the notebook:
python -m ipykernel install --name summa-env
Please note that while conda automatically installs the necessary underlying libraries for a given package, pip does not. The user must take care to have local installs of the required libraries if using pip. Assumed to exist locally are:
module load proj/7.0.1
module load geos/3.8.1
module load gdal/3.0.4
module load libspatialindex/1.8.5
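Before running `pip install`, it can be useful to confirm that these libraries are visible in the current shell. The following is a minimal sketch, not part of the workflow itself. It assumes that GDAL, GEOS and PROJ expose their usual helper executables (`gdal-config`, `geos-config`, `proj`) on the `PATH`, and it looks up libspatialindex as a shared library because it ships no command line tool. The function name `find_missing` is hypothetical.

```python
import shutil
from ctypes.util import find_library


def find_missing(names, locate):
    """Return the entries of `names` for which `locate(name)` finds nothing."""
    return [name for name in names if not locate(name)]


if __name__ == "__main__":
    # Assumed helper executables for GDAL, GEOS and PROJ respectively.
    missing_tools = find_missing(["gdal-config", "geos-config", "proj"], shutil.which)
    # Look for the libspatialindex shared library by its short name.
    missing_libs = find_missing(["spatialindex"], find_library)

    for name in missing_tools + missing_libs:
        print(f"Not found: {name}")
    if not missing_tools and not missing_libs:
        print("All assumed libraries were found.")
```

A "not found" result is only a hint: `find_library` may miss libraries installed in non-standard locations, so treat the output as a prompt to check your module setup rather than as proof that something is absent.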
The scripts used for geospatial analysis use several functions from QGIS. Depending on your system, you may be able to get `QGIS` as a Conda package (https://anaconda.org/conda-forge/qgis) or require a stand-alone install of QGIS (https://qgis.org/en/site/). The provided notebooks in folder `/summaWorkflow_public/4b_remapping/1_topo/` are designed to use `QGIS` as a Conda package; the Python scripts in this folder show how to use a standalone install. This folder also contains a more detailed description of QGIS setup.
The Bash code requires various libraries and command line utilities. These are (tested versions in brackets):
- GCC (7.3.0) compiler: https://gcc.gnu.org/
- openblas (0.3.4) library: https://www.openblas.net/
- netcdf-fortran (4.4.4) library: https://www.unidata.ucar.edu/software/netcdf/fortran/docs/
- gdal (2.1.3): https://gdal.org/
- GNU Parallel (20180122): https://www.gnu.org/software/parallel/
- netCDF Operators (4.9.5): http://nco.sourceforge.net/

At the time of writing (12-04-2021) `numpy` issues warnings about a deprecated feature. `netCDF4` uses this feature and as a result any script that uses `netCDF4` currently floods the screen with warnings. These are safe to ignore. See:
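If the warnings make script output hard to read, they can be hidden with Python's standard `warnings` module. The following is a minimal sketch, not part of the workflow itself. It assumes the messages are raised as `DeprecationWarning`, which is worth confirming against the text of the warning you actually see. `call_quietly` and `noisy_reader` are hypothetical names, and `noisy_reader` only stands in for a real `netCDF4` call.

```python
import warnings


def call_quietly(func, *args, **kwargs):
    """Call `func` while hiding any DeprecationWarning it triggers."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=DeprecationWarning)
        return func(*args, **kwargs)


def noisy_reader():
    """Stand-in for a netCDF4 call; it only emits a warning and returns a value."""
    warnings.warn("deprecated feature used", DeprecationWarning)
    return 42


if __name__ == "__main__":
    print(call_quietly(noisy_reader))
```

The filter is limited to the wrapped call so that unrelated deprecation warnings elsewhere in a script stay visible.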
Disk space requirements are largely dependent on the size of the modeling domain (in time and space) and the number of output variables saved by SUMMA. Minimum requirements for the Bow at Banff example are as follows:
Note that data generated in intermediate steps in the workflow is saved in corresponding directories. Users may wish to manually delete these intermediate results if disk space is an issue.
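To decide which intermediate results are worth deleting, it helps to see how much space each folder in the data directory takes up. The following is a minimal sketch, not part of the workflow itself. The function names and the example path are assumptions made for illustration.

```python
import os
from pathlib import Path


def dir_size_bytes(path):
    """Return the total size in bytes of all regular files below `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symbolic links so linked files are not counted twice.
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def size_per_subfolder(path):
    """Map each direct subfolder of `path` to its size in bytes, largest first."""
    sizes = {p.name: dir_size_bytes(p) for p in Path(path).iterdir() if p.is_dir()}
    return dict(sorted(sizes.items(), key=lambda item: item[1], reverse=True))


if __name__ == "__main__":
    data_dir = Path("/path/to/your/data")  # placeholder; use your own data directory
    if data_dir.is_dir():
        for name, size in size_per_subfolder(data_dir).items():
            print(f"{size / 1e9:8.2f} GB  {name}")
```

The script only reports sizes and deletes nothing, so the decision about what to remove stays with the user.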
This workflow (“the program”) is licensed under the GNU GPL v3.0 license. You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/. Please take note of the following: This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. In practical terms, this means that:
Our thanks to those who have contributed to improving this repository (in order of first reports):
Benham, E., Ahrens, R. J., & Nettleton, W. D. (2009). Clarification of Soil Texture Class Boundaries. United States Department of Agriculture. https://www.nrcs.usda.gov/wps/portal/nrcs/detail/ks/soils/?cid=nrcs142p2_033171
Clark, M. P., B. Nijssen, J. D. Lundquist, D. Kavetski, D. E. Rupp, R. A. Woods, J. E. Freer, E. D. Gutmann, A. W. Wood, L. D. Brekke, J. R. Arnold, D. J. Gochis, R. M. Rasmussen, 2015a: A unified approach for process-based hydrologic modeling: Part 1. Modeling concept. Water Resources Research, doi:10.1002/2015WR017198
Clark, M. P., B. Nijssen, J. D. Lundquist, D. Kavetski, D. E. Rupp, R. A. Woods, J. E. Freer, E. D. Gutmann, A. W. Wood, D. J. Gochis, R. M. Rasmussen, D. G. Tarboton, V. Mahat, G. N. Flerchinger, D. G. Marks, 2015b: A unified approach for process-based hydrologic modeling: Part 2. Model implementation and case studies. Water Resources Research, doi:10.1002/2015WR017200
Clark, M. P., B. Nijssen, J. D. Lundquist, D. Kavetski, D. E. Rupp, R. A. Woods, J. E. Freer, E. D. Gutmann, A. W. Wood, L. D. Brekke, J. R. Arnold, D. J. Gochis, R. M. Rasmussen, D. G. Tarboton, V. Mahat, G. N. Flerchinger, D. G. Marks, 2015c: The structure for unifying multiple modeling alternatives (SUMMA), Version 1.0: Technical Description. NCAR Technical Note NCAR/TN-514+STR, 50 pp., doi:10.5065/D6WQ01TD
Copernicus Climate Change Service (C3S) (2017): ERA5: Fifth generation of ECMWF atmospheric reanalyses of the global climate. Copernicus Climate Change Service Climate Data Store (CDS), 2020-03-26. https://cds.climate.copernicus.eu/cdsapp#!/home
Friedl, M., Sulla-Menashe, D. (2019). MCD12Q1 MODIS/Terra+Aqua Land Cover Type Yearly L3 Global 500m SIN Grid V006 [Data set]. NASA EOSDIS Land Processes DAAC. Accessed 2020-05-20 from https://doi.org/10.5067/MODIS/MCD12Q1.006
Hengl T, Mendes de Jesus J, Heuvelink GBM, Ruiperez Gonzalez M, Kilibarda M, Blagotić A, et al. (2017) SoilGrids250m: Global gridded soil information based on machine learning. PLoS ONE 12(2): e0169748. https://doi.org/10.1371/journal.pone.0169748
Knoben, W. J. M. (2021). Global USDA-NRCS soil texture class map, HydroShare, https://doi.org/10.4211/hs.1361509511e44adfba814f6950c6e742
Mizukami, N., Clark, M. P., Sampson, K., Nijssen, B., Mao, Y., McMillan, H., Viger, R. J., Markstrom, S. L., Hay, L. E., Woods, R., Arnold, J. R., and Brekke, L. D., 2016: mizuRoute version 1: a river network routing tool for a continental domain water resources applications, Geosci. Model Dev., 9, 2223–2238, https://doi.org/10.5194/gmd-9-2223-2016
Yamazaki, D., Ikeshima, D., Sosa, J., Bates, P.D., Allen, G.H., Pavelsky, T.M., 2019. MERIT Hydro: A High‐Resolution Global Hydrography Map Based on Latest Topography Dataset. Water Resour. Res. 55, 5053–5073. https://doi.org/10.1029/2019WR024873