SOLV-Code / Open-Source-Env-Cov-PacSalmon

Open-source, Human-centered Data Management for Environmental Covariates in Pacific Salmon Models
GNU General Public License v3.0

An Open-Source Data System for Pacific Salmon Environmental Covariates (PSEC)

This repo is taking shape based on discussions at the 2023 PICES Annual Meeting and the 2024 World Fisheries Congress. Check back frequently and leave some feedback. All components are up for debate at this stage, so join in early to shape the project!

A poster from WFC 2024 in Seattle and an overview presentation of the approach from PICES 2023 in Seattle are available here.

This repository currently covers 7 data sets of environmental covariates:

See the summary of current coverage for an inventory of environmental covariates extracted from those sources so far.

See the comparison of PDO covariates for an example of a closer look at some of the data, set up for easy collaborative editing.

See the scope of current data sets for a high-level comparison of the source data sets.

Suggestions for additional data sets can be added on the To Do List.

Important Warning: Environmental covariates in this repository are mapped onto the year of measurement. Before use, they need to be lined up properly for each specific salmon spawner and recruit data set based on the assumed mechanism of interaction and the life history of the stock. This also means that direct comparisons between diverse environmental indicators need to be approached with caution (e.g., pair-wise correlations).
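As a rough illustration of what "lining up" can mean, the sketch below shifts a covariate from year of measurement onto brood year using a fixed lag. Everything in it is an assumption for illustration: the function name `align_covariate`, the covariate values, and the lag of 2 years are hypothetical, and the appropriate lag (or combination of lags) depends on the assumed mechanism and the life history of each stock.

```python
def align_covariate(cov_by_year, brood_years, lag):
    """Map a covariate indexed by measurement year onto brood years.

    lag is the ASSUMED number of years between the brood year and the
    year in which the covariate is thought to act on that cohort.
    Brood years without a matching measurement year get None.
    """
    return {by: cov_by_year.get(by + lag) for by in brood_years}


# Hypothetical covariate values by year of measurement (not real data)
cov_by_year = {2000: -0.5, 2001: 0.1, 2002: 0.8, 2003: 0.3}

# Hypothetical brood years from a spawner-recruit data set
brood_years = [1999, 2000, 2001, 2002]

# Assumed lag of 2 years, for illustration only
aligned = align_covariate(cov_by_year, brood_years, lag=2)

for brood_year, value in aligned.items():
    print(brood_year, value)
```

Brood years with no matching measurement year are kept with a missing value rather than dropped, so gaps in coverage stay visible when the aligned series is merged with the spawner and recruit data.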

Purpose

We've used a lightweight, human-centered data management system to compile and manage source data for Yukon River Chinook Salmon run reconstruction (Pestal et al. 2022). In that project, 33 people from 11 organizations contributed to the data compilation and review process, consolidating quality-controlled records and detailed metadata for 41 individual assessment projects covering a basin of 850,000 km². Data summaries and model input files had to be updated constantly as data review and model development progressed concurrently. A worked example of the approach is available in a GitHub repository. The worked example explains the structure of the underlying data management system and how the components fit together. It includes tips on getting started with Git/GitHub and on setting up automated reports using markdown, plus wiki pages with background information on human-centered design and excerpts from interesting papers.

We are now testing whether this approach can be scaled up from a relatively contained setting to a more typical data management situation. The Yukon Chinook data compilation was part of a high-priority project with clear terms of reference, a large technical working group, and dedicated resources. More typically, data management systems are not implemented for a single specific analysis or project, but are developed and maintained as a more general-use resource (e.g., a database of all the regional salmon spawner estimates). This changes the human dynamics of contribution, from a highly focused and time-constrained deliverable for a targeted outcome to a long-running commitment, often without any clear and direct result for the individual contributors.

In this type of setting, a data management system that minimizes procedural and technical hassles has the potential to greatly improve the data resource.

An open-source data resource is a good test for this idea, because individual contributors are not obligated to work through a steep learning curve or drop other tasks because a senior manager made a request.

Scope

Data management systems are easier to design and maintain if their scope is clearly bounded. For this project, we are currently working with the following bounds:

For now, we are excluding biological covariates, such as copepod diversity, winter ichthyoplankton biomass, or catch/abundance of other salmon species (e.g., using pink salmon abundance as a covariate in sockeye salmon models). In cases where biological variables are part of a multi-variable index, such as the NOAA Ocean Conditions Index, we include the overall index and the individual environmental components of the index.

Along the way, we are also compiling an inventory of interesting sources of environmental information that come up but are out of scope for this project. These are stored on a wiki page.

Repository structure

Get Started

You have four options for browsing through this repository:

Feedback on PSEC

If you have any questions, comments, or ideas for extensions, you can leave a note on the issues page by clicking on New Issue. Make sure to give it an informative title.

You can also scroll through any other open issues to follow the discussion and contribute ideas.