
delphi.cmu.edu/forecast-eval
MIT License

Forecast Eval

The forecast evaluation dashboard provides a robust set of tools and methods for evaluating the performance of epidemic forecasts. The project's goal is to help epidemiological researchers gain insight into the performance of their forecasts and, ultimately, to make epidemic forecasting more accurate.

Background

This app collects and scores COVID-19 forecasts submitted to the CDC. The dashboard was developed by CMU Delphi in collaboration with the Reich Lab and US COVID-19 Forecast Hub from UMass-Amherst, as part of the Forecast Evaluation Research Collaborative.

The Reich Lab created and maintains the COVID-19 Forecast Hub, a collaborative effort with over 80 groups submitting forecasts to be part of the official CDC COVID-19 ensemble forecast. All Forecast Hub forecasters that are designated "primary" or "secondary" are scored and included in the dashboard.

The Delphi Group created and maintains COVIDcast, a platform for epidemiological surveillance data. COVIDcast provides the ground truth data that forecasts are scored against.

The public version of the dashboard runs off of the main branch.

The version on the dev branch appears on the staging website. The username and password are included in the meeting notes doc and on Slack.

The dashboard is backed by the forecast evaluation pipeline. The pipeline runs three times a week, on Sunday, Monday, and Tuesday, using the code on the dev branch. It collects and scores forecasts from the Forecast Hub, and posts the resulting files to a publicly-accessible AWS S3 bucket.
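
For example, a score file can be downloaded from the public bucket and read directly into R. A minimal sketch, where the bucket URL and file name are placeholders (check the "About" writeup or the dashboard code for the actual names):

# Download one scoring file from the public S3 bucket and load it into R.
# NOTE: the URL and file name below are placeholders, not confirmed values.
score_url <- "https://<bucket-name>.s3.amazonaws.com/<score-file>.rds"
tmp <- tempfile(fileext = ".rds")
download.file(score_url, tmp, mode = "wb")
scores <- readRDS(tmp)
head(scores)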

See the "About" writeup for more information about the data and processing steps.

Contributing

main is the production branch and shouldn't be directly modified. Pull requests should be based on and merged into dev. When enough changes have accumulated on dev, a release will be made to sync main with it.

This project requires a recent version of GNU Make and Docker.

The easiest way to view and develop this project locally is to run the Shiny app from RStudio:

[Screenshot: RStudio with the Run App button circled]

This is the same as running

shiny::runApp("<directory>")

in R. However, dashboard behavior can differ when running locally versus in a container (due to package versioning, packages that haven't been properly added to the container environment, etc.), so the dashboard should also be tested in a container.
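
If you need more control over how the app is launched locally (for example, a fixed port), shiny::runApp accepts additional arguments; a minimal sketch, with the app directory left as a placeholder:

# Run the dashboard locally on a fixed port; "<directory>" is a placeholder
# for the directory containing the Shiny app.
shiny::runApp("<directory>", port = 3838, launch.browser = TRUE)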

The dashboard can be run in a Docker container using make, and the pipeline can be run locally with the Report/create_reports.R script or in a container; see the sections below. The Makefile includes notes on workarounds if you don't have image repository access.

Running the scoring pipeline

The scoring pipeline uses a containerized R environment. See the docker_build directory for more details.

The pipeline can be run locally with the Report/create_reports.R script or in a container via

> make score_forecast

See notes in the Makefile for workarounds if you don't have image repository access.
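
To run the scoring locally (outside the container), the script can be executed in a fresh R session from the repository root. A rough sketch, assuming the packages the script needs (e.g. evalcast) are installed and its default options are acceptable:

# Run the scoring script locally from the repository root.
# Assumes required packages are installed and the script's defaults are acceptable;
# see the script itself for details.
setwd("/path/to/forecast-eval")       # placeholder: path to your local clone
source("Report/create_reports.R")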

Running the Shiny app

The dashboard can be run in a Docker container using

> make start_dashboard

See notes in the Makefile for workarounds if you don't have image repository access.

Releasing

main is the production branch and contains the code that the public dashboard uses. Code changes will accumulate on the dev branch and when we want to make a release, dev will be merged into main via the "Create Release" workflow. Version bump type (major, minor, etc) is specified manually when running the action.

If there's some issue with the workflow-based release process, a release can be done manually with:

git checkout dev
git pull origin dev
git checkout -b release_v<major>.<minor>.<patch> origin/dev

Update the version number in the DESCRIPTION file and in the dashboard (see the sketch below).

git add .
git commit -m "Version <major>.<minor>.<patch> updates"
git tag -a v<major>.<minor>.<patch> -m "Version <major>.<minor>.<patch>"
git push origin release_v<major>.<minor>.<patch>
git push origin v<major>.<minor>.<patch>

Create a PR into main. After the branch is merged to main, perform cleanup by merging main into dev so that dev stays up to date.
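
For the version-bump step above, the Version field in DESCRIPTION can be edited by hand or set programmatically; an optional sketch using the desc package (the version shown in the dashboard still needs to be updated by hand):

# Optional: set the Version field in DESCRIPTION programmatically.
# Requires the 'desc' package; editing the file by hand works just as well.
desc::desc_set_version("1.2.3")   # substitute the actual <major>.<minor>.<patch>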

Dependencies

The scoring pipeline runs in a Docker container built from docker_build/Dockerfile, which is a straight copy of the covidcast-docker image. The dashboard runs in a Docker container built from devops/Dockerfile.

When updates are made to the evalcast package, the behavior of the scoring script can be affected and the covidcast Docker image must be rebuilt. The workflow in the covidcast-docker repository that does this needs to be triggered manually. Before building the new image, ensure that the changes in evalcast are compatible with the scoring pipeline.

Currently, the scoring pipeline uses the evalcast package from the evalcast branch of the covidcast repository. However, if we need to make forecast-eval-specific changes to the evalcast package that would conflict with other use cases, we have in the past created a dedicated forecast-eval branch of evalcast.
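
To test evalcast changes against the pipeline locally, the package can be installed from that branch; a sketch, assuming evalcast lives under R-packages/evalcast in the covidcast repository:

# Install evalcast from the "evalcast" branch of the covidcast repository.
# The subdirectory path is an assumption; check the repository layout.
remotes::install_github(
  "cmu-delphi/covidcast",
  ref = "evalcast",
  subdir = "R-packages/evalcast"
)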

Performing a manual rollback

For the dashboard

This should only be performed if absolutely necessary.

  1. Change the forecasteval image line in the ansible settings file to point to the desired (most recently working) sha256 hash rather than the latest tag. The hashes can be found in the Delphi ghcr.io image repository -- these require special permissions to view. Ask Brian for permissions and Nat for hash info.
  2. Create a PR into main. Tag Brian as reviewer and let him know over Slack. Changes will automatically propagate to production once merged.
  3. After a rollback, code changes will no longer automatically propagate to the public dashboard via the latest image. When creating the next normal release, the tag in the ansible settings file must be manually changed back to latest.

For the pipeline

  1. Change the FROM line in the docker_build Dockerfile to point to the most recently working sha256 hash rather than the latest tag (see the example after this list). The hashes can be found in the Delphi ghcr.io image repository -- these require special permissions to view. Ask Brian for permissions and Nat for hash info.
  2. Create a PR into dev. Tag Katie or Nat as reviewer and let them know over Slack. Changes will automatically propagate to production once merged.
  3. After a rollback, changes will no longer automatically propagate to the local pipeline image via the latest covidcast image. When building the next covidcast Docker image, the tag in docker_build/Dockerfile must be manually changed back to latest.
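
For reference, pinning the pipeline image to a digest in docker_build/Dockerfile looks roughly like this; the image name and hash are placeholders:

# docker_build/Dockerfile (excerpt) -- image name and digest are placeholders
# before rollback:
#   FROM ghcr.io/cmu-delphi/<covidcast-image>:latest
# after rollback, pinned to a known-good digest:
FROM ghcr.io/cmu-delphi/<covidcast-image>@sha256:<known-good-hash>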

Code Structure