In early April 2020, Delphi developed a uniform data schema for a new Epidata endpoint focused on COVID-19. Our intent was to provide signals that would track in real-time and in fine geographic granularity all facets of the COVID-19 pandemic, aiding both nowcasting and forecasting. Delphi's long history in tracking and forecasting influenza made us uniquely situated to provide access to data streams not available anywhere else, including medical claims data, electronic medical records, lab test records, massive public surveys, and internet search trends. We also process commonly-used publicly-available data sources, both for user convenience and to provide data versioning for sources that do not track revisions themselves.
Each data stream arrives in a different format using a different delivery technique, be it sftp, an access-controlled API, or an email attachment. The purpose of each pipeline in this repository is to fetch the raw source data, extract informative aggregate signals, and output those signals---which we call COVID-19 indicators---in a common format for upload to the COVIDcast API.
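As a rough illustration of the fetch, extract, and output stages described above, here is a minimal sketch in plain Python. It is not the code used by any pipeline in this repository: the sample rows, the function names `extract_signal` and `export_csv`, and the column names are all assumptions chosen for the example.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical raw records: (date, county FIPS code, observed value).
# Real sources arrive via sftp, an API, or email; here they are hard-coded.
RAW_ROWS = [
    ("2020-04-15", "42003", 1.0),
    ("2020-04-15", "42003", 3.0),
    ("2020-04-15", "06037", 4.0),
]


def extract_signal(rows):
    """Aggregate raw rows into one mean value per (date, geo_id)."""
    grouped = defaultdict(list)
    for date, geo_id, value in rows:
        grouped[(date, geo_id)].append(value)
    return [
        {
            "date": date,
            "geo_id": geo_id,
            "val": mean(values),
            "sample_size": len(values),
        }
        for (date, geo_id), values in sorted(grouped.items())
    ]


def export_csv(records):
    """Serialize aggregated records to CSV text (column names are assumed)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["date", "geo_id", "val", "sample_size"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


signal = extract_signal(RAW_ROWS)
print(export_csv(signal))
```

Each indicator's own documentation describes the actual format it emits; the point here is only that every pipeline ends by writing the same tabular shape regardless of how its source data arrived.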
For client access to the API, along with a variety of other utilities, see our R and Python packages.
For interactive visualizations (of a subset of the available indicators), see our COVIDcast map.
Utilities:
- _delphi_utils_python - common behaviors
- _template_python & _template_r - starting points for new data sources
- ansible & jenkins - automated testing and deployment
- sir_complainsalot - a Slack bot to check for missing data

Indicator pipelines: all remaining directories.
Each indicator pipeline includes its own documentation.
prod reflects what is currently in production. main is the staging branch for the next release.
- Branch from main to develop a new change.
- Open a pull request into main and assign a reviewer (or tag someone) to get feedback on your change. List the issue number under Fixes if your change resolves an existing GitHub Issue.

Each indicator has a make lint command to check for linting errors and a make format command to incrementally format your code (using darker). Both are automated with a GitHub Action.
If you get the error ERROR:darker.git:fatal: Not a valid commit name <hash>, it is likely because your local main branch is not up to date; you need to either rebase or merge. Note that darker reads its default settings from pyproject.toml.
If the lines you change are in a file that uses 2 space indentation, darker
will indent the lines around your changes and not the rest, which will likely
break the code; in that case, you should probably just pass the whole file
through black. You can do that with the following command (using the same
virtual environment as above):
env/bin/black <file>
The release process consists of multiple steps, all of which can be done via the GitHub website:

- Click the Run workflow dropdown button. Leave the branch as main unless you know what you're doing. Enter the type of release (patch: bugfixes, params file changes, new signals for existing indicators; minor: new indicators, new utilities; major: backwards-incompatible changes requiring substantial refactoring) and GitHub will automatically compute the next version number for you; alternatively, specify the version number by hand. Hit the green Run workflow button.
- Use the #xxx notation to reference PRs, and GitHub will automatically render the title of each PR in Preview mode and when the edit is saved.
- [...] main branch
- (if delphi-utils was updated) Upload the new version of delphi-utils to PyPI

You may need to be an admin to perform some of the steps above.
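To make the patch / minor / major choice concrete, the sketch below shows how a next version number could be derived from the current one. It assumes conventional three-part semantic versioning; the function name `next_version` and the version `0.3.5` are hypothetical, and the release workflow's actual computation is not shown in this README.

```python
def next_version(current: str, release_type: str) -> str:
    """Return the version that follows `current` for the given release type.

    Assumes a plain MAJOR.MINOR.PATCH version string (no pre-release tags).
    """
    major, minor, patch = (int(part) for part in current.split("."))
    if release_type == "major":
        # Backwards-incompatible changes: reset minor and patch.
        return f"{major + 1}.0.0"
    if release_type == "minor":
        # New indicators or utilities: reset patch.
        return f"{major}.{minor + 1}.0"
    if release_type == "patch":
        # Bugfixes, params file changes, new signals for existing indicators.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown release type: {release_type!r}")


# Hypothetical starting version, for illustration only.
for kind in ("patch", "minor", "major"):
    print(kind, next_version("0.3.5", kind))
```

If you specify the version number by hand instead, it is worth checking that it matches what a rule like this would produce for the kind of change being released.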
This repository is released under the MIT License.