cmu-delphi / covidcast-indicators

Back end for producing indicators and loading them into the COVIDcast API.
https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html
MIT License

Validate that provided geo_ids match expectations #424

Open JedGrabman opened 3 years ago

JedGrabman commented 3 years ago

Tracking bug for item in plans.md: Which, if any, specific geo_ids are missing (get unique geo ids from historical data or delphi_utils)

Currently, we only check that geo_ids are in the correct format. There are 2 categories of issues that should be addressed:

  1. The data contains geo_ids that do not match any known values
  2. The data is missing known geo_ids

The first case should probably be an error, since it suggests a typo or otherwise bad data is being ingested. The second case should be a warning, since missing data can legitimately occur if it is not reported to our upstream data sources.
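As a rough illustration of the two categories, here is a minimal sketch using plain set differences. The function name `validate_geo_ids` and the idea of passing the known ids in as an `expected` list are assumptions for illustration only; they are not the validator's actual API, and the real reference list would come from historical data or delphi_utils.

```python
def validate_geo_ids(observed, expected):
    """Hypothetical helper: compare observed geo_ids against known ones.

    Returns (errors, warnings) as lists of messages:
    - unknown ids (observed but not expected) are reported as errors
    - missing ids (expected but not observed) are reported as warnings
    """
    observed_set = set(observed)
    expected_set = set(expected)

    # Category 1: ids that match no known value (likely typo or bad data).
    unknown = sorted(observed_set - expected_set)
    # Category 2: known ids absent from the data (may be legitimate).
    missing = sorted(expected_set - observed_set)

    errors, warnings = [], []
    if unknown:
        errors.append(f"unknown geo_ids: {unknown}")
    if missing:
        warnings.append(f"missing geo_ids: {missing}")
    return errors, warnings


errors, warnings = validate_geo_ids(["01001", "99999"], ["01001", "01003"])
```

Here `99999` would land in `errors` and `01003` in `warnings`, which mirrors the error/warning split proposed above.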

JedGrabman commented 3 years ago

I am currently working on this on a forked branch.

Can somebody add me as assignee? It looks like I don't have permission in this repo.

JedGrabman commented 3 years ago

PR #470 addresses category 1 (alert on new geo_ids).

@nmdefries Creating an alert for category 2 (missing geo_ids) may be too noisy. We would potentially need a different list for every data source / signal combination, and some of those are much less consistent. For example, the number of counties we get survey data from each day fluctuates. I'd suggest deprioritizing this (unless this is a frequent cause of issues).

nmdefries commented 3 years ago

That makes sense. It may be that this would only be useful at higher geo-levels, like states, where we can reasonably expect data to be available every day. The only relevant bug I'm aware of is #179.
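One way to picture that restriction is a sketch in which the missing-id check only runs for an allow-list of geo types. The constant `MISSING_CHECK_GEO_TYPES` and the function `missing_geo_ids` are hypothetical names, not part of the existing validator.

```python
# Hypothetical allow-list: geo types where daily coverage is expected.
MISSING_CHECK_GEO_TYPES = {"state"}


def missing_geo_ids(geo_type, observed, expected):
    """Return known ids absent from the data, but only for allow-listed geo types.

    For other geo types (e.g. county) the check is skipped and an empty
    list is returned, to avoid noisy warnings.
    """
    if geo_type not in MISSING_CHECK_GEO_TYPES:
        return []
    return sorted(set(expected) - set(observed))


county_missing = missing_geo_ids("county", ["01001"], ["01001", "01003"])
state_missing = missing_geo_ids("state", ["pa"], ["pa", "ny"])
```

With this setup the county call skips the check entirely, while the state call reports `ny` as missing.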