cmu-delphi / covidcast-indicators

Back end for producing indicators and loading them into the COVIDcast API.
https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html
MIT License

Add ability to tune validation outlier thresholds #970

Open sgsmob opened 3 years ago

sgsmob commented 3 years ago

One validation check compares the actual signal value against a "predicted" signal value derived from recent historical API data. If these values are too far apart, we classify the signal as anomalous and trigger a validation error. Currently this is done using a set of hardcoded thresholds, which gives the user two options: use these thresholds or turn off this type of validation check entirely. It would be nice to have a way of adjusting these thresholds from the params.json file so that indicators with more variance can still use this type of validation.
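For context, the check works roughly like the sketch below. This is an illustrative reconstruction, not the validator's actual code; the function name, signature, and default threshold are all made up.

```python
# Illustrative sketch only -- names and the default threshold are hypothetical,
# not the validator's actual implementation.
def check_against_history(actual: float, predicted: float,
                          threshold: float = 0.5) -> bool:
    """Return True if the actual value is anomalously far from the value
    predicted from recent historical API data."""
    # Relative difference of the two values; bounded by 2 for nonnegative inputs.
    denom = (abs(actual) + abs(predicted)) / 2
    if denom == 0:
        return False
    relative_diff = abs(actual - predicted) / denom
    return relative_diff > threshold  # hardcoded today; the ask is to make this tunable
```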

sgsmob commented 3 years ago

It is not obvious how we should do this, but here are some ideas.

  1. Expose the thresholds to the user
    1. (+) Most customizable
    2. (-) Hard for the user to meaningfully set since they have no inherent interpretation (?)
  2. Expose a multiplicative factor to apply to the thresholds
    1. (+) Only need single parameter
    2. (-) Potentially even less interpretable than option 1
  3. Expose discrete sensitivity levels to the user ("high", "medium", "low"); see the sketch after this list
    1. (+) Obvious interpretation
    2. (-) How do we set the levels meaningfully?
    3. (-) Extending to higher granularity could be messy
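For concreteness, here is a rough sketch of what option 3 could look like, assuming a hypothetical `validation.sensitivity` field in params.json and placeholder level-to-factor values:

```python
import json

# Hypothetical mapping of sensitivity levels to multiplicative factors applied
# to the current hardcoded thresholds; the specific values are placeholders.
SENSITIVITY_FACTORS = {"high": 0.5, "medium": 1.0, "low": 2.0}

def load_thresholds(params_path: str, base_thresholds: dict) -> dict:
    """Scale the hardcoded thresholds by the user's chosen sensitivity level."""
    with open(params_path) as f:
        params = json.load(f)
    # e.g. params.json might contain: {"validation": {"sensitivity": "low"}}
    level = params.get("validation", {}).get("sensitivity", "medium")
    factor = SENSITIVITY_FACTORS[level]
    return {signal: t * factor for signal, t in base_thresholds.items()}
```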

@nmdefries any additional thoughts?

nmdefries commented 3 years ago

Option 3 sounds the most reasonable. If I recall correctly, the metric being used (relative mean difference) asymptotes to 2, so a user setting a multiplicative factor or a threshold directly could easily create a threshold that never triggers an error.

> Hard for the user to meaningfully set since they have no inherent interpretation (?)

Strong agree here. The threshold values were never set to mean anything in particular; they were eyeballed to keep the false positive rate reasonable (based only on the Facebook survey pipeline, I believe). Relative mean difference is a measure of dispersion, so maybe thresholds can be roughly mapped to percentiles (and back) for better intuition? The current metric is only relative mean difference-esque and does not account for variance. But if we're open to changing how this check works, it would be a lot more interpretable to use percentile thresholds directly.
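To illustrate the percentile idea, something like the following could translate between thresholds and percentiles, assuming we have an array of historical relative differences for a signal (the names and the 99th-percentile default are made up):

```python
import numpy as np

def threshold_from_percentile(historical_rel_diffs: np.ndarray,
                              percentile: float = 99.0) -> float:
    """Threshold such that ~(100 - percentile)% of historical values exceed it."""
    return float(np.percentile(historical_rel_diffs, percentile))

def percentile_of_threshold(historical_rel_diffs: np.ndarray,
                            threshold: float) -> float:
    """Report where an existing hardcoded threshold falls in the historical
    distribution, for intuition about its expected false-positive rate."""
    return float((historical_rel_diffs < threshold).mean() * 100)
```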

> allow indicators with more variance to still have this type of validation.

However, given that relative mean difference is a measure of dispersion, indicators with more or less variance should be equally able to use the hardcoded thresholds, I think.

krivard commented 3 years ago

The exemplars for this feature are the JHU county and MSA signals, which generate this validation failure on all reference dates; more context in this PR thread. @nmdefries would you check it out?

nmdefries commented 3 years ago

It appears that the overzealous errors arise because the existing metric doesn't take variance into account. This approach works fine for relatively smooth signals, but for regions with small populations (e.g. small counties) and rare signals (e.g. deaths), variance is high, causing many errors to be raised.

Instead of allowing users to change the hardcoded thresholds for each signal, which, as discussed above, is challenging, I'm addressing this in #1051 by switching to a z-score. This takes historical variance into account and performs more uniformly across geo regions and signals.

Additionally, the default reference window of 7 days causes problems because it is the same width as the standard moving average smoothing. In smoothed rare signals (e.g. deaths), a 7-day reference period will often contain no value changes such that either the previous "relative mean difference"-esque metric or z-score would raise an error at even a small change. Expanding the reference window width for smoothed signals resolves this.