cmu-delphi / epiprocess

Tools for basic signal processing in epidemiology
https://cmu-delphi.github.io/epiprocess/

Give user more flexibility in outlier detection by changing width of IQR window independently of median width? #420

Open nmdefries opened 8 months ago

nmdefries commented 8 months ago

epiprocess::roll_iqr performs a rolling IQR calculation with the same window width as the median. In some cases, this doesn't filter out many (or any) of the values that look like obvious outliers.

Imagine a user is trying to remove outliers from a 7-day rolling average signal where the raw data is not available. Because of the rolling average smoothing, single outliers have been turned into large peaks/troughs that are fairly wide (~7 days), which makes them harder to remove. Increasing the window width n (e.g. to 50) would allow those wider peaks to be removed. However, that also makes the calculated rolling median change very slowly over time, which may be unsatisfying in cases where the time series is more dynamic.
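
To make the widening effect concrete, here is a minimal Python sketch (not epiprocess code). It assumes a trailing 7-day mean and made-up numbers; `trailing_mean` is a hypothetical helper written only for this illustration.

```python
def trailing_mean(values, width=7):
    """Trailing moving average; the first few windows are shorter."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - width + 1): i + 1]
        out.append(sum(window) / len(window))
    return out


raw = [10] * 20
raw[10] = 80  # a single-day outlier (made-up numbers)

smoothed = trailing_mean(raw)

# Days whose smoothed value differs from the baseline of 10.
affected = [i for i, v in enumerate(smoothed) if v != 10]
print(affected)  # [10, 11, 12, 13, 14, 15, 16]
```

The one-day spike becomes a 7-day plateau at 20, so a detector looking at the smoothed signal sees a lower, wider bump rather than a single extreme point.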

Consider including an option for changing the window size used to calculate IQR independently of window size used to calculate median.
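
A rough Python sketch of what decoupled widths could look like. This is not the existing epiprocess implementation, and every name here (`flag_outliers`, `median_width`, `iqr_width`, `multiplier`) is hypothetical. It assumes centered windows truncated at the series ends, and that the IQR is taken over residuals from the rolling median.

```python
from statistics import median, quantiles


def centered_window(seq, i, width):
    """Return up to `width` points centered on index i, truncated at the ends."""
    half = width // 2
    return seq[max(0, i - half): i + half + 1]


def iqr(points):
    """Interquartile range; treated as 0.0 when fewer than two points exist."""
    if len(points) < 2:
        return 0.0
    q1, _, q3 = quantiles(points, n=4, method="inclusive")
    return q3 - q1


def flag_outliers(values, median_width=7, iqr_width=21, multiplier=3.0):
    """Flag points whose residual from the rolling median exceeds multiplier * IQR."""
    # The rolling median uses its own window width.
    centers = [median(centered_window(values, i, median_width))
               for i in range(len(values))]
    residuals = [v - c for v, c in zip(values, centers)]
    flags = []
    for i, r in enumerate(residuals):
        # The spread is computed over a separately sized window of residuals.
        spread = iqr(centered_window(residuals, i, iqr_width))
        flags.append(abs(r) > multiplier * spread)
    return flags


series = [10, 11, 10, 12, 11, 10, 11, 50, 10, 11, 12,
          10, 11, 10, 12, 11, 10, 11, 12, 10, 11]
flags = flag_outliers(series, median_width=7, iqr_width=21)
print(flags[7])  # True: the spike at index 7 is flagged
```

Keeping `median_width` small lets the center track a dynamic series, while `iqr_width` can be tuned separately to control how the spread is estimated. Whether this actually catches the wide peaks described above would need checking on real data.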

nmdefries commented 8 months ago

Example comparing existing function and outlier detection using global IQR (in R)

nmdefries commented 8 months ago

In some cases, a workaround could be to do the outlier removal before doing any smoothing. The outliers should stick out more in the raw data so they'd be easier to detect.

From Jeremy's use case, doing that didn’t work on all states because some had an IQR of 0, due to low counts and/or unusual reporting; this made outlier removal get rid of almost everything. Is this an uncommon use case, or is outlier detection inappropriate here?
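
For illustration, a small Python sketch of the zero-IQR failure mode with made-up low counts. It uses a global median and IQR for simplicity instead of rolling ones, and the floor on the threshold (`min_radius`) is a hypothetical parameter of this sketch, shown only as one possible mitigation.

```python
from statistics import median, quantiles


def iqr(points):
    """Interquartile range using inclusive quartiles."""
    q1, _, q3 = quantiles(points, n=4, method="inclusive")
    return q3 - q1


def flag_with_floor(values, multiplier=3.0, min_radius=0.0):
    """Flag points farther than max(min_radius, multiplier * IQR) from the median."""
    center = median(values)
    radius = max(min_radius, multiplier * iqr(values))
    return [abs(v - center) > radius for v in values]


# Mostly-zero counts (made-up data): the median and the IQR are both 0.
counts = [0, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0]

no_floor = flag_with_floor(counts)
with_floor = flag_with_floor(counts, min_radius=5.0)

print([i for i, f in enumerate(no_floor) if f])    # [3, 8]
print([i for i, f in enumerate(with_floor) if f])  # []
```

With an IQR of 0 the threshold collapses to 0, so every value that differs from the median is flagged. A floor avoids that, but picking a sensible floor for each signal is its own problem.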