Open nmdefries opened 8 months ago
In some cases, a workaround could be to do the outlier removal before doing any smoothing. The outliers should stick out more in the raw data so they'd be easier to detect.
In Jeremy's use case, that didn't work for all states: some had an IQR of 0 because of low counts and/or unusual reporting, so outlier removal got rid of almost everything. Is this an uncommon use case, or is outlier detection inappropriate here?
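To make the zero-IQR failure concrete, here is a minimal sketch in plain Python (not the epiprocess code). The counts are made up, and both the `multiplier` value and the `|value - median| > multiplier * IQR` rule are assumptions for illustration only.

```python
from statistics import median, quantiles

# Made-up low-count series: mostly zeros with a few small reports.
counts = [0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 3]

# Quartiles over the whole series (a rolling version fails the same way).
q1, _, q3 = quantiles(counts, n=4, method="inclusive")
iqr = q3 - q1            # both quartiles are 0, so the IQR is 0
center = median(counts)  # the median is also 0

multiplier = 2.0  # hypothetical threshold multiplier

# With a zero IQR the threshold is 0, so any value off the median is flagged.
flagged = [c for c in counts if abs(c - center) > multiplier * iqr]
print(iqr, flagged)
```

Every nonzero report ends up flagged here, which matches the "gets rid of almost everything" behavior described above.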
`epiprocess::roll_iqr` performs a rolling IQR calculation with the same window width as the median. In some cases, this doesn't filter out m/any values that obviously look like outliers.

Imagine a user is trying to remove outliers from a 7-day rolling average signal where the raw data is not available. Because of the rolling average smoothing, single outliers have been turned into large peaks/troughs that are fairly wide (~7 days), which makes them harder to remove. Making `n` larger (e.g. 50) would allow those larger peaks to be removed. However, that also makes the calculated rolling median change very slowly over time, which may be unsatisfying in cases where the time series is more dynamic.

Consider including an option for changing the window size used to calculate IQR independently of window size used to calculate median.
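A rough sketch of what decoupled windows could look like, in plain Python rather than the package's R code. The names `rolling`, `flag_outliers`, `median_width`, `iqr_width`, and `multiplier` are hypothetical, and the `|value - median| > multiplier * IQR` rule is an assumption, not necessarily what `roll_iqr` does. The series is a stylized stand-in for a smoothed signal: a small repeating baseline with a 7-day bump of the kind a single raw outlier leaves behind.

```python
from statistics import median, quantiles

def rolling(values, width, fn):
    """Apply fn to a centered window (truncated at the ends) around each point."""
    half = width // 2
    return [fn(values[max(0, i - half): i + half + 1]) for i in range(len(values))]

def iqr(window):
    """Interquartile range of one window."""
    q1, _, q3 = quantiles(window, n=4, method="inclusive")
    return q3 - q1

def flag_outliers(values, median_width, iqr_width, multiplier=2.0):
    """Flag points far from the rolling median, relative to a rolling IQR
    computed over its own, independently chosen window (assumed rule)."""
    med = rolling(values, median_width, median)
    spread = rolling(values, iqr_width, iqr)
    return [abs(v - m) > multiplier * s for v, m, s in zip(values, med, spread)]

# Stylized smoothed signal: baseline cycling 9, 10, 11 plus a +10 bump on days 27-33.
series = [10 + (i % 3 - 1) + (10 if 27 <= i <= 33 else 0) for i in range(60)]

same_width = flag_outliers(series, median_width=21, iqr_width=21)
wider_iqr = flag_outliers(series, median_width=21, iqr_width=41)

print(any(same_width[27:34]), all(wider_iqr[27:34]))
```

On this series, the shared 21-day window leaves the bump unflagged because the bump itself inflates the IQR, while keeping the median at 21 days and widening only the IQR window to 41 days flags all seven bump days. A wider IQR window reacts more slowly to real changes in variability, so this is a trade-off rather than a free win.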