@ceilingloft @nmarchio @stevenbuschbach @manasip5 - here's all the analysis I mentioned on Friday! no need to read all of this - just more of an FYI and a vignette explaining the logic 😄

Implementing notch filtering and comparing causal vs non-causal filtering

Because of reporting delays and bunching, we know there is (at least) weekly periodicity to the reported case counts in India. Accounting for this behavior is possible in the frequency domain. Additionally, to do purely causal estimation, our smoothing window should only take into account points at times less than or equal to the current time when doing convolution. However, a purely causal filter introduces a lag in the time series.

Periodicity

Taking the Fourier transform of the reported case time series in India shows that there are slightly stronger frequency components around 1 week and half-week periods (as indicated by the dashed lines below). The shaded region also shows the 95% confidence interval for the estimate of each Fourier coefficient. Ideally, to use a notch filter, we'd want the lower bound of the confidence interval of the frequencies we're notching out to be above the upper bounds of other frequencies, but we'll overlook that for now. periodogram_CI

Also, note the much stronger peak around v = 1/124 - this is due to what time series folks call non-stationarity (briefly, the mean of the case time series changes over time, and the correlation between case counts at times s and t is not a simple function of the difference |s - t|). There are ways to deal with non-stationarity, but the standard ones obscure too much of the high-frequency components we're analyzing when we look for anomalies. We can proceed with spectral analysis, but we should note that the non-stationarity complicates the interpretation of the Fourier transform.

Filter design

A notch filter is a simple technique to suppress parts of a signal's spectrum around a specific frequency (think of it as setting the Fourier coefficients in the neighborhood of the unwanted frequency to zero). The classic use case for a notch filter is filtering out oscillations from a wall outlet in electronic devices that have been plugged in. The non-stationarity and imperfect peaks can be managed somewhat by setting the quality factor - by how much a Fourier coefficient's magnitude is reduced around our notch frequencies. We'll likely need to play around with the exact quality factor for other countries' case data but the values I found seem to work for India (for now).

Below is a diagram called a Bode plot showing the frequency content of the filter that notches out the 1/week and 2/week frequencies. Since the 1/week frequency peak is stronger, the notch filter damps the spectrum around that peak more. double_notch_bode

This is the result of applying the filter we've designed - you can see the weekly spikes in the black rectangle (added for emphasis) have been removed (yay!) notch_filter

Causal Filtering

Now that we have the periodicity notched out, we can remove some of the remaining jaggedness with a uniform window smoothing function. In theory, we need to be careful to ensure that the output result at time t0 only depends on input values from time less than or equal to t0, because in the framework of real-time estimation, we can't know infection counts in the future. This unfortunately introduces a lag in the output time series: causal_smoothing

This lag issue is also why the notch filter is applied using filtfilt rather than lfilter; more details available on this SO post.

Setting the proper convolution mode fixes this, but the zero-padding at the end means our output signal is unusable for the last N points when using a convolution window of length N: non-causal-smoothing

A simple way to deal with this is to time reverse the signal at the end and use those values as padding (very similar to what @lmbett does in his code here, but with different weights depending on the size of the window.) The results require some tuning of the window: non_causal_padding

Caveats

Note that this PR does not address convolution needed to deal with right-censoring, onset-time, and contemporary reporting delays.

API design

I just put a single function in the adaptive.smoothing package with hard-coded notch frequencies, but what I'd like to do at some point is disentangle the notching from the smoothing and provide a way to compose smoothing functions, like:

def notch(notch_freq: Sequence[float], quality_factors: Optional[Sequence[float]]) -> Callable[Sequence[float], Sequence[float]]:
    ...

def convolution(window: int, kernel: str) -> Callable[Sequence[float], Sequence[float]]:
    ...

def compose(smoothers: Sequence[Callable[Sequence[float], Sequence[float]]]) -> Callable[Sequence[float], Sequence[float]]:
    ...

COVID-IWG / epimargin

Implement notch-filtered smoothing #72