Open nolangormley opened 2 months ago
I believe this was part of @rumackaaron's work. Are we correct in assuming that these should match?
Interesting find! Mathematically, they don't have to match, and I think that's the expected behavior in this case. When creating the design matrix in weekday.py, the constraint is that $\sum_{wd=0}^6 \alpha_{wd} = 1$. After fitting the day-of-week parameters $\alpha$, we take the original signal $y_t$ and multiply it by $\exp(\alpha_{wd})$ to get the weekday-adjusted signal $y'_t$ (where $wd$ is the day-of-week of $t$).
For simplicity, say that there are only two days in the week. Let $\alpha_0 = -1$ and $\alpha_1 = 1$, and $y_0 = 5$ and $y_1 = 1$. The sum of the raw values $y$ is 6, and the sum of the weekday-adjusted values is $5\exp(-1) + \exp(1) \approx 4.56$. We see something similar here, where the sum of the adjusted signal is lower than the sum of the raw signal.
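To make the arithmetic easy to check, here is a minimal sketch of the two-day example. It only reproduces the toy numbers above; the variable names are mine and nothing here is taken from weekday.py.

```python
import math

# Toy two-day "week" from the example above (not real data).
alpha = [-1.0, 1.0]  # day-of-week effects alpha_0, alpha_1
y = [5.0, 1.0]       # raw signal y_0, y_1

# Multiply each raw value by exp(alpha) for its day of the week.
adjusted = [y_t * math.exp(alpha[t % len(alpha)]) for t, y_t in enumerate(y)]

raw_total = sum(y)
adjusted_total = sum(adjusted)
print(raw_total, round(adjusted_total, 2))
```

The two totals differ even though nothing has gone wrong in the adjustment itself.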
It may be possible to create a different constraint to ensure that, at least on the training data, the sum of the original signal equals that of the adjusted signal. I don't think it's possible to ensure that constraint holds over an arbitrary time interval while using multiplicative day-of-week effects.
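As a rough illustration of what such a constraint could look like, one option is to shift every $\alpha_{wd}$ by the same constant so that the totals agree on the training window. This is only a sketch under my own assumptions: `rescale_alpha` and `adjust` are hypothetical helpers, not functions in weekday.py, the data are made up, and the shift changes the sum of the $\alpha$ values, so it would replace the existing constraint rather than add to it.

```python
import math

def adjust(alpha, y, start=0):
    """Multiply each value by exp(alpha) for its day of the week."""
    return [y_t * math.exp(alpha[(start + t) % len(alpha)])
            for t, y_t in enumerate(y)]

def rescale_alpha(alpha, y):
    """Hypothetical helper: shift all effects by one constant so that the
    adjusted total equals the raw total on the training data y.
    Assumes all values in y are positive."""
    shift = math.log(sum(y) / sum(adjust(alpha, y)))
    return [a + shift for a in alpha]

alpha = [-1.0, 1.0]           # toy two-day week, as in the example above
train = [5.0, 1.0, 4.0, 2.0]  # made-up training window (two "weeks")

alpha_rescaled = rescale_alpha(alpha, train)
train_adjusted = adjust(alpha_rescaled, train)

# Totals agree over the whole training window...
print(round(sum(train), 6), round(sum(train_adjusted), 6))
# ...but not over a shorter interval, e.g. the first day alone.
print(train[0], round(train_adjusted[0], 3))
```

The second print shows why the match cannot carry over to arbitrary intervals: any single day is still scaled by a factor different from 1.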
P.S. I find it concerning that the "sawtooth" pattern is still present in the adjusted signal. I don't know what the training period is for fitting the day-of-week effects, but it may be worth experimenting to find an appropriate period that consistently removes the "sawtooth" pattern.
> I don't think it's possible to ensure that constraint holds over an arbitrary time interval while using multiplicative day-of-week effects.
Indeed. In fact, no adjustment of any kind can ensure that: in the special case of a one-day interval, matching sums would force $y'_t = y_t$ on every day, i.e. no adjustment at all.
Even if we relax the requirement to all intervals of some fixed length (e.g. 7 days), I think that the only solution is a moving average. But a moving average isn't sufficiently sensitive to the most recent developments.
This suggests an asymmetric kernel, e.g. a triangle or half-Gaussian. I think all kernels satisfy some form of long-term AUC equivalence. But this doesn't address the day-of-week effects.
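For concreteness, here is a minimal sketch of a one-sided triangular kernel and the sense in which it preserves the long-term total. The function names, the kernel width, and the data are all assumptions for illustration; this is not an existing smoother in the codebase.

```python
def triangle_weights(width):
    """One-sided triangular weights: largest at lag 0, summing to 1."""
    raw = [width - lag for lag in range(width)]
    total = sum(raw)
    return [r / total for r in raw]

def causal_smooth(y, weights):
    """Spread each observation over its own day and the days after it,
    so the smoothed value on a given day depends only on that day and
    earlier days."""
    out = [0.0] * (len(y) + len(weights) - 1)
    for t, y_t in enumerate(y):
        for lag, w in enumerate(weights):
            out[t + lag] += w * y_t
    return out

y = [5.0, 1.0, 4.0, 2.0, 6.0, 1.0, 3.0]  # made-up values
weights = triangle_weights(3)            # most weight on the newest day
smoothed = causal_smooth(y, weights)

# Because the weights sum to 1, the totals agree over the full series
# (including the tail), though not over arbitrary short windows.
print(round(sum(y), 6), round(sum(smoothed), 6))
```

Nothing in this sketch models the day of the week, which is the gap noted above.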
We need to send this problem for some research TLC.
Actual behavior
When looking at the data from the Doctor Visits signal, the area under the curve (AUC) of the day-adjusted signal does not seem to match that of the raw signal. The sum of the values is 67.70 for the raw signal and 56.22 for the day-adjusted signal.
Expected behavior
@RoniRos and I were looking through this yesterday and it was our intuition that the AUC should match between these two signals.
Context
Here's some code to replicate the plot above