Closed orisenbazuru closed 8 months ago
This is great! Thanks so much for doing all this work. Let me try and digest it a little bit before commenting.
The point of `naive` is to be totally naive, as if we ran conformal assuming there was no distribution shift.
Sure, no worries. In case it is helpful, I uploaded an adaptation of your notebook documenting the experiments I referenced above: https://gist.github.com/orisenbazuru/72236a74083e48db06daf838b5def0e6
Beautiful, thanks :)
First, many thanks for this awesome repo and the tutorial on split conformal prediction. While going through the "conformal prediction under distribution drift" section and the corresponding example, "weather prediction with time-series distribution shift", I noticed that the `naive` implementation for determining $\hat{q}$ uses an expanding window: it takes all scores up to time $t$, computes the quantile, and iterates over $t$.

But one can think of another approach: use a rolling window of fixed size $K$ (in the example you were using $K=1000$) and compute the quantile on each of the windows. I rewrote and tested the function to support both options.

Plot of $\hat{q}_{\text{expanding window}}$ for the expanding window approach (i.e. the `naive` implementation)

vs. the plot of $\hat{q}_{\text{rolling window}}$ for the rolling window approach

If we compare the results to the weighted conformal prediction approach, it is very similar to the rolling window, but with the cost of additional computation for finding the infimum of $q$:

$$\hat{q} = \inf \left\{ q : \sum_{i=1}^{n} \tilde{w}_i \mathbb{1}\{s_i \leq q\} \geq 1 - \alpha \right\}$$

This requires finding the roots of the expression above after moving the $1-\alpha$ to the left side of the inequality (I am using the generalized expression, but in practice we are using the adaptation to the window-based version in section 5.3).

Lastly, here is the comparison of coverage over time of the three approaches
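For readers following along, here is a minimal sketch of the two window strategies described above. It is not the notebook's code or the code from my gist: the function name `window_qhats`, the `window` argument, and the choice to use only scores strictly before time $t$ are assumptions made for illustration. The quantile uses the usual split conformal level $\lceil (n+1)(1-\alpha) \rceil / n$, capped at the largest score.

```python
import numpy as np

def window_qhats(scores, alpha=0.1, K=1000, window="expanding"):
    """Hypothetical helper: conformal quantile at each time t >= K.

    window="expanding": use all scores before t (the "naive" idea).
    window="rolling":   use only the K most recent scores before t.
    Assumption: scores at time t itself are not used.
    """
    if window not in ("expanding", "rolling"):
        raise ValueError("window must be 'expanding' or 'rolling'")
    scores = np.asarray(scores, dtype=float)
    qhats = np.full(len(scores), np.nan)  # undefined for the first K steps
    for t in range(K, len(scores)):
        past = scores[:t] if window == "expanding" else scores[t - K:t]
        n = len(past)
        # Rank of the conformal quantile, capped at n.
        k = min(int(np.ceil((n + 1) * (1 - alpha))), n)
        qhats[t] = np.sort(past)[k - 1]
    return qhats

# Toy drift: scores are large at first, then drop to zero.
scores = np.concatenate([np.full(50, 10.0), np.zeros(50)])
q_exp = window_qhats(scores, alpha=0.1, K=20, window="expanding")
q_roll = window_qhats(scores, alpha=0.1, K=20, window="rolling")
```

On this toy series the expanding quantile is still pinned to the old, large scores at the last time step, while the rolling quantile has adapted to the recent ones.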
In fact, when computing the overall coverage, the rolling window version achieves the best coverage, with a score of 0.900665 vs. 0.8995545 for the weighted version.

It might be the case that for this data adding the constraint of finding the infimum does not add much. But overall, if the argument for weighted conformal prediction is based on weighting the recent observations in a window, then a sensibly defined window would be sufficient to counter the drift, especially since we are not "learning the weights $w$" but rather fixing them to be uniform across the window in both cases (unless I am missing something here :) ).

On a separate note, one minor issue in the code is the size of the window `K`. The way it is coded now, it translates to K+1 observations being used, and in the case of weighted conformal prediction, when computing `qhats` you omit the first observation from the computation.

Thank you again for your work and effort to make conformal prediction accessible to the masses.
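P.S. To make the uniform-weights argument concrete, here is a small sketch, assuming the weighted quantile is defined with normalized weights $\tilde{w}_i = w_i / (\sum_j w_j + 1)$, where the extra $1$ is the test point's weight. The names `weighted_qhat` and `split_qhat` are hypothetical and this is not the notebook's implementation. It finds the infimum by sorting the scores and accumulating weights rather than by root finding, and checks that 0/1 window weights give the same value as the plain split conformal quantile on that window.

```python
import numpy as np

def weighted_qhat(scores, weights, alpha=0.1):
    """Hypothetical helper: inf{q : sum_i w~_i 1{s_i <= q} >= 1 - alpha}."""
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Assumption: the test point carries weight 1 in the normalization.
    w_tilde = weights / (weights.sum() + 1.0)
    order = np.argsort(scores)
    cum = np.cumsum(w_tilde[order])
    # First sorted position where the cumulative weight reaches 1 - alpha.
    idx = np.searchsorted(cum, 1 - alpha, side="left")
    if idx >= len(scores):
        return np.inf  # not enough calibration mass
    return scores[order][idx]

def split_qhat(scores, alpha=0.1):
    """Plain split conformal quantile on a set of scores."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return np.inf if k > n else scores[k - 1]

rng = np.random.default_rng(0)
scores = rng.normal(size=50)
K = 20
weights = np.zeros(50)
weights[-K:] = 1.0  # uniform weight on the K most recent scores only

q_weighted = weighted_qhat(scores, weights, alpha=0.1)
q_rolling = split_qhat(scores[-K:], alpha=0.1)
```

With weights fixed to 1 inside the window and 0 outside, the condition reduces to "at least $\lceil (K+1)(1-\alpha) \rceil$ of the window's scores are $\leq q$", which is the rolling window quantile.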