Improved baseline computation for conformal prediction under distribution shift

orisenbazuru commented 1 year ago

First many thanks for this awesome repo and the tutorial on split conformal prediction. While going through the conformal prediction under distribution drift section and the corresponding example weather prediction with time-series distribution shift, I noticed that the naive implementation for determining $\hat{q}$ uses expanding window that takes all scores up to time $t$ and compute the quantile and iterates over $t$

naive_qhats = np.array( [np.quantile(scores[:t], np.ceil((t+1)*(1-alpha))/t, interpolation='higher') for t in range(K+1, scores.shape[0]) ] )

But one can think of another approach by using a rolling window of fixed window size $K$ (in the example you were using $K=1000$) and then compute the quantile on each of the windows -- I rewrote and tested the function to support both options below

def compute_qhats(scores, alpha, wsize, opt='fixed_window'):
    lst = []
    qlst = []
    K = wsize
    for t in range(K+1, scores.shape[0]):
        if opt == 'fixed_window':
            start = t - K - 1
            nsamples = K + 1 # = t - start = t - (t - wsize -1) = wsize + 1 = K + 1    QED
        elif opt == 'expanding_window':
            start = 0
            nsamples = t
        q_level = np.ceil((nsamples+1)*(1-alpha))/nsamples
        qlst.append(q_level)
        tmp = np.quantile(scores[start:t], q_level, interpolation='higher')
        lst.append(tmp)
        print('start:', start, 'end:', t)
    return np.array(qlst), np.array(lst)

Plot of the $\hat{q}_{expandingwindow}$ expanding window approach (i.e. naive implementation) Screenshot 2022-11-28 at 16 35 04

vs. the plot for e $\hat{q}_{rollingwindow}$ rolling window approach Screenshot 2022-11-28 at 16 37 49

If we compare the results to the weighted conformal prediction approach Screenshot 2022-11-28 at 16 45 05 it is very similar to the rolling window but with the cost of additional computation for finding infimum of q $$\hat{q} = inf \{ q : \sum^{n}_{i=1} \tilde{w}_i \mathbb{1} \{s_i \leq q\}\geq 1- \alpha \}$$

that requires finding the roots of the expression above after moving the $1-\alpha$ to the left side of the inequality (i am using the generalized expression but in practice we are using the adaptation to window based version in section 5.3).

Lastly here is the comparison of coverage over time of the three approaches Screenshot 2022-11-28 at 17 08 23

In fact when computing the overall coverage, the rolling window version achieves the best coverage with score 0.900665 vs. 0.8995545 for the weighted version.

It might be the case that for this data adding the constraint of finding the infimum does not add much, but overall if the argument of weighted conformal prediction is based on weighting the recent observation in a window then a sensibly defined window would be sufficient to counter the drift especially that we are not "learning the weights $w$" but rather fixing them to a uniform across the window in both cases (unless I am missing something here :) )

On a separate note, one minor issue in the code is the size of the window K. The way it is coded now it is translated to be K+1 observations used, and in the case of weighted conformal prediction, when computing qhats you omit the first observation from the computation.

Thank you again for your work and effort to make conformal prediction accessible to the masses.

aangelopoulos commented 1 year ago

This is great! Thanks so much for doing all this work. Let me try and digest it a little bit before commenting.

The point of naive is to be totally naive --- as if we ran conformal assuming there was no distribution shift.

orisenbazuru commented 1 year ago

Sure no worries - in case it is helpful, I uploaded the adaptation to your notebook documenting the experiments I referenced above https://gist.github.com/orisenbazuru/72236a74083e48db06daf838b5def0e6

aangelopoulos commented 1 year ago

Beautiful, thanks :)

aangelopoulos / conformal-prediction

Improved baseline computation for conformal prediction under distribution shift #3