Closed lmossina closed 1 year ago
Remark: for small values of alpha, outside the admissible range, the authors set the quantile to infinity. We should consider this and see what we do in our code. An infinite prediction interval can be useless in practice, especially for a user not acquainted with conformal prediction.
We can systematically add `np.inf` to the array of lower and upper predictions in `deel.puncc.api.calibration`:
concat_y_lo = np.concatenate((concat_y_lo, [np.inf]))
concat_y_hi = np.concatenate((concat_y_hi, [np.inf]))
y_lo = (-1) * np.quantile((-1) * concat_y_lo, 1 - alpha, axis=1, method="inverted_cdf")
y_hi = np.quantile(concat_y_hi, 1 - alpha, axis=1, method="inverted_cdf")
Do you see any downside to this?
With `timeit`, for different array sizes, the concatenation seems to be 5-15% slower than simply computing the correction `(1-alpha)*(1 + 1/len(concat_))`.
My proposal comes directly from Lemma 1 of Tibshirani's paper (https://arxiv.org/pdf/1904.06019.pdf). The coverage guarantee holds with 1) the inflated $(1-\alpha)\cdot(1+1/n)$-th quantile or 2) when adding an infinite term to the sequence and computing the $(1-\alpha)$-th empirical quantile.
The upside of the second method is that it produces infinitely large prediction intervals if $\alpha$ is too low.
If you prefer that version, let's go with that. (The code should not return infinite prediction intervals if the checks on alpha are in place.)
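To make the comparison concrete, here is a minimal sketch of the two options on a toy 1D array. The function names and the toy scores are hypothetical and are not part of puncc; the sketch assumes a NumPy version that supports `method="inverted_cdf"` in `np.quantile`.

```python
import numpy as np


def quantile_inflated(scores, alpha):
    """Option 1: empirical quantile at level (1 - alpha) * (1 + 1/n)."""
    n = len(scores)
    level = (1 - alpha) * (1 + 1 / n)
    if level > 1:
        # np.quantile rejects levels above 1, so this case needs explicit handling.
        return np.inf
    return np.quantile(scores, level, method="inverted_cdf")


def quantile_with_inf(scores, alpha):
    """Option 2: (1 - alpha) empirical quantile of the scores plus one +inf."""
    padded = np.concatenate((scores, [np.inf]))
    return np.quantile(padded, 1 - alpha, method="inverted_cdf")


scores = np.arange(1.0, 11.0)  # hypothetical toy scores 1, 2, ..., 10 (n = 10)

# alpha = 0.25: both options select the 9th smallest score.
print(quantile_inflated(scores, 0.25), quantile_with_inf(scores, 0.25))

# alpha = 0.05 is below 1 / (n + 1): option 2 returns inf with no extra check.
print(quantile_with_inf(scores, 0.05))
```

In this sketch the first option needs an explicit branch for levels above 1, whereas the second option yields the infinite value on its own.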
TODO
The old branch https://github.com/deel-ai/puncc/tree/fix-cv-quantile, not merged into main, contains some corrections to the quantile procedure followed during cross-validation-plus. These must be reimplemented in the current main. It is not worth the time to merge this old code; just rewrite and re-check everything for statistical correctness.
Where:
deel/puncc/api/calibration.py
The imprecise code:
The `1 - alpha` should be `(1 - alpha)(1 + 1/n)`, where n is the number of training points (only in the case of jackknife+ and CV+!).
Sources:
`q+` and `q-` formulae.
Remark: for small values of alpha, outside the admissible range, the authors set the quantile to infinity. We should consider this and see what we do in our code. An infinite prediction interval can be useless in practice, especially for a user not acquainted with conformal prediction.
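As an illustration of the correction and of the admissible range of alpha, here is a minimal sketch. The helper names are hypothetical and do not exist in puncc. It assumes the admissible range is the set of alpha for which the corrected level `(1 - alpha)(1 + 1/n)` does not exceed 1, which is equivalent to `alpha >= 1/(n + 1)`.

```python
def corrected_level(alpha, n):
    """Return (1 - alpha) * (1 + 1/n), the level that replaces 1 - alpha."""
    return (1 - alpha) * (1 + 1 / n)


def min_admissible_alpha(n):
    """Smallest alpha for which the corrected level stays at or below 1."""
    return 1 / (n + 1)


def check_alpha(alpha, n):
    """Raise a ValueError if alpha would require an infinite quantile."""
    if not 0 < alpha < 1:
        raise ValueError(f"alpha must be in (0, 1), got {alpha}")
    # Compare alpha to 1 / (n + 1) rather than the level to 1: at the
    # boundary, floating-point rounding can push the level slightly above 1.
    if alpha < min_admissible_alpha(n):
        raise ValueError(
            f"alpha={alpha} is below 1/(n+1)={min_admissible_alpha(n):.4g}; "
            "the prediction interval would be infinite"
        )


n = 100  # hypothetical number of training points
check_alpha(0.1, n)             # passes: 0.1 >= 1/101
print(corrected_level(0.1, n))  # level used instead of 1 - alpha = 0.9
```

Raising an error is only one possible design choice; returning an infinite interval with a warning would be another way to handle values of alpha outside the admissible range.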