non-conformity measures appear to not be calculated out-of-bag

I just noticed your new update that implements conformal prediction intervals. While I don't see any bugs, it appears that the empirical distribution of the nonconformity scores (i.e. abs(residuals(fit))) is not calculated on a holdout calibration set. The source cited in your documentation (Boström, H., 2022) uses out-of-bag predictions for calibration, but you seem to be using the one-step-ahead (h=1) errors from the same training set. First, how does this implementation not violate the exchangeability assumption that conformal prediction is based on? Second, shouldn't the empirical distribution of nonconformity scores for, say, h=10 be different (probably wider) than for h=1?

It seems that to correct this you would have to randomly split the timeslices into proper training and calibration sets (I know they can overlap, but it is unclear to me how to avoid that), then fit the model using only the training sets and forecast the entire horizon on the "fresh" calibration timeslices to get your scores (you call them alpha in the code). That way you would have a separate distribution for each horizon h=1,2,3,..., so hopefully the resulting prediction interval would look like a trumpet.
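
To make the suggestion concrete, here is a rough sketch of what I mean by horizon-specific scores. It is plain Python rather than the package's R code, and it is not your implementation. The names `naive_forecast`, `horizon_scores`, and `conformal_quantile` are made up for illustration, the "model" is just a last-value placeholder, and I use rolling calibration origins instead of a random split because that was the simplest way to keep the model from seeing the values it is scored on.

```python
import math
import random


def naive_forecast(history, horizon):
    """Placeholder model (assumption): repeat the last observed value."""
    return [history[-1]] * horizon


def horizon_scores(series, first_origin, horizon, forecast=naive_forecast):
    """Collect |actual - forecast| separately for each step ahead.

    Every calibration origin only sees data before it, so the scored
    values are never part of what the forecaster was given.
    """
    scores = [[] for _ in range(horizon)]
    for origin in range(first_origin, len(series) - horizon + 1):
        history = series[:origin]                  # data available at the origin
        actual = series[origin:origin + horizon]   # held-out future values
        predicted = forecast(history, horizon)
        for h in range(horizon):
            scores[h].append(abs(actual[h] - predicted[h]))
    return scores


def conformal_quantile(values, alpha):
    """Split-conformal quantile: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(values)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration scores for this coverage level
    return sorted(values)[k - 1]


# Simulated random walk, used only to have something to forecast.
random.seed(1)
series = [0.0]
for _ in range(599):
    series.append(series[-1] + random.gauss(0, 1))

H = 10
scores = horizon_scores(series, first_origin=300, horizon=H)
half_widths = [conformal_quantile(s, alpha=0.1) for s in scores]

# One interval per horizon, centred on the point forecast from the full series.
point = naive_forecast(series, H)
intervals = [(p - w, p + w) for p, w in zip(point, half_widths)]
```

For a series like this random walk I would expect the half-widths to grow with h, which is the trumpet shape I mentioned. Overlapping origins are still not exchangeable in the strict sense, so this addresses the per-horizon question more than the exchangeability one.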

Akai01 / caretForecast, issue #1