dajmcdon / rtestim

https://dajmcdon.github.io/rtestim/
Other
4 stars 0 forks source link

Return errors in `delay_calculator()` due to empty subsets for CV #59

Closed jiapivialiu closed 3 months ago

jiapivialiu commented 4 months ago

Please try the following code to reproduce the error.

length <- 50
## piecewise constant Rt: 
Rt <- c(rep(2, floor(2 * length / 5)), rep(0.8, length - floor(2 * length / 5)))
Mu <- double(length); Mu[1] <- 2
incidence <- double(length)
w <- double(length)
## gamma parameters of measles: 
gamma_pars <- c(14.5963, 1.0208) 

set.seed(310)
incidence[1] <- Mu[1]
for(t in 2:length){
  w[t] <- delay_calculator(incidence[1:t], dist_gamma = gamma_pars)[t]
  Mu[t] <- Rt[t] * w[t]
  incidence[t] <- rpois(1, Mu[t])
}
w[1] <- w[2]

korder <- 0
# try it for multiple times with different random seeds
cv_mod <- rtestim::cv_estimate_rt(
  observed_counts = incidence,
  korder = korder, nfold = 10,
  nsol = 50, maxiter = 3e7L, 
  dist_gamma = gamma_pars,
  error_measure = "deviance",
  lambda_min_ratio = 1e-6,
  x = 1:length, delay_distn = NULL,
  delay_distn_periodicity = NULL
)
Rt_fitted <- cv_mod$full_fit$Rt[, which.min(cv_mod$cv_scores)]
Rt_fitted

The error in delay_calculator() at rtestim/R/cv_estimate_rt.R:126:5 says: ! The minimum spacing in xout must be at least delay_distn_periodicity. ℹ delay_distn_periodicity = 1 compared to 0 for xout. Run rlang::last_trace() to see where the error occurred. Warning messages: 1: In min(xout) : no non-missing arguments to min; returning Inf 2: In max(xout) : no non-missing arguments to max; returning -Inf

The empty subsets of samples makes the diff(xout) = 0 in line 74. We should drop the empty subsets and throw a warning in cv_estimate_rt(). Here is the solution:

# under line 93, add the warning info if there are nonempty subsets: 
if (length(unique(middle_fold)) < nfold)
        cli::cli_warn("Number of randomly generated folds is less than `nfold`.")
# in the for loop, iterate through non-empty subsets: 
for (f in unique(middle_fold))
# drop NAs in the summary table after computation in the for loop: 
cvall <- cvall[unique(middle_fold), !index]
# correct the number of total folds in the calculation of `cv_se` in the end:
sqrt(nrow(cvall)) # divided by it