femiguez / nlraa

Nonlinear Regression for Agricultural Applications
https://femiguez.github.io/nlraa-docs/index.html
22 stars 2 forks source link

predict_gnls() returns Error in quantile.default(newX[, i], ...) #22

Closed SimonShamusRiley closed 9 months ago

SimonShamusRiley commented 1 year ago

I am sometimes, but not always, getting a strange error when attempting to calculate confidence intervals using predict_gnls(). In the example which follows, there are measurements (log scale) of the initial microbial population (= Phase1), then again following a first treatment (= Phase2) and then again daily over a period of 5 days following a second treatment (= Phase3). My model includes dummy variables to encode the effects of the treatments 1 and 2, and a re-parameterized Weibull to model subsequent change over time. The model fits without warnings/errors, but predict_gnls() fails:

# load packages
library(nlme)
library(nlraa)

# create data set
microbe = structure(list(Trial = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 
                         2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3), 
               Log = c(6.86468696709014, 
                        6.17105526934298, 5.82899896281061, 4.83359505763915, 2.82512686438112, 
                        3.12472285973601, 1.72478353249313, 2.77005756550606, 7.0503785687999, 
                        6.35965377626497, 6.36580893512283, 3.45524250985477, 2.30239227201876, 
                        1.93998020793381, 1.97867697492427, 0.693060415349208, 6.99476701666496, 
                        6.52218511029435, 5.44873214790974, 3.42258607221712, 3.49805153133948, 
                        1.93550829917111, 3.64734428971361, 3.37482224260704), 
               DayNum = c(0, 0, 0, 1, 2, 3, 4, 5, 0, 0, 0, 1, 2, 3, 4, 5, 0, 0, 0, 1, 2, 3, 4, 5),
               Phase = structure(c(1L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 
                                   2L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), 
                                 .Label = c("Phase1", "Phase2", "Phase3"), class = "factor"), 
               Trt1 = c(0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1), 
               Trt2 = c(0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 
                        0, 0, 1, 1, 1, 1, 1, 1)), 
               row.names = c(NA, -24L), class = "data.frame")

# fit the model
weibull_repar = gnls(Log ~ (N_0 - Trt1*phi1 - Trt2*phi2) - (DayNum^beta)*(5 - phi1 - phi2)/(T_5^beta),
                     data = microbe,
                     start = list(N_0 = 8, phi1 = 0.9,  phi2 = 0.6, beta = .4, T_5 = 5))
# model appears fine
intervals(weibull_repar)

# create predictions/confidence band for trend over time
new = data.frame(Trt1 = 1, Trt2 = 1, DayNum = seq(0, 5, .1))
predict_gnls(weibull_repar, newdata = new, interval = 'confidence') # Error in quantile.default(newX[, i], ...) missing values and NaN's not allowed if 'na.rm' is FALSE

Debugging indicates the issue appears somewhere on or around line 166 of predict_nlme.R, but that was as far as I got. I'm running version 1.5 of the package on R 4.1.3. Oddly, the issue doesn't occur with other, similar data sets even using the same model, so I'm at a loss as to what the issue is. Any help you could provide would be much appreciated. Thanks in advance.

femiguez commented 9 months ago

@SimonShamusRiley Sorry for the delay. I'm just now looking at this. The error is because the function returns NA/NaNs and in an effort to summarize the results the function 'quantile' fails. In my opinion, you are fitting a function that has too many parameters given the structure of the data. The 'intervals' show that they are not well estimated. For example, the simple exponential - plateau fits the data better.

me1 <- nls(Log ~ SSexpf(DayNum, a, c), data = microbe)
me2 <- nls(Log ~ SSexpfp(DayNum, a, c, xs), data = microbe)

ggplot(data = microbe,
       aes(DayNum, Log)) + geom_point() + 
  geom_line(aes(y = fitted(me1), color = "exp")) + 
  geom_line(aes(y = fitted(me2), color = "exp-p"))
femiguez commented 9 months ago

@SimonShamusRiley Bootstrapping is a better option in this case

### Bootstrapping
microb_bt <- boot_gnls(weibull_repar, fitted)

prds <- summary_simulate(t(na.omit(microb_bt$t)))
microbeA <- cbind(microbe, prds)

ggplot(data = microbeA,
       aes(DayNum, Log)) + 
  geom_point() + 
  geom_line(aes(y = fitted(weibull_repar))) + 
  geom_ribbon(aes(ymin = Q2.5, ymax = Q97.5), alpha = 0.2)
femiguez commented 9 months ago

I'm closing this but feel free to open a new issue