Novartis / xgxr

R package for supporting exploratory graphics at http://opensource.nibr.com/xgx
Other
13 stars 7 forks source link

geom_smooth with method "nls" not producing desired behavior #40

Closed margoal1 closed 3 years ago

margoal1 commented 3 years ago

geom_smooth with method = "nls" does not produce desired behavior. The issue has been isolated to the definition of predictdf.nls function, which works when sourced to .GlobalEnv, or when compiled in the package with R 3.6.1 and ggplot2 3.2.1, but does not work when compiled within the package with R 4.0.2 and ggplot2 3.3.2. Details below.

Function predictdf.nls was created to provide mean and confidence intervals from a nls fit to supply to geom_smooth. Because of the naming of the predictdf.nls function, geom_smooth is supposed to call it automatically (similar to how geom_smooth calls predictdf.glm & predictdf.loess depending on the method). It was working perfectly fine for ggplot2 v3.2.1 / R v3.6.1, but somewhere between there and ggplot2 v3.3.2 / R v4.0.2 it’s now broken. Below are details on when it does and does not work.

Details on when it does and does not work:

There is some communication issue between the package-defined predictdf.nls function and the ggplot2 geom_smooth function which I don’t understand.

Below is the function and example code.

#' Prediction data frame for nls
#' 
#' Get predictions with standard errors into data frame for use with geom_smooth
#'
#' \code{ggplot2::geom_smooth} produces confidence intervals by silently calling functions 
#' of the form predictdf.method, where method is "loess", "lm", "glm" etc. 
#' depending on what method is specified in the call to \code{geom_smooth}. 
#' Currently \code{ggplot2} does not define a \code{predictdf.nls} function for method of type "nls", 
#' and thus confidence intervals cannot be automatically generated by \code{geom_smooth} 
#' for method = "nls". Here we define \code{predictdf.nls} for calculating the confidence 
#' intervals of an object of type nls. \code{geom_smooth} will silently call this function 
#' whenever method = "nls", and produce the appropriate confidence intervals.
#' 
#' \code{predictdf.nls} calculates CI for a model fit of class nls based on the "delta-method" 
#' http://sia.webpopix.org/nonlinearRegression.html#confidence-intervals-and-prediction-intervals)
#'
#' CI = [ f(x0, beta) + qt_(alpha/2, n - d) * se(f(x0, beta)),
#'       f(x0, beta) + qt_(1 - alpha/2, n - d) * se(f(x0, beta))]
#'
#' where:
#' beta = vector of parameter estimates
#' x = independent variable
#' se(f(x0, beta)) = sqrt( delta(f)(x0, beta) * Var(beta) * (delta(f)(x0, beta))' )
#' delta(f) is the gradient of f
#'
#' @param model nls object
#' @param xseq newdata
#' @param se Display confidence interval around smooth?
#' @param level Level of confidence interval to use
#'
#' @return dataframe with x and y values, if se is TRUE dataframe also includes ymin and ymax
#'
#' @importFrom Deriv Deriv
#' @importFrom stats nls
#' @export
predictdf.nls <- function(model, xseq, se, level) {

  # function to calculate gradient wrt model parameters
  # value is the function value
  # grad is the gradient
  fun_grad <- function(form, x, pars, se){

    # extract the model parameters to the local environment
    list2env(pars %>% as.list(), envir = environment())

    ret <- list()
    ret$value <- eval(form[[3L]]) # this is the value of the formula

    if(se){
      ret$grad <- list()
      xvec <- x      
      for(i in 1:length(xvec)){
        x = xvec[i]
        ret$grad[[i]] <- eval(Deriv::Deriv(form, names(pars), cache.exp = FALSE)) %>% as.list()
      }

      ret$grad <- dplyr::bind_rows(ret$grad) %>% as.matrix
    }

    return(ret)
  }

  fg <- fun_grad(form = model$m$formula(), x = xseq, pars = model$m$getPars(), se)

  f.new <- fg$value # value of function
  pred <- data.frame(x = xseq, y = f.new)

  if(se){
    grad.new <- fg$grad # value of gradient

    vcov <- vcov(model)
    GS = rowSums((grad.new%*%vcov)*grad.new)

    alpha = 1 - level
    deltaf <- sqrt(GS)*qt(1 - alpha/2, df = summary(model)$df[2])

    pred$ymin <- f.new - deltaf
    pred$ymax <- f.new + deltaf

  }else{
    pred <- data.frame(x = xseq, y = f.new)
  }

  return(pred)
}
# Example code
set.seed(123456)

Nsubj <- 10
Doses <- c(0, 25, 50, 100, 200)
Ntot <- Nsubj*length(Doses)
times <- c(0,14,30,60,90)

dat1 <- data.frame(ID = 1:(Ntot),
                   DOSE = rep(Doses, Nsubj),
                   E0 = 50*rlnorm(Ntot, 0, 0.3),
                   Emax = 100*rlnorm(Ntot, 0, 0.3),
                   ED50 = 50*rlnorm(Ntot, 0, 0.3)) %>%
    dplyr::mutate(Response = (E0 + Emax*DOSE/(DOSE + ED50))*rlnorm(Ntot, 0, 0.3)  ) %>%
    merge(data.frame(ID = rep(1:(Ntot), each = length(times)), Time = times), by = "ID") 

gg <- xgx_plot(data = dat1, aes(x = DOSE, y = Response))
gg <- gg + geom_point()
gg

gg + geom_smooth(method = "nls", 
                 formula = y ~ E0 + Emax*x/(ED50 + x),
                 method.args = list(start = list(E0 = 50, ED50 = 25, Emax = 150)))

In previous R & ggplot2 versions: image

with most recent R and ggplot2 versions:

Warning message: Computation failed in stat_smooth(): $ operator is invalid for atomic vectors

image

margoal1 commented 3 years ago

I found this sentence in the News for R v4.0.0, https://cran.r-project.org/doc/manuals/r-devel/NEWS.html:

S3 method lookup now by default skips the elements of the search path between the global and base environments.

Do you think this might be related? I think the predictdf.xxx functions are related to methods.... in ggplot2 the predictdf function calls UseMethod("predictdf") https://github.com/tidyverse/ggplot2/blob/master/R/stat-smooth-methods.r

margoal1 commented 3 years ago

Found the below advice from https://adv-r.hadley.nz/s3.html?q=UseMethod#s3-methods. The bolding is mine:

13.4.3 Creating methods There are two wrinkles to be aware of when you create a new method:

  • First, you should only ever write a method if you own the generic or the class. R will allow you to define a method even if you don’t, but it is exceedingly bad manners. Instead, work with the author of either the generic or the class to add the method in their code.

Maybe we should suggest predictdf.nls be added to ggplot2?

margoal1 commented 3 years ago

Solution: use @exportS3Method ggplot2::predictdf as in the reference below

https://stackoverflow.com/questions/61482561/whats-the-preferred-means-for-defining-an-s3-method-in-an-r-package-without-int

mattfidler commented 3 years ago

I'm glad you found it;

Interestingly the tidyverse doesn't follow their own advice with broom, they instead encourage others to write methods for linear models.

Found the below advice from https://adv-r.hadley.nz/s3.html?q=UseMethod#s3-methods. The bolding is mine:

13.4.3 Creating methods There are two wrinkles to be aware of when you create a new method:

  • First, you should only ever write a method if you own the generic or the class. R will allow you to define a method even if you don’t, but it is exceedingly bad manners. Instead, work with the author of either the generic or the class to add the method in their code.

Maybe we should suggest predictdf.nls be added to ggplot2?