DoseResponse / drc

Fitting dose-response models in R
https://doseresponse.github.io/drc/
21 stars 16 forks source link

Mdofied code for time-to-event models (drm.R function) #8

Closed OnofriAndreaPG closed 5 years ago

OnofriAndreaPG commented 5 years ago

In order to make DRC work with thermal time models I had to make a few changes to the code. Indeed, it appeared that the original coding for time-to-event studies could not work with functions characterised by an additional predictor, apart from the two variables coding for the start and end of time intervals. Please note that the original coding does not appear to work also with models with multiple curves (i.e. where curveid is used).

I made the following changes.

Change 1 - module 'drm.r' around line 1299 (original) 1320 (modified)

The original code was specific to a situation with only two predictors (start, end) and there was the need to allow for one further predictor.

    ## Optimising the objective function previously defined
    startVecSc <- as.vector(startVecSc)  # removing names
    nlsFit <- drmOpt(opfct, opdfct1, startVecSc, optMethod, constrained, warnVal, 
    upperLimits, lowerLimits, errorMessage, maxIt, relTol, parmVec = parmVec, traceVal = control$"trace",
    matchCall = callDetail, silentVal = control$"otrace") 
#    matchCall = match.call()) 

    if (!nlsFit$convergence) {return(nlsFit)}

    if (identical(type, "event"))
    {
#        dose <- dose[isFinite, 2]
#        resp <- (as.vector(unlist(tapply(resp, assayNo, function(x){cumsum(x) / sum(x)}))))[isFinite]

#        orderDose <- order(dose0)
#        dose1 <- dose0[orderDose]

        assayNo0 <- assayNo[isFinite]

        #dose0 <- dose[, 2]
        dose0 <- dose[, -1]

        #Ok se dose0 is vector
        #Also Create an id for experimental units (Petri dishes or other)
        if(is.vector(dose0)==T){
          dose <- dose0[is.finite(dose0)==T]
          dose1 <- dose0[is.finite(dose0)==T] 
          dose <- as.vector(unlist(tapply(dose1, assayNo0, function(x){unique(sort(x))})))
          Dish <- c(); cont <- 1
          for(i in 1:length(dose0)) {Dish[i] <- cont; if(is.finite(dose0[i]) == F ) cont <- cont+1}
        }else{

          dose <- dose0[is.finite(dose0[,1])==T,]
          dose1 <- dose0[is.finite(dose0[,1])==T,] 
          dose <- dose[order(dose[,1]),]
          Dish <- c(); cont <- 1
          for(i in 1:length(dose0[,1])) {Dish[i] <- cont; if(is.finite(dose0[i,1]) == F ) cont <- cont+1}
        }

        #dose <- dose[is.finite(dose[,2]),] 

        #This line does not work properly
        #dose <- as.vector(unlist(tapply(dose1[,1], assayNo0, function(x){unique(sort(x))})))

        ......
        ...........

Change 2 - module 'drm.r' around line 1312 (original) 1355 (modified)

The original code, when 'curveid' is used to fit multiple curves, produces the vector 'resp', hosting the observed proportions, calculated from the observed counts. This code does not work properly when each curve contains replicated observational units (e.g. Petri dishes). This is peculiar to germination assays and, in this case, 'resp' contains the proportions, pooled across replicates. If the design is balanced, everything runs smootly; on the contrary, if the design is unbalanced, a warning message is raised around lines 1570, when residuals are calculated, because individual predictions ('predVec') and responses ('resp') do not have the same lengths. I corrected the code as follows.

## Rescaling per curve id
        idList <- split(data.frame(dose0, resp), assayNo)
        idList2 <- split(data.frame(dose0, resp), Dish)

        respFct <- function(idListElt)
        { 
            doseVec <- idListElt[, 1]
            dose2 <- unique(sort(doseVec))
            orderDose <- order(doseVec)
            resp1 <- tapply(idListElt[orderDose, 2], doseVec[orderDose], sum)  # obtaining one count per time interval
            resp2 <- cumsum(resp1) / sum(resp1)
            cbind(dose2, resp2)[is.finite(dose2), , drop = FALSE]
        }

        #This functions here do not work properly with replicates.
        drList <- lapply(idList, respFct)
        lapList <- lapply(drList, function(x){x[, 1]})

        drList2 <- lapply(idList2, respFct)
        lapList2 <- lapply(drList2, function(x){x[, 1]})

        #dose <- as.vector(unlist(lapList))
        #Here resp and resp2 represent respectively the predictions 
        # for the means (if there are replicates per each AssayNo) 
        # and the individual responses
        resp <- as.vector(unlist(lapply(drList, function(x){x[, 2]})))
        resp2 <- as.vector(unlist(lapply(drList2, function(x){x[, 2]})))

As you see, there are now two vectors 'resp' and 'resp2'. The second one contains the proportions per each Petri dish. However, I assumed (see the vector Dish), that the observations are sorted by Petri Dish and by time in each Petri dish. May be a more general solution can be found.

Change 3 - module drm.r, line 1609 (original) 1668 (modified)

The above defined vector 'resp2' is used to calculate the residuals:

    ## Computation of fitted values and residuals
    if (identical(type, "event"))
    {   dose <- dose1
        multCurves2 <- modelFunction(dose, parm2mat, drcFct, cm, assayNoOld, upperPos, fct$"retFct", doseScaling, respScaling, isFinite)
        #predVec <- multCurves2(dose[,-1], fixedParm)
        predVec <- multCurves2(dose, fixedParm)
        resVec <- resp2 - predVec  # Corrected here #############################
        resVec[is.nan(predVec)] <- 0    
        #diagMat <- matrix(c(predVec, resVec), length(dose[,2]), 2)
        diagMat <- matrix(c(predVec, resVec), length(dose), 2)
        colnames(diagMat) <- c("Predicted values", "Residuals")
        #print(diagMat)
        }
    else{ 

Change 4 - module 'drm.r' around line 760 (original) 760 (modified)

The rutine that seeks for self-starting values could only deal with two predictors (start, end). I made the following changes, although now the code requires that the predictors are sorted in the following order: (1) start, (2) end, (3) any other predictor. Another problem here is that the function does not consider the possibility that, when 'curveid' is used, there are replicates for each level of curveid (as above). In this case I corrected it by assuming that the replicates are sorted one after the other, which may be a limitation (see above)!

        if (identical(type, "event"))
        {
             #dr2 <- doseresp[, 3]  #Need to take rightmost column as the response
            dr2 <- doseresp[, length(doseresp[1,])]
            isFinite <- is.finite(doseresp[, 2])
            respVec <- rep(NA, length(dr2))
            #respVec[isFinite] <- cumsum(dr2[isFinite]) / sum(dr2) #This works only with one curve
            #This cumulative proportion needs to be built for each Petri Dish
            #We need to assume that data are ordered by Time....
            Dish <- c(); cont=1
            for(i in 1:length(dr2)) {Dish[i] <- cont; if(is.finite(doseresp[i,2]) == F ) cont <- cont+1}
            total <- ( tapply(dr2, list(as.factor(Dish)), sum ) )
            number <- ( tapply(dr2, list(as.factor(Dish)), length ) )
            divideBy <- rep(total, each=number)
            respVec[isFinite] <- unlist( tapply(dr2[isFinite], Dish[isFinite], cumsum ) )/divideBy[isFinite]

#            doseresp[, 3] <- cumsum(dr2[isFinite]) / sum(dr2)
##            doseresp[!is.finite(doseresp[, 2]), 1] <- NA              
#            doseresp <- doseresp[isFinite, c(1, 3)]
#            names(doseresp) <- c("x", "y")
            nCols <- c(2:(length(doseresp[1,])-1))
            doseresp <- (data.frame(x = doseresp[, nCols], y = respVec))[isFinite, ]
        } else {
            isFinite <- is.finite(doseresp[, 2])
        }

Change 5 - module 'drm.r' around line 1333 (original) 1387 (modified)

The original code does not plot properly, whene there are multiple curves. Indeed, the 'plotit' variable contains a reduced number of observations, referring to the means across 'curveid'. Therefore I had to create a 'plotit2' variable, that hosts the correct id for the different curves.

splitFactor <- factor(assayNo, exclude = NULL)
        listCI <- split(splitFactor, splitFactor)
        lenVec <- as.vector(unlist(lapply(lapList, length)))
        plotid <- as.factor(as.vector(unlist(mapply(function(x,y){x[1:y]}, listCI, lenVec))))
        plotid2 <- as.factor(as.vector(unlist(listCI))[is.finite(dose0)])
#        plotid <- plotid[complete.cases(plotid)]
        levels(plotid) <- unique(assayNoOld)
        levels(plotid2) <- unique(assayNoOld)
    } else {
        plotid <- NULL
    }

Change 6 - module 'drm.r' around line 1764 (original) 1840 (modified)

In order to obtain the correct plots, resp2 (see above at Change 3) and plotid2 need to replace resp and plotid in the dataList object.

    if (identical(type, "event"))
    {
        dataList <- list(dose = dose, origResp = resp2, weights = wVec[isFinite], 
        curveid = assayNoOld[isFinite], plotid = plotid2, resp = resp2,
        names = list(dName = varNames[1], orName = varNames[2], wName = wName, cNames = anName, rName = ""))
    }