Closed sedavid01 closed 1 year ago
Agree what we would like to do is use the mc5 table (and mc5_param) as input to calculate AUCs for any of the winning model curves. The issue presented by tcplfit2 is many more models, and so the way we had done this previously (for Hill and gain-loss only) is not applicable unless we either generalize the method to a trapz-like method, which is agnostic to the model fit, or include all curve-fitting models for calculation of the AUC.
Here is part of an example of how I have calculated this in the past for Hill and gain-loss curves that were in previous versions of tcpl (with only positive direction). The use.me parameter was something I generated as I implemented post hoc filtering on the mc5 data.
hill_curve <- function(hill_tp, hill_ga, hill_gw, lconc){ return(hill_tp/(1+10^((hill_ga - lconc)*hill_gw))) }
gnls_curve <- function(top, ga, gw, la, lw, lconc){ gain <- 1/(1+10^((ga - lconc)gw)) loss <- 1/(1+10^((lconc - la)lw)) return(topgainloss) }
mc5[use.me ==1L & modl == "hill", auc := mapply(function(lower, upper, hill_tp, hill_ga, hill_gw) integrate(hill_curve, lower, upper, hill_tp=hill_tp, hill_ga=hill_ga, hill_gw=hill_gw)$value, lower = mc5[use.me ==1L & modl == "hill", logc_min], upper = mc5[use.me ==1L & modl == "hill", logc_max], hill_tp = mc5[use.me ==1L & modl == "hill", hill_tp], hill_ga = mc5[use.me ==1L & modl == "hill", hill_ga], hill_gw = mc5[use.me ==1L & modl == "hill", hill_gw])]
mc5[hitc == 1L & use.me==1L & modl == "gnls", auc := mapply(function(lower, upper, top, ga, gw, la, lw) integrate(gnls_curve, lower, upper, top=top, ga=ga, gw=gw, la=la, lw=lw)$value, lower = mc5[use.me ==1L & modl == "gnls", logc_min], upper = mc5[use.me ==1L & modl == "gnls", logc_max], top = mc5[use.me ==1L & modl == "gnls", gnls_tp], ga = mc5[use.me ==1L & modl == "gnls", gnls_ga], gw = mc5[use.me ==1L & modl == "gnls", gnls_gw], la = mc5[use.me ==1L & modl == "gnls", gnls_la], lw = mc5[use.me ==1L & modl == "gnls", gnls_lw])]
Thank you @kpaulfriedman for providing additional information/code for Grace to use as a starting set of base code.
From conversation with Grace, steps/to do's for this task:
@gracezhihuizhao, when this ticket is ready for review and submission of PR, please make sure to choose 'dev' as the branch you want to merge changes into.
Add some "how to examples" in a new vignette - this will help to document how the function should be used and will aid in testing out whether the function behavior is working as intended. @gracezhihuizhao, feel free to reach out if you have any questions or concerns.
Check and make sure we are using the correct objective function Ran an example and got model parameters
conc <- c(.03, .1, .3, 1, 3, 10, 30, 100)
resp <- c(0, .1, 0, .2, .6, .9, 1.1, 1)
output <- tcplfit2_core(conc, resp, .8)
# print out model parameters
tp ga p la q er
1.023238 2.453007 1.592714 4288.993065 5.770323 -3.295309
# function in the package
loggnls = function(ps, x){
#gnls function with log units: x = log10(conc) and ga/la = log10(gain/loss ac50)
#tp = ps[1], ga = ps[2], p = ps[3], la = ps[4], q = ps[5]
gn <- 1/(1 + 10^((ps[2] - x)*ps[3]))
ls <- 1/(1 + 10^((x - ps[4])*ps[5]))
return(ps[1]*gn*ls )
}
# Katie's function
gnls_curve <- function(top, ga, gw, la, lw, lconc){
gain <- 1/(1+10^((ga - lconc)*gw))
loss <- 1/(1+10^((lconc - la)*lw))
return(top*gain*loss)
}
# calculate result
# Note: different parameter names: gw = p, lw = q
gnls_curve(top = 1.023238,
ga = log10(2.453007),
gw = 1.592714,
la = log10(4288.993065),
lw = 5.770323,
lconc = log10(conc))
# result from our model
# difference in digits can be due to rounding
output[["gnls"]][["modl"]]
[1] 0.0009191732 0.0062220721 0.0347918871 0.1977189829 0.5929386025 0.9246257848
[7] 1.0046151251 1.0204579929
[1] 0.0009191735 0.0062220737 0.0347918936 0.1977190132 0.5929387299 0.9246260667
[7] 1.0046154597 1.0204583395
@gracezhihuizhao, thank you for providing this code and the results. Can you estimate whether there is a difference in the AUC values between your new function and the code Katie originally posted?
# since I don't have access to the data table I will just plug in some numbers
# parameters, conc, ps are the same from the previous code above
mapply(function(lower,
upper,
top,
ga,
gw,
la,
lw) integrate(gnls_curve,
lower,
upper,
top=top,
ga=ga,
gw=gw,
la=la,
lw=lw)$value,
lower = min(log10(conc)), # need to plug in conc in log-scale
upper = max(log10(conc)), # need to plug in conc in log-scale
top = 1.023238,
ga = log10(2.453007),
gw = 1.592714,
la = log10(4288.993065),
lw = 5.770323)
get_AUC("gnls", min(conc), max(conc), ps) # conc in original unit because the function will convert conc internally for gnls
[1] 1.64823
[1] 1.64823
Overall, the demo code looks good so far. Couple quick comments:
- In the R package R project utilize the usethis::use_vignette(name = "
",title = " ") - Open the vignette you just created with the following step and copy over the code used in the .Rmd you already created. This will start as the base for our vignette and can be polished from there.
Comments on vignette:
check_AUC_poly2 <- function(a,b,x1,x2){ ((a/3)*x2^3 + (b/2)*x2^2) - ((a/3)*x1^3 + (b/2)*x1^2) }
check_AUC_poly2(a = 1,b = -2,x1 = 0,x2 = 2) + check_AUC_poly2(a = 1,b = -2,x1 = 2,x2 = 3.2)
check_AUC_poly2(a = 1,b = -2,x1 = 0,x2 = 3.2)
Comments on vignette:
- Missing the gain-loss model AUC calculation for the example with "all other models".
- I would double check the integration on the poly2 bi-phasic response AUC example. When I do the integration by hand I get 0.6826667, not 0.1706667.
check_AUC_poly2 <- function(a,b,x1,x2){ ((a/3)*x2^3 + (b/2)*x2^2) - ((a/3)*x1^3 + (b/2)*x1^2) }
check_AUC_poly2(a = 1,b = -2,x1 = 0,x2 = 2) + check_AUC_poly2(a = 1,b = -2,x1 = 2,x2 = 3.2)
check_AUC_poly2(a = 1,b = -2,x1 = 0,x2 = 3.2)
(a/b x + a x^2/b^2)
is different from normal poly2, my a = 1 and b = -2 when put into the function will become (1/4)x^2 - 1/2x
instead of x^2 -2x
. I will make this clear in the description in the vignette as well. I totally forgot the different parameterization of the poly2 function in tcplfit2 (silly mistake on my part)... For now, yes please make sure to just add a clarification/description of this in the vignette. Another thing we may want to do, maybe as an extension (vignette polishing type ticket - separate from here) is to include the underlying equations/mathematical calculations.
Thank you for the corrective note and adding those updates to your vignette.
@brown-jason and @kpaulfriedman, this ticket has been completed and branch merged into 'dev'.
Some of the users of the ToxCast pipeline and/or tcplfit2 like to calculate the area under the dose-response curve (AUC) to get an idea of the 'efficacy' of a given chemical perturbagen. We would like to add this functionality to the R package so users have a ready to use function available in tcplfit2 rather than having to do this calculation manually or write their own function.
Goal: Create a new function to calculate AUC.
Considerations: We are going to need to consider how to appropriately calculate the AUC given tcplfit2 can fit models bi-directionally. Also, @brown-jason and @kpaulfriedman if there are any other requirements for this ticket you would like to see please feel free to put those in the comments here. I assume we want the AUC from concentration/dose = 0 to the maximal concentration/dose in the concentration/dose range. (Note: An extension of this function may be possible in future development if need be.)