sedavid01 commented 1 year ago

Some of the users of the ToxCast pipeline and/or tcplfit2 like to calculate the area under the dose-response curve (AUC) to get an idea of the 'efficacy' of a given chemical perturbagen. We would like to add this functionality to the R package so users have a ready to use function available in tcplfit2 rather than having to do this calculation manually or write their own function.

Goal: Create a new function to calculate AUC.

Considerations: We are going to need to consider how to appropriately calculate the AUC given tcplfit2 can fit models bi-directionally. Also, @brown-jason and @kpaulfriedman if there are any other requirements for this ticket you would like to see please feel free to put those in the comments here. I assume we want the AUC from concentration/dose = 0 to the maximal concentration/dose in the concentration/dose range. (Note: An extension of this function may be possible in future development if need be.)

kpaulfriedman commented 1 year ago

Agree what we would like to do is use the mc5 table (and mc5_param) as input to calculate AUCs for any of the winning model curves. The issue presented by tcplfit2 is many more models, and so the way we had done this previously (for Hill and gain-loss only) is not applicable unless we either generalize the method to a trapz-like method, which is agnostic to the model fit, or include all curve-fitting models for calculation of the AUC.

Here is part of an example of how I have calculated this in the past for Hill and gain-loss curves that were in previous versions of tcpl (with only positive direction). The use.me parameter was something I generated as I implemented post hoc filtering on the mc5 data.

functions needed to fit

hill_curve <- function(hill_tp, hill_ga, hill_gw, lconc){ return(hill_tp/(1+10^((hill_ga - lconc)*hill_gw))) }

gnls_curve <- function(top, ga, gw, la, lw, lconc){ gain <- 1/(1+10^((ga - lconc)gw)) loss <- 1/(1+10^((lconc - la)lw)) return(topgainloss) }

fit all hitc==1 curves in the mc5

mc5[use.me ==1L & modl == "hill", auc := mapply(function(lower, upper, hill_tp, hill_ga, hill_gw) integrate(hill_curve, lower, upper, hill_tp=hill_tp, hill_ga=hill_ga, hill_gw=hill_gw)$value, lower = mc5[use.me ==1L & modl == "hill", logc_min], upper = mc5[use.me ==1L & modl == "hill", logc_max], hill_tp = mc5[use.me ==1L & modl == "hill", hill_tp], hill_ga = mc5[use.me ==1L & modl == "hill", hill_ga], hill_gw = mc5[use.me ==1L & modl == "hill", hill_gw])]

mc5[hitc == 1L & use.me==1L & modl == "gnls", auc := mapply(function(lower, upper, top, ga, gw, la, lw) integrate(gnls_curve, lower, upper, top=top, ga=ga, gw=gw, la=la, lw=lw)$value, lower = mc5[use.me ==1L & modl == "gnls", logc_min], upper = mc5[use.me ==1L & modl == "gnls", logc_max], top = mc5[use.me ==1L & modl == "gnls", gnls_tp], ga = mc5[use.me ==1L & modl == "gnls", gnls_ga], gw = mc5[use.me ==1L & modl == "gnls", gnls_gw], la = mc5[use.me ==1L & modl == "gnls", gnls_la], lw = mc5[use.me ==1L & modl == "gnls", gnls_lw])]

sedavid01 commented 1 year ago

Thank you @kpaulfriedman for providing additional information/code for Grace to use as a starting set of base code.

sedavid01 commented 1 year ago

From conversation with Grace, steps/to do's for this task:

Need to be able to calculate the AUC for any of the models available in tcplfit2 (likely exclude the constant model)
Baseline design for this function can be done one of two ways: (1) house all of the AUC calculation code in under one function, or (2) create an AUC function for each individual model in tcplfit2 then have a wrapper function which 'links-out'/'calls' the individual model function when necessary.
General overarching function should have the following arguments are lower x bound, upper x bound, model, model parameters. (Assume for now we just want the AUC for a single model - context mc5 would have the winning model results so we would want the AUC just for the winning models, so tcpl would pass the winning model name to model and the respective parameters to model parameters.)
Start with calculations for the AUC for just increasing response curve fits, then we can move on to the more challenging decreasing response curve fits.

sedavid01 commented 1 year ago

@gracezhihuizhao, when this ticket is ready for review and submission of PR, please make sure to choose 'dev' as the branch you want to merge changes into.

sedavid01 commented 1 year ago

Add some "how to examples" in a new vignette - this will help to document how the function should be used and will aid in testing out whether the function behavior is working as intended. @gracezhihuizhao, feel free to reach out if you have any questions or concerns.

gracezhihuizhao commented 1 year ago

Check and make sure we are using the correct objective function Ran an example and got model parameters

conc <- c(.03, .1, .3, 1, 3, 10, 30, 100)
resp <- c(0, .1, 0, .2, .6, .9, 1.1, 1)
output <- tcplfit2_core(conc, resp, .8)
# print out model parameters 
      tp          ga           p          la           q          er 
   1.023238    2.453007    1.592714 4288.993065    5.770323   -3.295309

Implementation

# function in the package 
loggnls = function(ps, x){
  #gnls function with log units: x = log10(conc) and ga/la = log10(gain/loss ac50)
  #tp = ps[1], ga = ps[2], p = ps[3], la = ps[4], q = ps[5]
  gn <- 1/(1 + 10^((ps[2] - x)*ps[3]))
  ls <- 1/(1 + 10^((x - ps[4])*ps[5]))
  return(ps[1]*gn*ls )
}

# Katie's function 
gnls_curve <- function(top, ga, gw, la, lw, lconc){
    gain <- 1/(1+10^((ga - lconc)*gw))
    loss <- 1/(1+10^((lconc - la)*lw))
    return(top*gain*loss)
}

Compare Results

# calculate result
# Note: different parameter names: gw = p, lw = q
gnls_curve(top = 1.023238, 
           ga = log10(2.453007), 
           gw = 1.592714,
           la = log10(4288.993065),
           lw = 5.770323,
           lconc = log10(conc))

# result from our model
# difference in digits can be due to rounding
output[["gnls"]][["modl"]]

[1] 0.0009191732 0.0062220721 0.0347918871 0.1977189829 0.5929386025 0.9246257848
[7] 1.0046151251 1.0204579929
[1] 0.0009191735 0.0062220737 0.0347918936 0.1977190132 0.5929387299 0.9246260667
[7] 1.0046154597 1.0204583395

sedavid01 commented 1 year ago

@gracezhihuizhao, thank you for providing this code and the results. Can you estimate whether there is a difference in the AUC values between your new function and the code Katie originally posted?

gracezhihuizhao commented 1 year ago

# since I don't have access to the data table I will just plug in some numbers
# parameters, conc, ps are the same from the previous code above 

mapply(function(lower,
upper,
top,
ga,
gw,
la,
lw) integrate(gnls_curve,
lower,
upper,
top=top,
ga=ga,
gw=gw,
la=la,
lw=lw)$value,
lower = min(log10(conc)), # need to plug in conc in log-scale
upper = max(log10(conc)),  # need to plug in conc in log-scale
top = 1.023238,
ga = log10(2.453007),
gw = 1.592714,
la = log10(4288.993065),
lw = 5.770323)

get_AUC("gnls", min(conc), max(conc), ps) # conc in original unit because the function will convert conc internally for gnls

[1] 1.64823
[1] 1.64823

sedavid01 commented 1 year ago

Overall, the demo code looks good so far. Couple quick comments:

Please move the getAUC .Rmd and .pdf you have from the package to a local working directory.
Create a vignette for demonstrating the get_AUC function. Steps:

In the R package R project utilize the usethis::use_vignette(name = "",title = "")</li> <li>Open the vignette you just created with the following step and copy over the code used in the .Rmd you already created. This will start as the base for our vignette and can be polished from there.</li> </ul> </blockquote> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/sedavid01"><img src="https://avatars.githubusercontent.com/u/78621036?v=4" />sedavid01</a> commented <strong> 1 year ago</strong> </div> <div class="markdown-body"> <p>Comments on vignette:</p> <ul> <li>Missing the gain-loss model AUC calculation for the example with "all other models".</li> <li>I would double check the integration on the poly2 bi-phasic response AUC example. When I do the integration by hand I get 0.6826667, not 0.1706667. <code>check_AUC_poly2 <- function(a,b,x1,x2){ ((a/3)*x2^3 + (b/2)*x2^2) - ((a/3)*x1^3 + (b/2)*x1^2) }</code> <code>check_AUC_poly2(a = 1,b = -2,x1 = 0,x2 = 2) + check_AUC_poly2(a = 1,b = -2,x1 = 2,x2 = 3.2)</code> <code>check_AUC_poly2(a = 1,b = -2,x1 = 0,x2 = 3.2)</code></li> </ul> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/gracezhihuizhao"><img src="https://avatars.githubusercontent.com/u/141073347?v=4" />gracezhihuizhao</a> commented <strong> 1 year ago</strong> </div> <div class="markdown-body"> <blockquote> <p>Comments on vignette:</p> <ul> <li>Missing the gain-loss model AUC calculation for the example with "all other models".</li> <li>I would double check the integration on the poly2 bi-phasic response AUC example. When I do the integration by hand I get 0.6826667, not 0.1706667. <code>check_AUC_poly2 <- function(a,b,x1,x2){ ((a/3)*x2^3 + (b/2)*x2^2) - ((a/3)*x1^3 + (b/2)*x1^2) }</code> <code>check_AUC_poly2(a = 1,b = -2,x1 = 0,x2 = 2) + check_AUC_poly2(a = 1,b = -2,x1 = 2,x2 = 3.2)</code> <code>check_AUC_poly2(a = 1,b = -2,x1 = 0,x2 = 3.2)</code></li> </ul> </blockquote> <ol> <li>I will add gnls to that example</li> <li>Sorry if the parameter is a little confusing here - because how poly2 is implemented in the package <code>(a/b x + a x^2/b^2)</code> is different from normal poly2, my a = 1 and b = -2 when put into the function will become <code>(1/4)x^2 - 1/2x</code>instead of <code>x^2 -2x</code>. I will make this clear in the description in the vignette as well. </li> </ol> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/sedavid01"><img src="https://avatars.githubusercontent.com/u/78621036?v=4" />sedavid01</a> commented <strong> 1 year ago</strong> </div> <div class="markdown-body"> <p>I totally forgot the different parameterization of the poly2 function in tcplfit2 (silly mistake on my part)... For now, yes please make sure to just add a clarification/description of this in the vignette. Another thing we may want to do, maybe as an extension (vignette polishing type ticket - separate from here) is to include the underlying equations/mathematical calculations.</p> <p>Thank you for the corrective note and adding those updates to your vignette.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/sedavid01"><img src="https://avatars.githubusercontent.com/u/78621036?v=4" />sedavid01</a> commented <strong> 1 year ago</strong> </div> <div class="markdown-body"> <p>@brown-jason and @kpaulfriedman, this ticket has been completed and branch merged into 'dev'.</p> </div> </div> <div class="page-bar-simple"> </div> <div class="footer"> <ul class="body"> <li>© <script> document.write(new Date().getFullYear()) </script> Githubissues.</li> <li>Githubissues is a development platform for aggregating issues.</li> </ul> </div> <script src="https://cdn.jsdelivr.net/npm/jquery@3.5.1/dist/jquery.min.js"></script> <script src="/githubissues/assets/js.js"></script> <script src="/githubissues/assets/markdown.js"></script> <script src="https://cdn.jsdelivr.net/gh/highlightjs/cdn-release@11.4.0/build/highlight.min.js"></script> <script src="https://cdn.jsdelivr.net/gh/highlightjs/cdn-release@11.4.0/build/languages/go.min.js"></script> <script> hljs.highlightAll(); </script> </body> </html>

USEPA / CompTox-ToxCast-tcplFit2

Add in a function for calculating AUC for dose-response curves #12

functions needed to fit

fit all hitc==1 curves in the mc5

Implementation

Compare Results