USEPA / CompTox-ToxCast-tcpl

US EPA's Toxicity Forecaster (ToxCast) Pipeline. More information on the ToxCast program is available here: https://www.epa.gov/comptox-tools/toxicity-forecasting-toxcast
https://cran.r-project.org/package=tcpl

tcplSubsetChid overwrites hitc to boolean #189

Open cthunes opened 8 months ago

cthunes commented 8 months ago

https://github.com/USEPA/CompTox-ToxCast-tcpl/blob/dev/R/tcplSubsetChid.R#L101

dat[, hitc := hitc >= .9]

Is this behavior desired or should it be in a new column like hitc_bool?
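To illustrate the question, here is a minimal sketch on toy data (the values and the spid labels are made up, not taken from invitrodb). It contrasts overwriting hitc in place with writing the thresholded result to a separate column; hitc_bool is only the name suggested above, not an existing tcpl field.

```r
library(data.table)

# Toy data (hypothetical values, not from invitrodb)
dat <- data.table(spid = c("s1", "s2", "s3"),
                  hitc = c(0.95, 0.40, -1))

# Behavior of the quoted line: the continuous hitc is replaced by a logical
overwritten <- copy(dat)
overwritten[, hitc := hitc >= 0.9]

# Alternative: leave hitc untouched and store the logical in a new column
preserved <- copy(dat)
preserved[, hitc_bool := hitc >= 0.9]

class(overwritten$hitc)  # original continuous values are no longer available
class(preserved$hitc)    # continuous values are still available downstream
```

With the second form, callers that expect a continuous hitc after subsetting would still find it.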

Kelly-Carstens-EPA commented 8 months ago

In addition to the Boolean output issue, there also may be an error in the logic:

dat[, chit := mean(hitc[hitc %in% 0:1]) >= 0.5, by = list(aeid, chid)]

In the example below, there were four spids for a unique aeid/chid. The mean was 0.25 (1 of 4 hit calls positive), so the chemical would not be considered active. My thinking was that if any single spid for an aeid/chid is active, then tcplSubsetChid() should by default capture the chemical as active. We could use max() instead of mean() if this is the appropriate logic. If we want 1/4 hits to be considered inactive, then no change is needed.

From 'invitrodb' 1/3/2024:

mc5 <- tcplPrepOtpt(tcplLoadData(lvl=5, type='mc', fld='aeid', val=2506))
dat <- mc5[chid == 20006]
dat[, hitc2 := ifelse(hitc >= 0.9, 1, 0)] # 1 hit out of 4 spids
mc5.sub <- tcplSubsetChid(dat)
mc5.sub$hitc # FALSE, i.e. not considered active despite 1/4 hits
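The difference between the two possible rules can be reproduced without database access. The sketch below uses a toy table (hypothetical aeid, chid, and spid values) with one active sample out of four and compares the majority rule from the quoted line against an "any active" rule. The column names chit_majority and chit_any are illustrative only.

```r
library(data.table)

# Toy data (hypothetical): four samples for one aeid/chid, one of them active
dat <- data.table(aeid = 1L, chid = 1L,
                  spid = paste0("s", 1:4),
                  hitc = c(1, 0, 0, 0))

# Majority rule, as in the quoted line: active only if at least half are hits
dat[, chit_majority := mean(hitc[hitc %in% 0:1]) >= 0.5,
    by = list(aeid, chid)]

# "Any active" rule: active if at least one sample is a hit
dat[, chit_any := any(hitc[hitc %in% 0:1] == 1),
    by = list(aeid, chid)]

dat[, .(spid, hitc, chit_majority, chit_any)]
```

Here the group mean is 0.25, so the majority rule marks the chemical inactive while the "any active" rule marks it active. Which one is appropriate depends on whether the consensus call is meant to be conservative.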

madison-feshuk commented 5 months ago

tcplSubsetChid overwriting hitc is causing an issue in the initial data pulls with v4.2 QC:

new.mc5 <- tcplPrepOtpt(tcplSubsetChid(tcplLoadData(lvl=5, type = 'mc', add.fld = TRUE)))

madison-feshuk commented 4 months ago

Try updating hitc to actc on lines 101 (mc) and 147 (sc).