USEPA / CompTox-ToxCast-tcpl

US EPA's Toxicity Forecaster (ToxCast) Pipeline. More information on the ToxCast program available here: https://www.epa.gov/comptox-tools/toxicity-forecasting-toxcast
https://cran.r-project.org/package=tcpl

Dose response curves #8

Closed ldecicco-USGS closed 4 years ago

ldecicco-USGS commented 4 years ago

I've updated my script for creating custom dose response curve files. There is still one filter I can't figure out, which I think would ensure the plots match those on https://comptox.epa.gov/dashboard.

Here is my current script:

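cas <- "94125-34-5"
ep <- "ATG_PXRE_CIS_up"
# Look up the chemical's sample IDs (spid) and the endpoint's aeid
chem_info <- tcplLoadChem(field = 'casn', val = cas)
assay_info <- tcplLoadAeid(fld = "aenm", val = ep)

# Level 4 fits for this chemical/endpoint combination
mc4 <- tcplLoadData(lvl = 4L, type = "mc", 
                    fld = c("spid", "aeid"), 
                    val = list(chem_info$spid,
                               assay_info$aeid))
# Level 5 results for the same fits (pass the m4id vector, not the whole table)
mc5 <- tcplLoadData(lvl = 5L, type = "mc", 
                    fld = "m4id", val = mc4$m4id)

# Keep only the fits with a positive hit call
mc4_id <- mc5$m4id[which(mc5$hitc == 1)]
index <- which(mc4$m4id %in% mc4_id)

tcplPlotM4ID(mc4$m4id[index], lvl = 5)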

Calling the mc5 table lets me filter for a hitc of 1, but in this case still leaves me with 3 plots. I've been told "use the index where gsid_rep=1". I can see the "gsid_rep" column in the ToxCast "INVITRODB_V3_2_LEVEL5" csv files. In those csv files, there are 90 columns. In the "mc5" data frame from the tcplLoadData function, there are 68 columns (but otherwise the ones that are there match up).
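One quick way to see exactly which columns are missing is to compare the column names of the two tables. This is a minimal sketch on toy name vectors; the short lists below are stand-ins I made up for illustration, not the real 90 and 68 column sets.

```r
# Toy column names (assumed for illustration only)
csv_cols <- c("m4id", "aeid", "spid", "hitc", "gsid_rep")
mc5_cols <- c("m4id", "aeid", "spid", "hitc")

# Columns present in the CSV export but absent from the tcplLoadData result
only_in_csv <- setdiff(csv_cols, mc5_cols)
print(only_in_csv)
```

With the real data, the same call would be something like setdiff(names(csv_df), names(mc5)), where csv_df is a hypothetical data frame read from the level 5 csv file.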

So my question is: is there a way to filter the mc4 data to just gsid_rep=1? I'm trying to find a document that gives the official/long name for the gsid_rep column. So far I haven't found one, but I'll comment back on this issue if I do.

What I'm trying to reproduce: https://comptox.epa.gov/dashboard/dsstoxdb/results?search=94125-34-5#invitrodb-bioassays-toxcast-tox21

brown-jason commented 4 years ago

Laura, please look at function tcplSubsetChid to find the representative sample per chemical. That should reduce the number of plots from 3 to 1 and should match the one that is shown on the dashboard.

From the function description: tcplSubsetChid subsets level 5 data to a single tested sample per chemical. In other words, if a chemical is tested more than once (a chid has more than one spid) for a given assay endpoint, the function uses a series of logic to select a single "representative" sample.
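To make the idea of a single "representative" sample concrete, here is a minimal base R sketch on toy data. The helper pick_representative and its ordering rule (hits first, then the lowest modl_ga) are assumptions for illustration only; tcplSubsetChid applies its own series of logic, so treat this as a picture of the concept rather than the package's implementation.

```r
# Toy stand-in for a level 5 table; real tcpl output has many more columns.
mc5_toy <- data.frame(
  m4id    = c(101, 102, 103, 104),
  aeid    = c(1, 1, 1, 1),
  chid    = c(10, 10, 10, 20),      # chid 10 was tested as three samples
  spid    = c("S1", "S2", "S3", "S4"),
  hitc    = c(1, 1, 0, 1),
  modl_ga = c(0.8, 0.3, NA, 1.2),   # potency value; lower ranks first here
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep one row per aeid/chid pair.
# Assumed ordering: hits first, then lowest modl_ga. NOT tcpl's actual rules.
pick_representative <- function(dat) {
  ord <- order(dat$aeid, dat$chid, -dat$hitc, dat$modl_ga, na.last = TRUE)
  dat <- dat[ord, ]
  # After sorting, the first row of each aeid/chid group is the one to keep
  dat[!duplicated(dat[, c("aeid", "chid")]), ]
}

rep_toy <- pick_representative(mc5_toy)
hit_ids <- rep_toy$m4id[rep_toy$hitc == 1]
print(rep_toy)
```

In this toy case the three samples for chid 10 collapse to one row, which is the same effect that takes the number of plots from 3 to 1.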

ldecicco-USGS commented 4 years ago

Awesome, a 7-minute total response time! It took me longer than that just to re-connect to the remote computer I've got the database on. That's what I needed, thanks!!!!

For my record (DON'T miss the tcplPrepOtpt setup 😬), here's what's working:

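cas <- "94125-34-5"
ep <- "ATG_PXRE_CIS_up"
# Look up the chemical's sample IDs (spid) and the endpoint's aeid
chem_info <- tcplLoadChem(field = 'casn', val = cas)
assay_info <- tcplLoadAeid(fld = "aenm", val = ep)

# Level 4 fits for this chemical/endpoint combination
mc4 <- tcplLoadData(lvl = 4L, type = "mc", 
                    fld = c("spid", "aeid"), 
                    val = list(chem_info$spid,
                               assay_info$aeid))
# Level 5 results for the same fits (pass the m4id vector, not the whole table)
mc5 <- tcplLoadData(lvl = 5L, type = "mc", 
                    fld = "m4id", val = mc4$m4id)
# Add the chemical/assay annotation, then keep one representative sample per chemical
mc5 <- tcplPrepOtpt(mc5)
mc5 <- tcplSubsetChid(dat = mc5, flag = FALSE)

# Keep only the fits with a positive hit call
mc4_id <- mc5$m4id[which(mc5$hitc == 1)]
index <- which(mc4$m4id %in% mc4_id)

tcplPlotM4ID(mc4$m4id[index], lvl = 5)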