Open bhaibeka opened 3 years ago
looks like the intersection on concentrations causes the error
I am currently working on updating this method. Expect a fix shortly.
Best, Chris
@bhaibeka I agree that it is a bug, in that it should fail more gracefully than this. The method fundamentally does not work on datasets with replicates, since we need 1-1 mapping of experiments across studies.
The only way I can think of salvaging is to first "collapse replicates", and then allow intersection. What are your thoughts on this?
I would collapse the replicate and add a clear warning to inform the users
So with the introduction of the LongTable
as the class which stores sensitivity profiles, I hadn't intended to subset on dose (concentration) at all in intersect-methods. They would instead subset on features, samples and compounds (at least at the PSet level). The user could then extract the LongTable
and select whichever dose they like with the very straightforward subset methods.
I can think of lots of situations where the dose is non even comparable. For example, if two PSets use different dose units. If the magnitude of those units differs (i.e., micromolar vs molar). Do we have standards in place around dose formatting?
Do you think preserving the intersection on dose is important despite these limitations? Or is it fine to have that as a secondary operation done by the user on the LongTable
.
From a design perspective, I think the problem with a lot of our methods is that they try to do everything. Programming best practices indicate a function should do one operation and users are responsible for composing them into a pipeline. This is even easier now in R 4.1 with the base R pipe |>
.
So what I imagine would look like:
intersect(PSet1, PSet) |> subsetLongTable(.(dose1 == dose2, , by=.(drug1id, drug2id)), )
Im not exactly sure I understand what this subsetLongTable function is doing in this case.
I am 100% for making the collapsing of replicates within a PSet and the intersection of PSets into two (or more) different steps, because I think their is a lot of choices on how to collapse replicates that are possible
In this case, subsetLongTable
is a wrapper function around the subset
method for a LongTable that works on a PSet. So you can subset a LongTable inside a PSet without extracting it.
If you mean the groupby, I'm selecting instances where the doses are equal for each drug combination.
setMethod('subsetLongTable', signature(x='PSet'), function(x, subset, select) {
LT <- subset(sensitivitySlot(x), substitute(subset), substitute(select))
sensitivitySlot(x) <- LT
harmonize(x) # drop drugs from the rest of the PSet if they are lost in the subset
return(x)
}
Or something like that.
As for collapsing replicates. There is already a method like that for MutliAssayExperiment
. For LongTable we would do it through an assay aggregate and let the backpressure take care of the rest (i.e., when you subset an assay, if you lose a drug or sample, it is automatically dropped from the LongTable row- or colData.
PharmacoGx version 2.4.0
CCLE vs GDSC1 (similar to Haibe-Kains Nature 2013)
ccle <- PharmacoGx::downloadPSet("CCLE_2015") gdsc1 <- PharmacoGx::downloadPSet("GDSC_2020(v1-8.2)")
ccle.gdsc1 <- PharmacoGx::intersectPSet(pSets=list("CCLE"=ccle, "GDSC1"=gdsc1))
returns
Intersecting large PSets may take a long time ... Error in which.min(abs(as.numeric(doses[[i]]) - min.dose)):max(which(abs(as.numeric(doses[[i]]) - : argument of length 0 In addition: Warning messages: 1: In min(as.numeric(doses[[i]]), na.rm = TRUE) : no non-missing arguments to min; returning Inf 2: In max(as.numeric(doses[[i]]), na.rm = TRUE) : no non-missing arguments to max; returning -Inf 3: In min(abs(as.numeric(doses[[i]]) - max.dose), na.rm = TRUE) : no non-missing arguments to min; returning Inf 4: In max(which(abs(as.numeric(doses[[i]]) - max.dose) == min(abs(as.numeric(doses[[i]]) - : no non-missing arguments to max; returning -Inf