ccagc / QDNAseq

QDNAseq package for Bioconductor

poolRuns() error or misapplication #124

Closed: jppmatos closed this issue 4 months ago

jppmatos commented 7 months ago

I'm trying to pool together several samples with poolRuns(), but unfortunately it fails when I give samples identical names so that they get pooled.

I'm not sure whether I'm just making a mistake or whether there is a bug in poolRuns():

QDNAseq version: 1.34.0

> library(QDNAseq)

> bins <- getBinAnnotations(binSize=100)

> readCounts <- binReadCounts(bins, path = "S_4047",bamnames=c("S40","S40_Rerun","S47","S47_Rerun"))

> pooledReadCounts <- poolRuns(readCounts,c("S40","S40","S47","S47"))

Error in colMeans2(oldphenodata, cols = numericCols, useNames = FALSE) :
  Argument 'x' must be a matrix or a vector.

(Post edited: it had a typo in the 4th line, unrelated to the problem.)

Thank you for your time, Zé Pedro

HenrikBengtsson commented 7 months ago

What's the traceback() output when you get this error?

jppmatos commented 7 months ago
> pooledReadCounts <- poolRuns(readCounts,c("S40","S40","S47","S47"))
Error in colMeans2(oldphenodata, cols = numericCols, useNames = FALSE) :
  Argument 'x' must be a matrix or a vector.

> traceback()
3: colMeans2(oldphenodata, cols = numericCols, useNames = FALSE)
2: poolRuns(readCounts, c("S40", "S40", "S47", "S47"))
1: poolRuns(readCounts, c("S40", "S40", "S47", "S47"))

HenrikBengtsson commented 7 months ago

Thanks. I cannot promise to look into this any time soon, but the more information (like this) we have up front, the lower the threshold is for someone else to look into this and maybe try to provide a patch.

HenrikBengtsson commented 7 months ago

I'm not sure whether I'm just making a mistake or whether there is a bug in poolRuns(): ...

Hard to tell right now, but I suspect this is data-driven; e.g., there might be missing values or too few data points in the data set. If that's the case, we could at least try to detect this in QDNAseq and give a more informative error message.
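A minimal sketch of the kind of up-front check suggested here, assuming the phenodata is a data.frame like the one printed by str() later in this thread. The helper name checkPoolablePhenodata() is hypothetical and not part of QDNAseq; it only checks that there is one ID per sample and that the numeric columns have no missing values, and stops with a message naming the offending columns.

```r
# Hypothetical helper, not part of QDNAseq: validate phenodata before pooling
# so that bad input fails with a readable message instead of a low-level error.
checkPoolablePhenodata <- function(phenodata, ids) {
  stopifnot(is.data.frame(phenodata))
  if (nrow(phenodata) == 0L) {
    stop("No samples to pool.")
  }
  if (length(ids) != nrow(phenodata)) {
    stop(sprintf("Expected %d IDs (one per sample), got %d.",
                 nrow(phenodata), length(ids)))
  }
  # Only numeric columns would be averaged, so only they are checked for NAs.
  numericCols <- vapply(phenodata, is.numeric, logical(1))
  hasNA <- vapply(phenodata[numericCols], anyNA, logical(1))
  if (any(hasNA)) {
    stop("Missing values in numeric phenodata column(s): ",
         paste(names(hasNA)[hasNA], collapse = ", "))
  }
  invisible(TRUE)
}

# Example with made-up input: the second run lacks a read count,
# so the check stops early and names the column.
pd <- data.frame(name = c("S47", "S47"), total.reads = c(828307, NA))
msg <- tryCatch(checkPoolablePhenodata(pd, c("S47", "S47")),
                error = conditionMessage)
print(msg)
```

A check like this only covers the "missing values / too few data points" hypothesis; it would not by itself explain an error raised on complete data.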

HenrikBengtsson commented 7 months ago

Two comments:

First, can you try to identify a minimal subset of samples that produces the error, e.g. does:

> readCounts <- binReadCounts(bins, path = "S_4047", bamnames=c("S40","S40_Rerun"))
> pooledReadCounts <- poolRuns(readCounts,c("S40","S40"))

produce the same error? What about going down to a single sample?

Doing this will help narrow down the problem.

Second, please call:

> trace(matrixStats::colMeans2, tracer = quote({ message("Input data to colMeans():"); utils::str(x) }))

first, and then retry. It adds debug messages to colMeans2() that should show up just before the error, something like:

Input data to colMeans():
 int [1:3, 1] 1 2 3

That will help understand what's going on too.
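The same trace() pattern can be tried on a throwaway function first, to see what the tracer prints without touching matrixStats. A minimal sketch using only base R; toyColMeans() is a hypothetical stand-in for matrixStats::colMeans2(). untrace() removes the tracer afterwards, and untrace(matrixStats::colMeans2) should undo the call above in the same way.

```r
# Hypothetical stand-in for matrixStats::colMeans2().
toyColMeans <- function(x) colMeans(as.matrix(x))

# Run the tracer expression on entry; 'x' is the function's own argument.
# print = FALSE suppresses the default "Tracing ... on entry" line.
trace("toyColMeans", tracer = quote({
  message("Input data to toyColMeans():")
  utils::str(x)
}), print = FALSE)

# The tracer reports the structure of the data.frame, then the function runs.
res <- toyColMeans(data.frame(a = 1:3, b = 4:6))

# Remove the tracer again.
untrace("toyColMeans")
```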

jppmatos commented 7 months ago

First, can you try to identify a minimal subset of samples that produces the error ... What about going down to a single sample?

| Trying only with S40 and S40R samples:

> readCounts <- binReadCounts(bins, path = "S40_sWGS", bamnames=c("S40","S40_Rerun"))
    S40 (1 of 2): extracting reads ... binning ...
    S40_Rerun (2 of 2): extracting reads ... binning ...
> pooledReadCounts <- poolRuns(readCounts,c("S40","S40"))
Error in colMeans2(oldphenodata, cols = numericCols, useNames = FALSE) :
  Argument 'x' must be a matrix or a vector.
> traceback()
3: colMeans2(oldphenodata, cols = numericCols, useNames = FALSE)
2: poolRuns(readCounts, c("S40", "S40"))
1: poolRuns(readCounts, c("S40", "S40"))

| Did the same with only S47 and S47R samples:

> readCounts <- binReadCounts(bins, path = "S47_sWGS", bamnames=c("S47","S47_Rerun"))
    S47 (1 of 2): extracting reads ... binning ...
    S47_Rerun (2 of 2): extracting reads ... binning ...
> pooledReadCounts <- poolRuns(readCounts,c("S47","S47"))
Error in colMeans2(oldphenodata, cols = numericCols, useNames = FALSE) :
  Argument 'x' must be a matrix or a vector.
> traceback()
3: colMeans2(oldphenodata, cols = numericCols, useNames = FALSE)
2: poolRuns(readCounts, c("S47", "S47"))
1: poolRuns(readCounts, c("S47", "S47"))

| If I try one sample only:

> pooledReadCounts <- poolRuns(readCounts[,1],c("S47"))
>

No error.

| And giving distinct names:

> pooledReadCounts <- poolRuns(readCounts,c("S47","S47R"))
>

Also no error.
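One reading consistent with these four results (an assumption, not checked against the poolRuns() source) is that phenodata only gets averaged for IDs that cover more than one run, so only calls with a repeated ID reach colMeans2(). A small base-R sketch of which IDs would actually be merged in each call above; pooledIds() is a hypothetical helper.

```r
# Hypothetical helper: IDs that cover more than one run, i.e. the only IDs
# for which per-sample values would have to be combined.
pooledIds <- function(ids) {
  counts <- table(ids)
  names(counts)[counts > 1]
}

pooledIds(c("S40", "S40"))   # "S40"        (errored above)
pooledIds(c("S47", "S47"))   # "S47"        (errored above)
pooledIds("S47")             # character(0) (no error)
pooledIds(c("S47", "S47R"))  # character(0) (no error)
```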


Second, please call trace(matrixStats::colMeans2, ...) first, and then retry. ...

| Trying to pool with the S47 samples:

> trace(matrixStats::colMeans2, tracer = quote({ message("Input data to colMeans():"); utils::str(x) }))
Loading required package: matrixStats
Tracing function "colMeans2" in package "matrixStats"
[1] "colMeans2"
> pooledReadCounts <- poolRuns(readCounts,c("S47","S47"))
Tracing colMeans2(oldphenodata, cols = numericCols, useNames = FALSE) on entry
Input data to colMeans():
'data.frame':   2 obs. of  4 variables:
 $ name             : chr  "S47" "S47"
 $ total.reads      : num  828307 354404
 $ used.reads       : num  806129 345068
 $ expected.variance: num  0.0335 0.0783
Error in colMeans2(oldphenodata, cols = numericCols, useNames = FALSE) :
  Argument 'x' must be a matrix or a vector.
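The str() output above shows that colMeans2() received a data.frame (including the character name column), whereas the error says it only accepts a matrix or a vector. As a sketch of a possible workaround, assuming the goal is to average the numeric phenodata columns per pooled ID: select the numeric columns, convert them to a matrix, and average those. poolPhenodata() is hypothetical, not QDNAseq's code; base colMeans() is used so the sketch has no dependencies, and matrixStats::colMeans2() should accept the same numeric matrix. The example values are copied from the str() output, so expected.variance is rounded.

```r
# Hypothetical sketch, not QDNAseq's implementation: average the numeric
# phenodata columns of runs sharing an ID, keeping one row per ID.
poolPhenodata <- function(phenodata, ids) {
  numericCols <- vapply(phenodata, is.numeric, logical(1))
  uniqueIds <- unique(ids)
  # Start from the first run of each ID (keeps the non-numeric columns).
  pooled <- phenodata[match(uniqueIds, ids), , drop = FALSE]
  for (i in seq_along(uniqueIds)) {
    rows <- which(ids == uniqueIds[i])
    # A plain numeric matrix, rather than the whole data.frame.
    x <- as.matrix(phenodata[rows, numericCols, drop = FALSE])
    pooled[i, numericCols] <- colMeans(x)
  }
  # Assumes a 'name' column, as in the str() output above.
  pooled$name <- uniqueIds
  rownames(pooled) <- uniqueIds
  pooled
}

# Values taken from the traced input (expected.variance as rounded by str()).
pd <- data.frame(
  name = c("S47", "S47"),
  total.reads = c(828307, 354404),
  used.reads = c(806129, 345068),
  expected.variance = c(0.0335, 0.0783)
)
pooled <- poolPhenodata(pd, c("S47", "S47"))
print(pooled)
```

With distinct IDs such as c("S47", "S47R"), each group has a single run, so the sketch simply returns both rows unchanged.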

jppmatos commented 4 months ago

I'm closing this: the data I was using was outdated, so the issue was possibly due to the data itself.