JEFworks-Lab / HoneyBADGER

HMM-integrated Bayesian approach for detecting CNV and LOH events from single-cell RNA-seq data
http://jef.works/HoneyBADGER/
GNU General Public License v3.0

error in summarizeResults #13

Closed csimona closed 5 years ago

csimona commented 5 years ago

Thanks for developing and maintaining HoneyBADGER! I get the following error when running this command (after preprocessing the data, analyzing it, and including the SNP info):

results <- hb$summarizeResults(geneBased=FALSE, alleleBased=TRUE)
Error in data.frame(..., check.names = FALSE) :
  arguments imply differing number of rows: 15, 6

The algorithm identified 15 alterations (length(hb$cnvs[["allele-based"]][["all"]]) is 15), but only 6 of them were retained as deletions with LOH (length(hb$cnvs[["allele-based"]][["del.loh"]]) is 6).

Thanks!

JEFworks commented 5 years ago

Hi Simona,

Thanks for your patience.

It looks like the error is stemming from this code in hb$summarizeResults:

rgs <- cnvs[["allele-based"]][["all"]]
retest <- results[["allele-based"]]
del.loh.allele.prob <- do.call(rbind, lapply(retest, function(x) x))
vi1 <- rowSums(del.loh.allele.prob > 0.75) > min.num.cells
del.loh.allele.prob <- del.loh.allele.prob[vi1, ]
names <- apply(as.data.frame(rgs), 1, paste0, collapse = ":")
rownames(del.loh.allele.prob) <- paste0("del.loh.", names[vi1])
cnvs[["allele-based"]][["del.loh"]] <<- rgs[vi1]
summary[["allele-based"]] <<- del.loh.allele.prob
colnames(del.loh.allele.prob) <- paste0("del.loh.allele.", colnames(del.loh.allele.prob))
df <- cbind(as.data.frame(rgs), avg.del.loh.allele = rowMeans(del.loh.allele.prob), del.loh.allele.prob)

This fails because, as you aptly noted, rgs <- cnvs[["allele-based"]][["all"]] holds all 15 identified alterations, whereas del.loh.allele.prob has already been filtered down to the alterations affecting more than min.num.cells cells with greater than 75% posterior probability. The final cbind() therefore receives 15 rows from rgs and only 6 rows from del.loh.allele.prob. In hindsight, the 0.75 cutoff should probably also be exposed as a parameter so that users can apply a more stringent posterior probability filter.
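The mismatch can be reproduced outside HoneyBADGER with a small base-R sketch. Everything here is made-up toy data: `regions` is a hypothetical stand-in for `as.data.frame(rgs)`, and `prob` stands in for `del.loh.allele.prob` (one row per region, one column per cell). It only illustrates why the row counts diverge and how applying the same index to both sides resolves it.

```r
# Toy stand-ins (hypothetical data, not HoneyBADGER output).
regions <- data.frame(seqnames = c("chr1", "chr2", "chr3"),
                      start = c(100, 200, 300),
                      end = c(150, 250, 350))
prob <- matrix(c(0.90, 0.80, 0.95, 0.90,
                 0.10, 0.20, 0.80, 0.10,
                 0.90, 0.90, 0.90, 0.20),
               nrow = 3, byrow = TRUE,
               dimnames = list(NULL, paste0("cell", 1:4)))

# Same filter as in summarizeResults: keep regions where more than
# min.num.cells cells have posterior probability above 0.75.
min.num.cells <- 2
keep <- rowSums(prob > 0.75) > min.num.cells
filtered <- prob[keep, , drop = FALSE]

# Binding the unfiltered regions to the filtered probabilities fails,
# because the two sides no longer have the same number of rows.
bad <- tryCatch(
  cbind(regions, avg.del.loh.allele = rowMeans(filtered), filtered),
  error = function(e) conditionMessage(e)
)
print(bad)

# Subsetting the regions with the same index keeps the rows aligned.
good <- cbind(regions[keep, , drop = FALSE],
              avg.del.loh.allele = rowMeans(filtered),
              filtered)
print(good)
```

Using `drop = FALSE` keeps the matrix and data frame shapes intact even if only one region passes the filter.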

The fastest hack-y "fix", I believe, is to just set min.num.cells = 0 instead of the default of 2.

The correction would be to have:

df <- cbind(as.data.frame(rgs[vi1]), avg.del.loh.allele = rowMeans(del.loh.allele.prob), del.loh.allele.prob)
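As for the idea of making the posterior probability cutoff configurable, one possible shape is sketched below. `selectRegions` and its `min.prob` argument are hypothetical names introduced only for illustration; they are not part of the HoneyBADGER API, and the matrix is toy data.

```r
# Hypothetical helper (not part of HoneyBADGER): returns a logical index of
# the regions supported by more than 'min.num.cells' cells whose posterior
# probability exceeds 'min.prob'.
selectRegions <- function(prob, min.prob = 0.75, min.num.cells = 2) {
  stopifnot(is.matrix(prob), min.prob >= 0, min.prob <= 1)
  rowSums(prob > min.prob) > min.num.cells
}

# Toy posterior matrix: 3 regions (rows) x 4 cells (columns).
prob <- matrix(c(0.90, 0.80, 0.95, 0.90,
                 0.10, 0.20, 0.80, 0.10,
                 0.80, 0.80, 0.90, 0.20),
               nrow = 3, byrow = TRUE)

default.idx <- selectRegions(prob)                   # current 0.75 cutoff
strict.idx  <- selectRegions(prob, min.prob = 0.85)  # more stringent cutoff

print(default.idx)
print(strict.idx)
```

Whatever cutoff is chosen, the resulting index has to be applied to both the regions and the probability matrix so their row counts stay in sync. Also note that with min.num.cells = 0, a region is still dropped if none of its cells exceeds the cutoff.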

Can you please double check that the following code works for you?

rgs <- hb$cnvs[["allele-based"]][["all"]]
retest <- hb$results[["allele-based"]]
del.loh.allele.prob <- do.call(rbind, lapply(retest, function(x) x))
min.num.cells <- 2
vi1 <- rowSums(del.loh.allele.prob > 0.75) > min.num.cells
del.loh.allele.prob <- del.loh.allele.prob[vi1, ]
names <- apply(as.data.frame(rgs), 1, paste0, collapse = ":")
rownames(del.loh.allele.prob) <- paste0("del.loh.", names[vi1])
colnames(del.loh.allele.prob) <- paste0("del.loh.allele.", colnames(del.loh.allele.prob))
df <- cbind(as.data.frame(rgs[vi1]), avg.del.loh.allele = rowMeans(del.loh.allele.prob), del.loh.allele.prob)
print(df)

If it works, I can make the appropriate corrections to the repo and acknowledge you in the commit message.

Thanks, Jean

csimona commented 5 years ago

Hi Jean,

Thanks for the reply and for the fix; the code works for me.