JEFworks-Lab / HoneyBADGER

HMM-integrated Bayesian approach for detecting CNV and LOH events from single-cell RNA-seq data
http://jef.works/HoneyBADGER/
GNU General Public License v3.0
96 stars 31 forks source link

error in retestIdentifiedCnvs #20

Open akdess opened 5 years ago

akdess commented 5 years ago

Dear Jean,

When I run the following code hb$retestIdentifiedCnvs(retestBoundGenes=TRUE, retestBoundSnps=TRUE, verbose=T), i get the following error: Error in as.vector(x) : no method for coercing this S4 class to a vector

sessionInfo() R version 3.4.3 (2017-11-30) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages: [1] gridExtra_2.3 reshape2_1.4.3
[3] ggplot2_3.0.0 TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 [5] GenomicFeatures_1.30.3 AnnotationDbi_1.40.0
[7] rjags_4-6 coda_0.19-1
[9] biomaRt_2.34.2 edgeR_3.20.9
[11] limma_3.34.9 HoneyBADGER_0.1
[13] SummarizedExperiment_1.8.1 DelayedArray_0.4.1
[15] matrixStats_0.52.2 Biobase_2.38.0
[17] GenomicRanges_1.30.3 GenomeInfoDb_1.14.0
[19] IRanges_2.12.0 S4Vectors_0.16.0
[21] BiocGenerics_0.24.0

loaded via a namespace (and not attached): [1] Rcpp_0.12.19 locfit_1.5-9.1 lattice_0.20-35 Rsamtools_1.30.0
[5] Biostrings_2.46.0 prettyunits_1.0.2 assertthat_0.2.0 digest_0.6.15
[9] R6_2.2.2 HiddenMarkov_1.8-11 plyr_1.8.4 RSQLite_2.0
[13] httr_1.3.1 pillar_1.1.0 zlibbioc_1.24.0 rlang_0.2.0
[17] progress_1.1.2 lazyeval_0.2.1 curl_3.0 blob_1.1.0
[21] Matrix_1.2-12 BiocParallel_1.12.0 RMySQL_0.10.13 stringr_1.3.1
[25] RCurl_1.95-4.8 bit_1.1-12 munsell_0.5.0 rtracklayer_1.38.0
[29] compiler_3.4.3 pkgconfig_2.0.1 tidyselect_0.2.3 tibble_1.4.2
[33] GenomeInfoDbData_0.99.1 XML_3.98-1.11 withr_2.1.2 dplyr_0.7.5
[37] GenomicAlignments_1.14.1 bitops_1.0-6 grid_3.4.3 gtable_0.2.0
[41] DBI_0.7 magrittr_1.5 scales_1.0.0 stringi_1.1.7
[45] XVector_0.18.0 bindrcpp_0.2.2 RColorBrewer_1.1-2 tools_3.4.3
[49] bit64_0.9-7 glue_1.2.0 purrr_0.2.4 colorspace_1.3-2
[53] caTools_1.17.1 memoise_1.1.0 bindr_0.1.1

JEFworks commented 5 years ago

Hi Akdes,

The only use of as.vector() is in coercing a table of either snps or genes frequencies into a vector. I've seen similar errors thrown with as.data.frame() before when GenomicRanges wasn't loaded properly, but your session info looks fine on that front.

Can you double check that the bound.genes.final and bound.snps.final slots in your hb object are not empty? These would be the candidate CNVs that HoneyBADGER identified from the allele or gene expression models alone and is trying to go back and reassess again using the combined model. Since both retestBoundGenes and retestBoundSnps are set to TRUE the following code should execute:

if(retestBoundSnps & retestBoundGenes) {
            if(verbose) {
                cat('Retesting bound snps and genes using joint model')
            }
            if(length(bound.genes.final)==0) {
                cat('ERROR NO GENES AFFECTED BY CNVS IDENTIFIED! Run calcGexpCnvBoundaries()? ')
                return();
            }
            if(length(bound.snps.final)==0) {
                cat('ERROR NO SNPS AFFECTED BY CNVS IDENTIFIED! Run calcAlleleCnvBoundaries()? ')
                return();
            }

            gr <- range(genes[unlist(bound.genes.final),])
            sr <- range(snps[unlist(bound.snps.final),])
            if(intersect) {
                rgs <- intersect(gr, sr)
            } else {
                rgs <- union(gr, sr)
            }

            retest <- lapply(seq_len(length(rgs)), function(i) {
                x <- calcCombCnvProb(region=rgs[i], m=dev, verbose=verbose, ...)
                list(x[[1]], x[[2]])
            })
            cnvs[['combine-based']] <<- list()
            cnvs[['combine-based']][['all']] <<- rgs
            results[['combine-based']] <<- retest
        }

It should catch if either of those slots are empty, but maybe you have a unique case that isn't being caught properly.

erin-d commented 5 years ago

Hi @JEFworks, I too am having issues with the command in the tutorial returning an error.

hb$retestIdentifiedCnvs(retestBoundGenes=TRUE, retestBoundSnps=TRUE, verbose=FALSE)

Error in as.vector(x) : no method for coercing this S4 class to a vector

I have the genomic ranges library, and ran the test code you suggested above4, but I just got another error. Error: object 'retestBoundSnps' not found

I also made sure to check that the bound.genes.final and bound.snps.final slots in your hb object are not empty.

Hb class

Do you have any other suggestions or things I should check? Thanks!

JEFworks commented 5 years ago

Hi Erin,

Thanks for the excellent debugging.

retestBoundSnps is a boolean parameter that is passed in the built-in function and by default set to TRUE, along with retestBoundGenes. You can just create a new object called retestBoundSnps=TRUE to fix the not found error. The purpose for both of these parameters is to distinguish between retesting CNVs identified by the allele and expression-based models for comparison purposes.

One possibility is that some of your identified CNVs are too small, on the order of 1 SNP or 1 gene. These would in theory get removed upon retesting. But I think there is an R-related bug with coercing the length 1 list into a 1 row matrix rather than a vector. For example:

> foo = matrix(1:6, 2,3)
> foo
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> foo = matrix(1:9,3,3) 
> foo
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> vi = c(TRUE, TRUE, FALSE) 
> class(foo[vi,]) ## note still matrix
[1] "matrix"
> vi = c(TRUE, FALSE, FALSE) ## since only 1 passed, now not matrix
> class(foo[vi,])
[1] "integer"

Can you please check how large are your identified CNVs? Likewise, how many genes and how many SNPs did you begin with; it is possible that the default filtering is too stringent and you began with very few genes or SNPs for example.

erin-d commented 5 years ago

Hi Jean,

I tried your test case code again, and after

retest <- lapply(seq_len(length(rgs)), function(i) { x <- calcCombCnvProb(region=rgs[i], m=dev, verbose=verbose, ...) list(x[[1]], x[[2]])

The following error resulted

Error in FUN(X[[i]], ...) : '...' used in an incorrect context

Also, I began with 9 genes and 4 snps Regions genes region snps

I think you may be right about it having to do with coercing the length 1 list into a 1 row matrix rather than a vector.

Thank You! Erin