JEFworks-Lab / HoneyBADGER

HMM-integrated Bayesian approach for detecting CNV and LOH events from single-cell RNA-seq data
http://jef.works/HoneyBADGER/
GNU General Public License v3.0
95 stars 31 forks source link

Error: subscript contains invalid names #34

Open suyanxun opened 4 years ago

suyanxun commented 4 years ago

image

What's wrong when I run the calcGexpCnvBoundaries using example data like https://jef.works/HoneyBADGER/Getting_Started.html?

JEFworks commented 4 years ago

Hi,

Please check that the biomart object you are using is indeed properly loaded. You can double check this by accessing hb$genes in the tutorial.

Please check that the gene names consistent with the gene names used in your expression matrix.

Best, Jean

kylandra commented 4 years ago

Dear Jean, I had the same problem running the tutorial, as suyanxun reported.

> hb$calcGexpCnvBoundaries(init=TRUE, verbose=FALSE)
Loading required package: rjags
Loading required package: coda
Linked to JAGS 4.3.0
Loaded modules: basemod,bugs
ERROR: Error: subscript contains invalid names
NULL

As you suggested, I checked the hb$genes slot and the gene names in the matrix of the example and everything seems fine.

> print(gexp[1:5,1:5])
      MGH31_A02  MGH31_A03 MGH31_A04  MGH31_A05  MGH31_A06
AAAS   3.897660 -4.5248935 -4.183228  2.7093435  3.5550152
AADAT  2.166325  1.5769039  3.300534  2.1134482  0.8006273
AAGAB -4.193448  0.4587702 -4.232739 -2.6151893 -2.4645792
AAK1   2.360924  2.6094029  2.830466  2.1001597  2.1811283
AAMP   3.484948  2.2440590  3.007798  0.4639664  1.8083660

> head(hb$genes)
GRanges object with 6 ranges and 0 metadata columns:
        seqnames              ranges strand
           <Rle>           <IRanges>  <Rle>
   AAAS    chr12   53307456-53324864      *
  AADAT     chr4 170060222-170091699      *
  AAGAB    chr15   67200667-67255195      *
   AAK1     chr2   69457997-69674349      *
   AAMP     chr2 218264123-218270257      *
   AAR2    chr20   36236459-36270918      *
  -------
  seqinfo: 69 sequences from an unspecified genome; no seqlengths

how can we solve this issue in the tutorial? I was just checking each step of the pipeline before perform the analysis on my own datatset.

Thanks

JEFworks commented 4 years ago

Hi,

Thanks for the bug report. This may have to do with biomart or a related dependency being updated.

To help me with the debugging, could you please let me know if the previous steps in the tutorial:

hb <- new('HoneyBADGER', name='MGH31')
hb$setGexpMats(gexp, ref, mart.obj, filter=FALSE, scale=FALSE, verbose=TRUE)
hb$plotGexpProfile() ## initial visualization
hb$setMvFit(verbose=TRUE) ## model variance
hb$setGexpDev(verbose=TRUE) ## model necessary expression deviation to identify CNVs

All worked properly? (e.g. without error or warning)

Also, after running calcGexpCnvBoundaries, is there any output provided by:

## double check what CNVs were identified
bgf <- hb$bound.genes.final
genes <- hb$genes
regions.genes <- range(genes[unlist(bgf)])
print(regions.genes)

Or is the output null?

Thanks for the help, Jean

On Jan 10, 2020, at 8:21 AM, kylandra notifications@github.com<mailto:notifications@github.com> wrote:

Dear Jean, I had the same problem running the tutorial, as suyanxun reported.

hb$calcGexpCnvBoundaries(init=TRUE, verbose=FALSE) Loading required package: rjags Loading required package: coda Linked to JAGS 4.3.0 Loaded modules: basemod,bugs ERROR: Error: subscript contains invalid names NULL

As you suggested, I checked the hb$genes slot and the gene names in the matrix of the example and everything seems fine.

print(gexp[1:5,1:5]) MGH31_A02 MGH31_A03 MGH31_A04 MGH31_A05 MGH31_A06 AAAS 3.897660 -4.5248935 -4.183228 2.7093435 3.5550152 AADAT 2.166325 1.5769039 3.300534 2.1134482 0.8006273 AAGAB -4.193448 0.4587702 -4.232739 -2.6151893 -2.4645792 AAK1 2.360924 2.6094029 2.830466 2.1001597 2.1811283 AAMP 3.484948 2.2440590 3.007798 0.4639664 1.8083660

head(hb$genes) GRanges object with 6 ranges and 0 metadata columns: seqnames ranges strand

AAAS chr12 53307456-53324864 * AADAT chr4 170060222-170091699 * AAGAB chr15 67200667-67255195 * AAK1 chr2 69457997-69674349 * AAMP chr2 218264123-218270257 * AAR2 chr20 36236459-36270918 * ------- seqinfo: 69 sequences from an unspecified genome; no seqlengths

how can we solve this issue in the tutorial? I was just checking each step of the pipeline before perform the analysis on my own datatset.

Thanks

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_JEFworks_HoneyBADGER_issues_34-3Femail-5Fsource-3Dnotifications-26email-5Ftoken-3DACEPMSBJ3K4GC4SASV3U3ZTQ5BY5RA5CNFSM4IZCG34KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEIT4R6Q-23issuecomment-2D573032698&d=DwMCaQ&c=WO-RGvefibhHBZq3fL85hQ&r=2gb0vmLv11Vi98WTAqlCXyDkhi11d9lKeGWDXEU-qNw&m=sacPl6ehGsUf0W5tx7MFmwWS2ZNsqqYsq0zGdb6UUzk&s=SMKV3AkrFdkOYuy4gsVmOSPCQLQbB81Gjhwj0prZ7yU&e=, or unsubscribehttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_ACEPMSB7VFCPCQSTSHFMKG3Q5BY5RANCNFSM4IZCG34A&d=DwMCaQ&c=WO-RGvefibhHBZq3fL85hQ&r=2gb0vmLv11Vi98WTAqlCXyDkhi11d9lKeGWDXEU-qNw&m=sacPl6ehGsUf0W5tx7MFmwWS2ZNsqqYsq0zGdb6UUzk&s=bU5ielDim2m8jcN1zBjn8YTmpQ_2bXB4zBiC8GwE7xU&e=.

kylandra commented 4 years ago

Hi Jean,

the version of biomart I'm using is biomaRt_2.42.0, if this can help you.

The first steps worked with no problem:

> hb <- new('HoneyBADGER', name='MGH31')
> hb$setGexpMats(gexp, ref, mart.obj, filter=FALSE, scale=FALSE, verbose=TRUE)
Initializing expression matrices ... 
Normalizing gene expression for 6082 genes and 75 cells ... 
Cache found
Done setting initial expression matrices! 
> hb$plotGexpProfile()
> hb$setMvFit(verbose=TRUE)
Modeling expected variance ... Done!
> hb$setGexpDev(verbose=TRUE)
Optimal deviance: 1.015797

Then I get the error:

hb$calcGexpCnvBoundaries(init=TRUE, verbose=FALSE)
Loading required package: rjags
Loading required package: coda
Linked to JAGS 4.3.0
Loaded modules: basemod,bugs
ERROR: Error: subscript contains invalid names
NULL

But if I double check what CNVs were identified, I obtain this result:

> bgf <- hb$bound.genes.final
> genes <- hb$genes
> regions.genes <- range(genes[unlist(bgf)])
> print(regions.genes)
GRanges object with 5 ranges and 0 metadata columns:
      seqnames              ranges strand
         <Rle>           <IRanges>  <Rle>
  [1]    chr11      167784-2397419      *
  [2]     chr7    816615-158956747      *
  [3]     chr9 137241287-138124624      *
  [4]    chr10    274190-133373354      *
  [5]     chr5  36876769-179634784      *
  -------
  seqinfo: 69 sequences from an unspecified genome; no seqlengths

So is it working?

I continued with the tutorial, and I got this error:

> hb$retestIdentifiedCnvs(retestBoundGenes = TRUE, retestBoundSnps = FALSE, verbose=FALSE)
> results <- hb$summarizeResults(geneBased=TRUE, alleleBased=FALSE)
> print(head(results[,1:5]))
> trees <- hb$visualizeResults(geneBased=TRUE, alleleBased=FALSE, details=TRUE, margins=c(25,15))
> hc <- trees$hc
> order <- hc$labels[hc$order]
> hb$plotGexpProfile(cellOrder=order)
Error in d[, cellOrder] : index out of bounds

and checking order vector I obtain:

> head(order)
[1] "del.gexp.MGH31_E09" "del.gexp.MGH31_A08" "del.gexp.MGH31_H07" "del.gexp.MGH31_G04"
[5] "del.gexp.MGH31_F10" "del.gexp.MGH31_F01"

what is going on?

thanks Oriana

aureliebousard commented 4 years ago

Dear Jean,

I have exactly the same problem. Did you finally solve this issue?

Thanks

Aurélie

garci1294 commented 4 years ago

@kylandra @aureliebousard @JEFworks

Hi everyone!

In regards to the index out of bounds error, I tried the following: Error in d[, cellOrder] : index out of bounds

I'm thinking the cellOrder is looking for integers, not strings. So I removed hc$labels. order <- hc$labels[hc$order] changed to order = hc$order

The following code gave me the correct graph similar to the getting started tutorial.

> hb$setMvFit(verbose=TRUE)
> hb$setGexpDev(verbose=TRUE)
> hb$calcGexpCnvBoundaries(init=TRUE, verbose=FALSE)
> order <- hc$order
> hb$plotGexpProfile(cellOrder=hc$order)

I hope this helps. This allowed me to finish the getting started tutorial.

Jesus

tinyi commented 4 years ago

I have the same issue. If continue, it halts at :

results <- hb$summarizeResults(geneBased=TRUE, alleleBased=FALSE) Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 5, 4

newihi001 commented 4 years ago

@tinyi I had the same error, I found that the error of differing number of rows: 5, 4 is caused by mismatching row number of rgs and amp.gexp.prob. May need to get the same rows to bind them. Good luck!

Qiu

rpiskol commented 3 years ago

@suyanxun @aureliebousard @kylandra - you may have to specify a different host for the biomart. The following is taken from the setGexpMats help and works fine.

mart.obj <- useMart(biomart = "ENSEMBL_MART_ENSEMBL", dataset = 'hsapiens_gene_ensembl', 
host = "jul2015.archive.ensembl.org")

R