Closed zhuxqdoctor closed 4 years ago
Hi,
The error is caused by your snps
variable containing a genomic ranges coordinate with seqlevel ‘NC_007605’ that is not present in your bam.
Can you please try the getSnpMats10X() function instead if you have not done so already? I have previously encountered this issue and thought I had addressed it by filtering to relevant seqlevels in getSnpMats10X().
Also, please double check that your bam is aligned to the same genome version as your snps. Note the built-in ExAC snps are from hg19.
Best, Jean
Hi, Jean, Thanks for your reply. Based on your suggestion, I tried to use getSnpMats10X() command line, unfortunately also failed and got the same errors. I am sure my bam file were aligned from hg19 genome version which means it can be matched with the snps from ExAC just in the honeyBADGER package. So, do you have any other suggestion for me? Thanks.
Hum, in that case, please try the following:
# load ExAC chromosome 1 snps
load(system.file("ExAC", "ExAC_chr1.RData", package = "HoneyBADGER"))
head(snps)
seqlevels(snps)
# filter for only those with seqlevels in canonical chromosomes
chrs <- c(as.character(1:22), "X", "Y")
seqlevels(snps) <- chrs
head(snps)
seqlevels(snps)
Now you should be able to do
results <- getSnpMats10X(snps, bamFiles, indexFiles, barcodes)
If it works, I can update the getSnpMats10X() to do this by default.
Hi, Jean, Thanks again. This time I got a different error like this in the pileup step: [1] "Getting pileup..." Error in validObject(.Object) : invalid class “ScanBamParam” object: 'tagFilter' must contain only non-NULL, non-NA, non-empty character or integer values. Does it sound like the bam file is lack of something? Or should I filter or modify my bam file? Thanks.
Did you use CellRanger to get your bam? Do you know which version of CellRanger was used? It looks like your reads may be missing the 'CB' tag, which is used to assign each read to a cell barcode. Or perhaps CellRanger updated their tag name.
For example, here is a read from a 10X bam. Note the CB
tag.
> samtools view aml027_post_transplant_possorted_genome_bam.bam | less
NB500915:165:HYLMFBGXX:3:23402:11145:13415 16 1 10544 3 68M30S * 0 0 AAATCTGTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCCCCATGTACTCTGCGTTGATACCACTGCTT <EE/EEE6EEE6EEAE6EEA/AEE<EEE/EAEEEEE<EEAAEEAEEAAEAEEE/EEEEEEEEEEEEEEEEEEE6EAEEEEEAE/EEEEEEEEEAAAA/ NH:i:2 HI:i:1 AS:i:64 nM:i:1 NM:i:1 CR:Z:TAAAGACTTCTAGG CQ:Z:AAAAA6EEEEEEEE CB:Z:TAAAGACTTCTAGG-2 UR:Z:TCCTGTCGTT UQ:Z:AAAAAEEE/A UB:Z:TCCTGTCGTT BC:Z:CTATACGC QT:Z:EEEAAAAA
Hi, Jean, Thanks for your reply. Yes, the bam file was derived from CellRanger (version, 2.0.1). Also I checked the bam file using samtools and the CB tag was in it. But it seemed a little different from your example. Here are the details. So is this the reason caused? C00135:251:CB8E5ANXX:8:1308:1504:73720 16 1 10551 255 70M31S * 0 0 TGCAGAGGAGAACGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCAACTCCGCCCCCCATGTACTCTGCGTTGATACCACTGCTT /BF/B7//F<B77/77/7/FB/<77//F7/F7/</<77/F<FFFFB///7FFF7B/<FFFFFBB<BF7/<FBFFF<FFFFBFFFFB/FFFFFFFF</B</B NH:i:1 HI:i:1 AS:i:66 nM:i:1 RE:A:I CR:Z:CTTTGCGCAATGTTGC CY:Z:B/<BB/<FBBFFFF</ CB:Z:CTTTGCGCAATGTTGC-1 UR:Z:TTGAATCTCA UY:Z:FBB<FFFBF/ UB:Z:TTGAATCTCA BC:Z:AACGTCAA QT:Z:BBBBBFFF TR:Z:TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAAAAATTTTTTTTAAAAAATTTTTTTTAAAAATAAATTATTTT TQ:Z:BFBFFFFFFFFFFFFFFFFFFFFFFFFB7B//B7B///BBB/<<7//77<</<BFB/<///////////////// RG:Z:SYL1_hg19:MissingLibrary:1:CB8E5ANXX:8
在 2018年12月3日,上午7:37,Jean Fan notifications@github.com 写道:
Did you use CellRanger to get your bam? Do you know which version of CellRanger was used? It looks like your reads may be missing the 'CB' tag, which is used to assign each read to a cell barcode. Or perhaps CellRanger updated their tag name.
For example, here is a read from a 10X bam. Note the CB tag.
samtools view aml027_post_transplant_possorted_genome_bam.bam | less NB500915:165:HYLMFBGXX:3:23402:11145:13415 16 1 10544 3 68M30S * 0 0 AAATCTGTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCCCCATGTACTCTGCGTTGATACCACTGCTT <EE/EEE6EEE6EEAE6EEA/AEE<EEE/EAEEEEE<EEAAEEAEEAAEAEEE/EEEEEEEEEEEEEEEEEEE6EAEEEEEAE/EEEEEEEEEAAAA/ NH:i:2 HI:i:1 AS:i:64 nM:i:1 NM:i:1 CR:Z:TAAAGACTTCTAGG CQ:Z:AAAAA6EEEEEEEE CB:Z:TAAAGACTTCTAGG-2 UR:Z:TCCTGTCGTT UQ:Z:AAAAAEEE/A UB:Z:TCCTGTCGTT BC:Z:CTATACGC QT:Z:EEEAAAAA — You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/JEFworks/HoneyBADGER/issues/17#issuecomment-443553186, or mute the thread https://github.com/notifications/unsubscribe-auth/AqcSaEcusojqyLzILZSh1a5aE75Ftzwhks5u1GREgaJpZM4Yazsi.
As long as the CB
tag is present in all reads, it should be fine. To rule out the possibility of this being an issue with R/RSamtools, can you please try the python version for getting the allele counts information: https://github.com/barkasn/scAlleleCount
Hi, Jean, Thanks for your reply. I will try it and see if it works. Really appreciated for this.
Jean Fan notifications@github.com于2019年1月31日 周四上午3:27写道:
As long as the CB tag is present in all reads, it should be fine. To rule out the possibility of this being an issue with R/RSamtools, can you please try the python version for getting the allele counts information: https://github.com/barkasn/scAlleleCount
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/JEFworks/HoneyBADGER/issues/17#issuecomment-459076395, or mute the thread https://github.com/notifications/unsubscribe-auth/AqcSaJMYs6uFoW9X3vsnXYPMmRWwHTiNks5vIfIzgaJpZM4Yazsi .
Hum, in that case, please try the following:
# load ExAC chromosome 1 snps load(system.file("ExAC", "ExAC_chr1.RData", package = "HoneyBADGER")) head(snps) seqlevels(snps) # filter for only those with seqlevels in canonical chromosomes chrs <- c(as.character(1:22), "X", "Y") seqlevels(snps) <- chrs head(snps) seqlevels(snps)
Now you should be able to do
results <- getSnpMats10X(snps, bamFiles, indexFiles, barcodes)
If it works, I can update the getSnpMats10X() to do this by default.
Following up on this part of the discussion, I was having a similar problem where the snps
object contained genomic ranges corresponding to GL unlocalized sequences. Filtering on canonical seqlevels as suggested above solved the issue for me.
Hello, Jean it's appreciated working to analysis CNV in single cell transcriptome level. and recently, I used getSnpMats10X() to import our 10x data, and I found some problems as following:
that's the code of getSnpMats10X(): getSnpMats10X <- function (snps, bamFiles, indexFiles, barcodes, n.cores = 1, verbose = FALSE) { cov <- getCoverage(snps, bamFile, indexFile, verbose) if (verbose) { print("Snps with coverage:") print(table(cov > 0)) } vi <- cov > 0 if (sum(vi) == 0) { print("ERROR: NO SNPS WITH COVERAGE. Check if snps and bams are using the same alignment reference.") return(NULL) } ......
And have you noticed that parameters are not paired in following function getSnpMats10X <- function (snps, bamFiles, indexFiles, barcodes, n.cores = 1, verbose = FALSE) {....}
and following function like getCoverage() used bamFile and indexFile without 's' cov <- getCoverage(snps, bamFile, indexFile, verbose)
So, I just fixed it up by remove 's' in code of getSnpMats10X. Hope that's right for following analysis.
Hello @JEFworks!
Thanks for this amazing tools. I have been trying to use it in my own data (scRNA 10x V3 - CellRanger 3.0.2) but have had some issues. In the beginning I use the function getCellAlleleCount
and had the same issue posted by @zhuxqdoctor (also using the snps matrix from the HoneyBadger). Then after reading this thread I have followed your recommendations to read in 10x data. However, once I run the command getSnpMats10X
, I got this error:
Error in getCoverage(snps, bamFile, indexFile, verbose): object 'bamFile' not found
Traceback:
1. getSnpMats10X(snps, bamFiles, indexFiles, barcodes, n.cores = 12)
2. getCoverage(snps, bamFile, indexFile, verbose)
3. pileup(file = bamFile, index = indexFile, scanBamParam = ScanBamParam(which = gr),
. pileupParam = pp)
I really do not understand why it is happening because I am point out to the correct path to the file. Then, I checked the bam file generated by CellRanger and noticed that not all reads have the CB
tag. Do you think this can be the source of the problem? Should I filter the bam file manually?
Thanks in advance for your help!
Hi @pangxueyu233 , Thanks so much for the catch. The bug has been fixed in commit 6c77f05
@ccruizm Please follow @pangxueyu233 's fix for the bug or copy the updated function with the correct parameter names:
getSnpMats10X <- function(snps, bamFile, indexFile, barcodes, n.cores=1, verbose=FALSE) {
## loop
cov <- getCoverage(snps, bamFile, indexFile, verbose)
## any coverage?
if(verbose) {
print("Snps with coverage:")
print(table(cov>0))
}
vi <- cov>0
if(sum(vi)==0) {
print('ERROR: NO SNPS WITH COVERAGE. Check if snps and bams are using the same alignment reference.')
return(NULL)
}
cov <- cov[vi]
snps.cov <- snps[vi,]
if(verbose) {
print("Getting allele counts...")
}
alleleCount <- getCellAlleleCount(snps.cov, bamFile, indexFile, barcodes, verbose=TRUE, n.cores=n.cores)
refCount <- alleleCount[[1]]
altCount <- alleleCount[[2]]
## check correspondence
if(verbose) {
print("altCount + refCount == cov:")
print(table(altCount + refCount == cov))
print("altCount + refCount < cov: sequencing errors")
print(table(altCount + refCount < cov))
##vi <- which(altCount + refCount != cov, arr.ind=TRUE)
## some sequencing errors evident
##altCount[vi]
##refCount[vi]
##cov[vi]
}
results <- list(refCount=refCount, altCount=altCount, cov=cov)
return(results)
}
As long as the
CB
tag is present in all reads, it should be fine. To rule out the possibility of this being an issue with R/RSamtools, can you please try the python version for getting the allele counts information: https://github.com/barkasn/scAlleleCount
I had the same issue as reported by @helianthuszhu. I am using Cellranger 3.0 and confirmed that the CB tags are present in my BAM files. I identified the problem to be within the getCellAlleleCount
function. The tf
variable must be a character vector within a named list. Instead, it is a factor within a named list. So to solve the problem, change tf <- list(cell)
to tf <- list(as.character(cell))
within the getCellAlleleCount
function.
Hi, I am trying to use HonryBADGER to call CNV from 10X Genomics RNAseq data. Initially, I used the snps matrix from the HoneyBadger package and loaded my own bam, bai and barcode file to run