JEFworks-Lab / HoneyBADGER

HMM-integrated Bayesian approach for detecting CNV and LOH events from single-cell RNA-seq data
http://jef.works/HoneyBADGER/
GNU General Public License v3.0

read bam files #35

Closed cq2019 closed 4 years ago

cq2019 commented 4 years ago

Hi, I am trying to use HoneyBADGER to detect CNVs from 10X Genomics single-cell RNA-seq data. To read the BAM file, I did:

files <- list.files(path = '/data/CellRanger/sample1/outs')
bamFiles <- files[grepl('possorted_genome_bam.bam', files)]
bamFiles <- paste0(path, bamFiles)

Error in as.vector(x, "character") : cannot coerce type 'closure' to vector of type 'character'

I am wondering how to fix this.

Thanks

JEFworks commented 4 years ago

Hi,

Thanks for your interest in HoneyBADGER.

Please double check that the path object exists and that the bamFiles list contains the appropriate paths with proper / between paths and file names.

path = '/data/CellRanger/sample1/outs/'
files <- list.files(path)
bamFiles <- files[grepl('possorted_genome_bam.bam', files)]
bamFiles <- paste0(path, bamFiles)
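
The "cannot coerce type 'closure'" error in the original snippet is consistent with `path` never having been assigned, so that `paste0()` received a function of that name instead of a string. A minimal defensive sketch follows. It uses a temporary directory with empty placeholder files as a stand-in for the real Cell Ranger `outs` folder (an assumption made only so the example runs anywhere), and lets `list.files()` build the full paths so the `/` separator is handled automatically.

```r
# Stand-in for the Cell Ranger outs directory (hypothetical; use your real path).
outsDir <- file.path(tempdir(), "sample1_outs")
dir.create(outsDir, showWarnings = FALSE, recursive = TRUE)
invisible(file.create(file.path(outsDir, c("possorted_genome_bam.bam",
                                           "possorted_genome_bam.bam.bai"))))

# Confirm the variable is a single character string pointing at a real directory,
# not a function or an undefined name.
stopifnot(is.character(outsDir), length(outsDir) == 1L, dir.exists(outsDir))

# full.names = TRUE prepends the directory, so no manual paste0() is needed.
# The "$" anchor keeps files ending in ".bam" and skips the ".bam.bai" index.
bamFiles <- list.files(outsDir, pattern = "\\.bam$", full.names = TRUE)

print(basename(bamFiles))
print(file.exists(bamFiles))
```

Checking `file.exists(bamFiles)` before passing the paths on makes a wrong directory or a missing separator show up immediately, not as a less obvious error further downstream.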

Best, Jean

cq2019 commented 4 years ago

Thanks, that helps. However, I ran into another issue:

path = "/data/CellRanger/sample1/outs/"
files <- list.files(path = path)
bamFiles <- files[grepl('possorted_genome_bam.bam', files)]
bamFiles <- paste0(path, bamFiles)
indexFiles <- files[grepl('possorted_genome_bam.bam.bai', files)]
indexFiles <- paste0(path, indexFiles)
cellBarcodes <- readLines('barcodes.tsv')
results <- getSnpMats10X(snps, bamFiles, indexFiles, cellBarcodes)

Error in pileup(file = bamFile, index = indexFile, scanBamParam = ScanBamParam(which = gr), : length(file) == 1L is not TRUE

Thanks, cq

JEFworks commented 4 years ago

Hi,

getSnpMats10X assumes that bamFiles and indexFiles are single strings (length == 1) as noted in the error output. It looks like you have many runs of 10X and thus many bam and index files.

You will want to generate a different set of matrices for each bam file:

results1 <- getSnpMats10X(snps, bamFiles[1], indexFiles[1], cellBarcodes)
results2 <- getSnpMats10X(snps, bamFiles[2], indexFiles[2], cellBarcodes)
...
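
Before splitting by run, it may be worth checking `length(bamFiles)`. One possible cause of a length greater than one even with a single run (an observation about the pattern, not something confirmed in this thread) is that `grepl('possorted_genome_bam.bam', files)` also matches `possorted_genome_bam.bam.bai`, because the index file name contains the BAM file name. A small sketch with a hypothetical file listing:

```r
# Hypothetical listing of a single Cell Ranger outs directory.
files <- c("possorted_genome_bam.bam",
           "possorted_genome_bam.bam.bai",
           "web_summary.html")

# Unanchored pattern: matches both the BAM file and its index.
loose <- files[grepl("possorted_genome_bam.bam", files)]
print(length(loose))  # 2

# Escaping the dots and anchoring with "$" keeps exactly one file per pattern.
bamOnly   <- files[grepl("possorted_genome_bam\\.bam$", files)]
indexOnly <- files[grepl("possorted_genome_bam\\.bam\\.bai$", files)]
print(bamOnly)
print(indexOnly)
```

If `length(bamFiles)` is still greater than one after anchoring the pattern, the directory really does contain several BAM files, and the per-file calls above apply.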

Also, note that every 10X run uses the same cell barcode names, but they correspond to different cells in different runs. So later, when you combine across many 10X runs, you will need to be careful not to merge the same cell barcode from different runs (just a heads up).

Ex: Combine coverage matrices from many runs

cov1 <- results1$cov
colnames(cov1) <- paste0('run1_', colnames(cov1))
cov2 <- results2$cov
colnames(cov2) <- paste0('run2_', colnames(cov2))

cov <- cbind(cov1, cov2)
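
With more than two runs, the same prefix-then-bind pattern can be wrapped in a small function. The sketch below is only an illustration: `combineRuns` is a hypothetical helper (not part of HoneyBADGER), the toy matrices stand in for `results1$cov` and `results2$cov`, and it assumes every run has the same SNP rows in the same order.

```r
# Toy coverage matrices standing in for results1$cov and results2$cov
# (rows = SNPs, columns = cell barcodes; the values are made up).
barcodes <- c("AAACCTGA-1", "AAACGGGT-1")
snpNames <- c("snp1", "snp2")
cov1 <- matrix(1:4, nrow = 2, dimnames = list(snpNames, barcodes))
cov2 <- matrix(5:8, nrow = 2, dimnames = list(snpNames, barcodes))

# Hypothetical helper: prefix each run's barcodes with the run name, then bind columns.
combineRuns <- function(covList) {
  stopifnot(!is.null(names(covList)))
  prefixed <- Map(function(m, run) {
    colnames(m) <- paste0(run, "_", colnames(m))
    m
  }, covList, names(covList))
  combined <- do.call(cbind, unname(prefixed))
  # Guard against the same barcode name appearing twice after combining.
  stopifnot(anyDuplicated(colnames(combined)) == 0L)
  combined
}

cov <- combineRuns(list(run1 = cov1, run2 = cov2))
print(colnames(cov))
```

The duplicate check fails early if two runs were given the same name, which is the situation the warning above is about.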

Best, Jean

JEFworks commented 4 years ago

Hi cq,

You can parallelize getSnpMats10X using the n.cores parameter. See ?getSnpMats10X for more details.

However, it may be more beneficial and efficient to consult or collaborate with a local bioinformatician, as questions like how to determine the length of a file list and how to save objects are not necessarily related to the HoneyBADGER package; they are questions for R and programming in general that I am unavailable to address. As such I will be marking this issue as closed. If you have additional questions or run into bugs related to the HoneyBADGER package, please feel free to reopen.
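
For readers who arrive here with the same general R questions, a brief base-R sketch: `length()` reports how many entries a file vector holds, and `saveRDS()` / `readRDS()` write and restore a single object. The file names and the `results` list below are placeholders, not HoneyBADGER output.

```r
# Placeholder vector of BAM paths (hypothetical names).
bamFiles <- c("run1/possorted_genome_bam.bam", "run2/possorted_genome_bam.bam")
print(length(bamFiles))  # 2

# Placeholder object standing in for a result you want to keep.
results <- list(cov = matrix(0, nrow = 2, ncol = 2))

# Save the object to disk, then load it back (for example, in a later session).
outFile <- file.path(tempdir(), "results_run1.rds")
saveRDS(results, outFile)
restored <- readRDS(outFile)
print(identical(results, restored))
```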

Best, Jean