VCCRI / Sierra

Discover differential transcript usage from polyA-captured single cell RNA-seq data
GNU General Public License v3.0

FindPeaks Error--'x' values larger than vector length 'sum(width)' #55

Open Jun2BCR opened 2 years ago

Jun2BCR commented 2 years ago

Hi there, thank you very much for developing this great tool. I used FindPeaks to analyze a single 20 GB BAM file from a single-cell dataset. After about 1.5 hours of running, it failed with `Error in { : task 5103 failed - "'x' values larger than vector length 'sum(width)'"`, along with a warning that the "phase" metadata column contains non-NA values for features of type stop_codon (full console output below). This is the whole code:

```r
FindPeaks(output.file = peak.output.file[1],          # output filename
          gtf.file = reference.file,                  # gene model as a GTF file
          bamfile = bamfile[1],                       # BAM alignment filename
          junctions.file = "Saline_R1_junctions.bed", # BED filename of splice junctions existing in the BAM file
          ncores = 6)
```

```
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
55357 gene entries to process
Loading required package: stats4
Loading required package: BiocGenerics

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs

The following objects are masked from 'package:base':

    anyDuplicated, append, as.data.frame, basename, cbind, colnames,
    dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which.max, which.min

Loading required package: S4Vectors

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    expand.grid, I, unname

Loading required package: IRanges
Loading required package: GenomeInfoDb

Error in { : task 5103 failed - "'x' values larger than vector length 'sum(width)'"
In addition: Warning message:
In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored.
```

Thank you very much for your help. Best, Jun

rj-patrick commented 2 years ago

Hi @Jun2BCR,

It's not obvious what's going on, and without being able to replicate the issue it's tricky to resolve. However, given that FindPeaks ran for that long before the error, I'd say an unusual situation is coming up with one of the genes or chromosomes in your data. Can I check whether you are working with human, mouse, or a different species? One way to narrow down the issue would be to run FindPeaks selectively on different chromosomes: set filter.chr = TRUE and use the chr.names parameter to restrict the run to a subset of chromosomes (or just one), then try different chromosomes until you find the one that throws the error. Hopefully this will at least help identify where the issue is.
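The chromosome-by-chromosome search described above can be automated with a small loop. The sketch below is a hypothetical helper, `find_failing_chromosomes()`, which is not part of Sierra: it calls a user-supplied function once per chromosome and records the error message of every call that fails, so one run reports all problem chromosomes instead of stopping at the first. The commented usage wraps FindPeaks with the `filter.chr` and `chr.names` parameters mentioned above; the chromosome names (UCSC-style mouse names) and per-chromosome output filenames are assumptions and must be adapted to the naming used in your BAM and GTF files.

```r
# Hypothetical helper (not part of Sierra): calls `run_one(chr)` for each
# chromosome name and collects the error message of every call that fails.
# Returns a named character vector: names are the failing chromosomes,
# values are the corresponding error messages (empty if nothing failed).
find_failing_chromosomes <- function(chroms, run_one) {
  failures <- character(0)
  for (chr in chroms) {
    message("Testing ", chr, " ...")
    err <- tryCatch(
      {
        run_one(chr)
        NULL                          # success: nothing to record
      },
      error = function(e) conditionMessage(e)
    )
    if (!is.null(err)) {
      failures[chr] <- err            # record chromosome -> error message
    }
  }
  failures
}

# Example usage (assumes Sierra is installed and that `reference.file` and
# `bamfile` are defined as in the original FindPeaks call; the chromosome
# names below are an assumption and must match your BAM/GTF naming):
# chroms <- c(paste0("chr", 1:19), "chrX", "chrY")
# failures <- find_failing_chromosomes(chroms, function(chr) {
#   Sierra::FindPeaks(output.file = paste0("peaks_", chr, ".txt"),
#                     gtf.file = reference.file,
#                     bamfile = bamfile[1],
#                     junctions.file = "Saline_R1_junctions.bed",
#                     ncores = 6,
#                     filter.chr = TRUE,
#                     chr.names = chr)
# })
# print(failures)
```

Writing each chromosome to its own output file keeps the successful runs usable while the failing chromosome is investigated further.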

Cheers, Ralph

Jun2BCR commented 2 years ago

Dear Ralph, thank you very much for your prompt response. I appreciate it. I am working with both mouse and human genomes. I really love Sierra and would like to use it for all of the ongoing scRNA-seq projects in my group. I will try what you suggest and let you know later which chromosome(s) give(s) me a hard time.

A small comment: it would be much appreciated if you could tell users how big the BAM files used in the Sierra vignette are and how long each step takes, so that users like me have a better sense of what to expect. But that is beside the point :)

Thanks again. Best, Jun