STAR aligner call / UMI collapsed temp data

sklages commented 7 years ago

Hi, this is more an information request than a software issue.

I'd like to use cellranger with a custom genome reference. I know that my data will not map uniquely to that genome; I expect multiple hits for many reads. The standard cellranger count workflow will discard such reads, which would leave me without data at the end ;-)

I could not find a way to alter the STAR alignment parameters, e.g. outFilterMultimapNmax. Can you tell me where in the package the STAR parameters are set?
Is there a way to get "intermediate" data? An alternative would be to take the UMI-collapsed reads from cellranger and feed them into a standard aligner like bwa, followed by "manual" analysis (I don't need spliced alignment).

I'd appreciate any hints ..

best, Sven

pryvkin10x commented 7 years ago

Hi Sven,

There could be a few things going on. By default, STAR calls reads w/ >10 genomic alignments unmapped, so if you expect >10 alignments to be typical, you'll need to change outFilterMultimapNmax by adding the argument to the STAR call here:

https://github.com/10XGenomics/cellranger/blob/master/lib/python/cellranger/reference.py#L527

Once you've done that, or if the number of alignments per read is typically 10 or fewer, Cell Ranger will still not consider these reads for UMI counting, but it will report them in its final BAM file.
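To make the change concrete, here is a minimal sketch of a STAR argument list that carries a raised multimapper limit. It does not reproduce the code in reference.py; `build_star_align_args` and its parameters are hypothetical names used only for illustration, and the only STAR options shown are `--genomeDir`, `--readFilesIn`, `--runThreadN` and `--outFilterMultimapNmax`.

```python
from typing import List


def build_star_align_args(genome_dir: str,
                          fastq_path: str,
                          threads: int = 4,
                          multimap_nmax: int = 10) -> List[str]:
    """Build a STAR command line as a list of strings.

    Hypothetical helper for illustration only; this is not the
    function used inside Cell Ranger.
    """
    if multimap_nmax < 1:
        raise ValueError("multimap_nmax must be at least 1")
    return [
        "STAR",
        "--genomeDir", genome_dir,
        "--readFilesIn", fastq_path,
        "--runThreadN", str(threads),
        # Reads with more alignments than this are reported as unmapped.
        "--outFilterMultimapNmax", str(multimap_nmax),
    ]


if __name__ == "__main__":
    # Example paths are placeholders.
    print(" ".join(build_star_align_args("ref/star", "reads.fastq",
                                         multimap_nmax=50)))
```

The list form can be passed directly to `subprocess.check_call`, which avoids shell quoting problems with paths.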

If you want to realign the data yourself (while preserving cell barcodes and UMIs) you have a few options, but unfortunately none of them are easy.

1. Convert the final BAM file back to FASTQ. Most tools don't preserve the UMI/barcode tags, but this can be done by writing a Python script that uses pysam.
2. Kill the pipeline after EXTRACT_READS finishes but before its children finish. Nested deep under the EXTRACT_READS directory you'll find a set of FASTQ files that contain the barcode and UMI info.
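For the BAM-to-FASTQ route, a rough pysam sketch could look like the following. It assumes the cell barcode and UMI are stored in the `CB` and `UB` BAM tags (check the tags in your own file) and appends them to the read name so they survive realignment. The function names and the `name_barcode_umi` header layout are choices made for this example, not anything Cell Ranger prescribes.

```python
from typing import Optional, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def original_orientation(sequence: str, quality: str,
                         is_reverse: bool) -> Tuple[str, str]:
    """Undo the reverse-complementing applied to reverse-strand alignments."""
    if is_reverse:
        return sequence.translate(_COMPLEMENT)[::-1], quality[::-1]
    return sequence, quality


def fastq_record(name: str, sequence: str, quality: str,
                 cell_barcode: Optional[str], umi: Optional[str]) -> str:
    """Format one FASTQ record with barcode and UMI appended to the name."""
    header = "@{}_{}_{}".format(name, cell_barcode or "NA", umi or "NA")
    return "{}\n{}\n+\n{}\n".format(header, sequence, quality)


def bam_to_tagged_fastq(bam_path: str, fastq_path: str,
                        barcode_tag: str = "CB", umi_tag: str = "UB") -> int:
    """Write one FASTQ record per primary BAM record; return the count."""
    import pysam  # imported here so the helpers above work without pysam

    written = 0
    with pysam.AlignmentFile(bam_path, "rb") as bam, \
            open(fastq_path, "w") as out:
        for read in bam.fetch(until_eof=True):
            # Skip extra alignment records so each read is emitted once.
            if read.is_secondary or read.is_supplementary:
                continue
            if read.query_sequence is None or read.query_qualities is None:
                continue
            quality = "".join(chr(q + 33) for q in read.query_qualities)
            seq, qual = original_orientation(read.query_sequence, quality,
                                             read.is_reverse)
            barcode = read.get_tag(barcode_tag) if read.has_tag(barcode_tag) else None
            umi = read.get_tag(umi_tag) if read.has_tag(umi_tag) else None
            out.write(fastq_record(read.query_name, seq, qual, barcode, umi))
            written += 1
    return written
```

Calling `bam_to_tagged_fastq("possorted_genome_bam.bam", "tagged.fastq")` (file names are placeholders) would produce a FASTQ that bwa can read, with the barcode and UMI recoverable from each read name afterwards.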

kobeho24 commented 6 years ago

One naive question: how can I kill the pipeline automatically, as you mentioned above? I cannot find any FASTQ files under the EXTRACT_READS directory after a complete cellranger run.

10XGenomics / cellranger
