alexdobin / STAR

RNA-seq aligner
MIT License

Request: provide alternate --quantMode : BinCounts #929

Open malcook opened 4 years ago

malcook commented 4 years ago

I use STAR happily and effectively for genome alignment workflows (e.g. ChIP/ATAC-seq).

Some approaches to downstream analysis require comparing read counts in genomic "bins" (e.g. one bin every thousand bases).

If STAR were to implement a new --quantMode of "BinCounts" which tallied reads overlapping each bin, it would enhance STAR's utility in such pipelines.

Of course you would need to provide an additional option to define the bins (possibly a BED file, or possibly simply a binWidth parameter, defaulting to 1000). And you might then want to provide further options to extend reads (pairs) and/or shift them (such as those provided by featureCounts and/or igvtools count) for the purposes of such quantification.

Were you to do so, you'd probably want to adopt the semantics of HTSeq's "--nonunique all" option (https://htseq.readthedocs.io/en/release_0.11.1/count.html), so that a read overlapping more than one bin is counted in each of them.
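To make the requested semantics concrete, here is a minimal Python sketch of fixed-width bin counting in which a read overlapping several bins is tallied in every one of them. It is only an illustration of the idea, not anything STAR implements: the function name `count_reads_per_bin`, the `(chrom, start, end)` read tuples (1-based, inclusive coordinates) and the `bin_width` argument are all assumptions made for this example.

```python
from collections import Counter

def count_reads_per_bin(reads, bin_width=1000):
    """Tally reads per fixed-width bin (hypothetical helper, not a STAR feature).

    reads: iterable of (chrom, start, end) with 1-based inclusive coordinates.
    A read that spans a bin boundary is counted once in every bin it overlaps.
    Bins are indexed from 0, so bin 0 covers positions 1..bin_width.
    """
    counts = Counter()
    for chrom, start, end in reads:
        first_bin = (start - 1) // bin_width
        last_bin = (end - 1) // bin_width
        for bin_index in range(first_bin, last_bin + 1):
            counts[(chrom, bin_index)] += 1
    return counts
```

Read extension or shifting, as mentioned above, would amount to adjusting `start` and `end` before the bin indices are computed.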

I read in #199 that: "I have it high on my TODO list to add the multi-mappers counting". This might well be tackled as conjoined effort.

Just a thought, and, cheers to you!

alexdobin commented 4 years ago

Hi Malcolm,

This is a good idea, thanks! At the moment, you can try to hack the bin counting by creating a GTF file with the bins: each bin is a separate exon with a distinct transcript_id and gene_id. The --quantMode GeneCounts option will then count uniquely mapping reads per bin. It will not count reads that overlap two or more bins (i.e. reads that span a bin boundary); those will be considered ambiguous.
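A minimal Python sketch of that workaround, generating one single-exon "gene" per bin as GTF lines. The function name `bin_gtf_lines`, the `chrom_sizes` mapping, the `bins` source label, the ID naming scheme and the choice of the "+" strand are all assumptions for illustration; only the nine-column GTF layout and the `gene_id` / `transcript_id` attributes come from the format itself.

```python
def bin_gtf_lines(chrom_sizes, bin_width=1000):
    """Return GTF lines with one exon per genomic bin (hypothetical helper).

    chrom_sizes: dict mapping chromosome name to its length in bases.
    GTF coordinates are 1-based and inclusive; the last bin on each
    chromosome is truncated at the chromosome end.
    """
    lines = []
    for chrom, size in chrom_sizes.items():
        for start in range(1, size + 1, bin_width):
            end = min(start + bin_width - 1, size)
            bin_id = f"bin_{chrom}_{start}_{end}"
            # Each bin gets its own gene_id and transcript_id.
            attributes = f'gene_id "{bin_id}"; transcript_id "{bin_id}_tx";'
            fields = [chrom, "bins", "exon", str(start), str(end),
                      ".", "+", ".", attributes]
            lines.append("\t".join(fields))
    return lines
```

Writing these lines to a file and supplying it through --sjdbGTFfile should let --quantMode GeneCounts report one row per bin. Because every bin is placed on the "+" strand here, the unstranded count column is probably the relevant one for ChIP/ATAC-style data.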

In terms of counting non-unique mappers, it's still high on my TODO list :), thanks for bumping it up! I hope to add more options for counting, and to synchronize them with STARsolo.

Cheers,
Alex