Uniquely mapping reads to identical regions

red-plant commented 5 years ago

Dear Dr. Dobin,

In quantifying allele-specific expression, I am using STAR to map reads against a diploid genome (custom concatenated FASTA files, one per race, and likewise for the annotation). In the counting step, I noticed that many of the reads uniquely aligned by STAR (~25%) did not overlap any variant. After checking in a genome browser, I saw that the variants for these transcripts were in introns, but the reads did not overlap the introns, only the exons around them. This subset of reads, uniquely aligned yet not overlapping a variant, did not agree with the simulations I am running, so I am puzzled as to why they map uniquely. I tried mapping without annotation (suspecting an sjdb bias towards the better-annotated race), without success. I would greatly appreciate any help solving this.

Thanks,

I ran STAR 2.7.0.d with the following flags: --outFilterMultimapNmax 1 --outFilterMultimapScoreRange 0 --alignIntronMax 900 --alignMatesGapMax 900 --readFilesCommand pigz -d -c

red-plant commented 5 years ago

I worked around this by using featureCounts to count only reads aligned to variants. I suppose STAR handles this well with its WASP option. I preferred the counting pipeline, since in plants multi-mapping reads are scarce, and checking that both alleles are equally 'mappable' is not necessary (in my opinion). Anyway, thanks for keeping STAR so up to date with these new features.
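
For anyone hitting the same thing, here is a rough Python sketch of the check behind "reads aligned to variants". It is not what featureCounts or STAR do internally. It is only an illustration that assumes 1-based SAM positions and standard CIGAR strings, and the function names are made up for this example. It shows why a spliced read can span an intron containing a variant without any of its bases covering that variant.

```python
import re

# One CIGAR element: a length followed by a standard SAM operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_blocks(pos, cigar):
    """Return 1-based inclusive (start, end) reference blocks covered by read bases.

    pos is the leftmost mapped reference position (SAM POS field).
    """
    blocks = []
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "M=X":
            # Read bases aligned to the reference: this is a covered block.
            blocks.append((ref, ref + length - 1))
            ref += length
        elif op in "DN":
            # Deletion or skipped region (intron): advances the reference,
            # but no read base covers these positions.
            ref += length
        # I, S, H and P do not consume reference positions.
    return blocks


def overlaps_variant(pos, cigar, variant_positions):
    """True if any variant position falls inside an aligned block of the read."""
    blocks = aligned_blocks(pos, cigar)
    return any(start <= v <= end for v in variant_positions for start, end in blocks)


# Hypothetical spliced read: 50 aligned bases, a 200 bp intron, 50 aligned bases.
print(aligned_blocks(100, "50M200N50M"))
print(overlaps_variant(100, "50M200N50M", [250]))  # variant inside the intron
print(overlaps_variant(100, "50M200N50M", [360]))  # variant inside the second exon block
```

In this toy case the intronic variant at position 250 lies between the two aligned blocks, so the read is not counted, which is the behaviour I wanted from the counting step.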

alexdobin commented 5 years ago

Hi Jose,

sorry for the belated reply, it's great that you found a workaround. Indeed, STAR does not assess variants for reads that map to multiple loci, which will be frequent when you map to a diploid genome.

Cheers Alex
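
A toy Python illustration of why multi-mapping is expected to be frequent on a concatenated diploid reference. This is an assumption-laden sketch and not STAR's alignment algorithm: it uses exact string matching only, and the two short haplotype sequences and the helper names are invented for the example. A read that covers no variant matches both haplotype copies, while a read that covers the variant matches only one.

```python
def count_exact_hits(read, reference):
    """Count (possibly overlapping) exact occurrences of read in reference."""
    hits = 0
    start = reference.find(read)
    while start != -1:
        hits += 1
        start = reference.find(read, start + 1)
    return hits


def total_hits(read, haplotypes):
    """Sum exact hits of read over all sequences of a concatenated reference."""
    return sum(count_exact_hits(read, seq) for seq in haplotypes.values())


# Two made-up haplotypes that differ at a single position (index 16: A vs T).
diploid = {
    "chr1_raceA": "ACGTACGGTTCAGGCTAACG",
    "chr1_raceB": "ACGTACGGTTCAGGCTTACG",
}

read_without_variant = "GGTTCAGG"  # lies entirely upstream of the variant
read_with_variant = "GGCTAACG"     # covers the variant, carries the raceA allele

print(total_hits(read_without_variant, diploid))
print(total_hits(read_with_variant, diploid))
```

Under these assumptions the first read has two equally good placements, one per haplotype, and the second has one.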

alexdobin / STAR

Uniquely mapping reads to identical regions #582