alexdobin / STAR

RNA-seq aligner
MIT License

Splice junction not detected in RNAseq data #1877

Open ZChristophe5 opened 1 year ago

ZChristophe5 commented 1 year ago

Hi, I noticed some alignment issues when I ran variant calling on my RNA-seq data (Illumina PE 2x100bp). STAR often extends the end of a read into the intron with mismatches instead of calling the splice junction. These mismatched bases then persist through variant calling and produce false-positive mutations. I have attached two cases illustrating this issue. These junctions are annotated in the GTF used to generate the genome.

These are my parameters with STAR 2.7.10b:

    STAR --runMode alignReads \
      --runThreadN 12 \
      --genomeDir $index \
      --twopassMode Basic \
      --outFilterMultimapNmax 20 \
      --alignSJoverhangMin 8 \
      --alignSJDBoverhangMin 1 \
      --outFilterMismatchNmax 999 \
      --outFilterMismatchNoverLmax 0.1 \
      --alignIntronMin 20 \
      --alignIntronMax 1000000 \
      --alignMatesGapMax 1000000 \
      --outFilterType BySJout \
      --outFilterScoreMinOverLread 0.33 \
      --outFilterMatchNminOverLread 0.33 \
      --limitSjdbInsertNsj 1200000 \
      --readFilesIn $R1 $R2 \
      --readFilesCommand zcat \
      --outFileNamePrefix $output \
      --outSAMstrandField intronMotif \
      --outFilterIntronMotifs None \
      --alignSoftClipAtReferenceEnds Yes \
      --outSAMtype BAM SortedByCoordinate \
      --outSAMunmapped Within \
      --genomeLoad NoSharedMemory \
      --chimSegmentMin 15 \
      --chimJunctionOverhangMin 15 \
      --chimOutType Junctions WithinBAM SoftClip \
      --chimMainSegmentMultNmax 1 \
      --outSAMattributes NH HI AS nM NM ch \
      --outSAMattrRGline ID:rg1 SM:sm1

[Figure: UNC45B]

I managed to fix this example (figure above) by adding the option --outSJfilterDistToOtherSJmin 0 0 0 0, but this does not solve the issue completely. I have found another kind of alignment error (see figure below).

[Figure: HNRNPM]

I have also tested other options, such as:

    --outSJfilterOverhangMin 12 12 12 12 \
    --outSJfilterCountUniqueMin 1 1 1 1 \
    --outSJfilterCountTotalMin 1 1 1 1 \
    --outSJfilterDistToOtherSJmin 0 0 0 0 \
    --scoreGapNoncan 0 \
    --scoreGapGCAG 0 \
    --scoreGapATAC 0

Thank you in advance for your help.

alexdobin commented 1 year ago

Hi @ZChristophe5

If the junctions are annotated and there are no mismatches in the reads close to the junctions, the splicing should be detected in most cases. If you zoom out, do you see any other mismatches in the problematic reads?

ZChristophe5 commented 1 year ago

Hi @alexdobin

I only see mismatches that correspond to bases that should be spliced (those in my previous figure).

[Figure: HNRNPM1]

Maybe this one is just a rare alignment error. I can't find other errors like it, although I also have no systematic way to check for them.

alexdobin commented 1 year ago

In the latter case one of the exons involved is very short (7 bases), which may cause mappability problems.
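One way to check where very short exons might cause this kind of problem is to scan the annotation for them. The following is a minimal sketch, not part of STAR: the function name `short_exons` and the 10-base threshold are assumptions chosen for illustration, and it relies only on the standard GTF layout (feature type in column 3, 1-based inclusive start and end in columns 4 and 5).

```python
def short_exons(gtf_lines, max_len=10):
    """Yield (chrom, start, end, length) for exon records of at most max_len bases.

    gtf_lines is any iterable of GTF text lines, e.g. an open file handle.
    The max_len default is an arbitrary illustrative threshold.
    """
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A GTF record has 9 tab-separated columns; keep exon features only.
        if len(fields) < 9 or fields[2] != "exon":
            continue
        start, end = int(fields[3]), int(fields[4])
        length = end - start + 1  # GTF coordinates are 1-based and inclusive
        if length <= max_len:
            yield fields[0], start, end, length


if __name__ == "__main__":
    # Made-up records for demonstration only.
    demo = [
        'chr1\tdemo\texon\t100\t106\t.\t+\t.\tgene_id "g1";\n',
        'chr1\tdemo\texon\t500\t700\t.\t+\t.\tgene_id "g1";\n',
    ]
    for record in short_exons(demo):
        print(*record, sep="\t")
```

Passing an open annotation file instead of the demo list would report every short exon, which gives a list of loci to inspect for reads that run into the flanking introns.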

y9c commented 1 year ago

Hi @alexdobin. I see a similar problem with "mismatches in the reads close to the junctions". The splice sites within these reads are not detected, which leads to false-positive mutation calls in the intron. Is there a way to avoid this?

alexdobin commented 1 year ago

Hi @y9c

Presently there is no way to avoid it during mapping; you would need to filter such reads after mapping.
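A post-mapping filter of this kind could look roughly like the sketch below. It is only an illustration under stated assumptions, not a STAR feature: the function names, the 10-base `max_overhang` threshold, and the coordinate convention (0-based leftmost position, introns as 0-based half-open intervals on the same chromosome as the read) are all hypothetical choices. It flags a read when one of its ends runs a few bases into an annotated intron without a splice (`N`) in the CIGAR.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_blocks(pos, cigar):
    """Return 0-based half-open reference blocks covered by a read, split at N gaps."""
    blocks = []
    start = cur = pos
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=XD":      # operations that consume the reference inside a block
            cur += n
        elif op == "N":       # spliced gap: close the current block, skip the intron
            if cur > start:
                blocks.append((start, cur))
            cur += n
            start = cur
        # I, S, H and P do not consume reference bases
    if cur > start:
        blocks.append((start, cur))
    return blocks


def intron_overhang(pos, cigar, introns, max_overhang=10):
    """Return how many bases at a read end lie inside an annotated intron (0 if none).

    introns: iterable of (start, end) 0-based half-open intervals, assumed to be
    on the same chromosome as the read. max_overhang is an illustrative threshold.
    """
    blocks = reference_blocks(pos, cigar)
    if not blocks:
        return 0
    read_start, first_end = blocks[0]
    last_start, read_end = blocks[-1]
    worst = 0
    for i_start, i_end in introns:
        # Right end of the alignment runs past the intron start without splicing.
        over = read_end - i_start
        if last_start < i_start and 0 < over <= max_overhang and read_end <= i_end:
            worst = max(worst, over)
        # Left end of the alignment begins just before the intron end.
        over = i_end - read_start
        if first_end > i_end and 0 < over <= max_overhang and read_start >= i_start:
            worst = max(worst, over)
    return worst


if __name__ == "__main__":
    introns = [(1000, 2000)]  # made-up intron for demonstration
    for pos, cigar in [(950, "55M"), (950, "50M1000N50M"), (1995, "60M")]:
        print(pos, cigar, intron_overhang(pos, cigar, introns))
```

Reads with a non-zero result could be dropped, or have the overhanging bases clipped, before variant calling. The intron intervals could be taken from the GTF or from STAR's SJ.out.tab after converting to the coordinate convention assumed here.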