alexdobin / STAR

RNA-seq aligner
MIT License

Splice site missed in SJ.out.tab #846

Open Vita-github opened 4 years ago

Vita-github commented 4 years ago

Hi! One junction, chr17:29483058-29486025, is missing from the SJ.out.tab file. As you can see in the attached screenshot (STAR-missed-SS), it is well covered by junction-spanning reads. What could be the reason?

Best, Vita

alexdobin commented 4 years ago

Hi Vita,

STAR filters the junctions it writes to the SJ.out.tab file using the --outSJfilter* parameters described in the STAR manual. One of these filters must be removing this specific junction. Is this junction's motif non-canonical? The filters are harsher on non-canonical junctions.
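To check whether (and how) a junction was reported, it helps to read SJ.out.tab directly. The sketch below assumes the nine-column layout from the STAR manual (chromosome, 1-based first and last intron base, strand 0/1/2, motif code 0-6, annotated flag, unique reads, multi-mapping reads, maximum overhang). `Junction`, `read_sj_out` and `find_junction` are hypothetical helper names, and the toy rows are made up for illustration.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional

# Motif codes for column 5 of SJ.out.tab, as listed in the STAR manual.
MOTIFS = {0: "non-canonical", 1: "GT/AG", 2: "CT/AC", 3: "GC/AG",
          4: "CT/GC", 5: "AT/AC", 6: "GT/AT"}


@dataclass
class Junction:
    chrom: str
    intron_start: int   # first intron base, 1-based
    intron_end: int     # last intron base, 1-based
    strand: int         # 0 = undefined, 1 = +, 2 = -
    motif: int
    annotated: bool
    unique_reads: int
    multi_reads: int
    max_overhang: int

    @property
    def motif_name(self) -> str:
        return MOTIFS[self.motif]


def read_sj_out(handle) -> Iterator[Junction]:
    """Parse the nine tab-separated columns of an SJ.out.tab stream."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        chrom, *fields = row
        start, end, strand, motif, annot, uniq, multi, overhang = map(int, fields)
        yield Junction(chrom, start, end, strand, motif, annot == 1,
                       uniq, multi, overhang)


def find_junction(junctions: Iterable[Junction], chrom: str,
                  start: int, end: int) -> Optional[Junction]:
    """Return the junction with exactly these intron coordinates, or None."""
    for j in junctions:
        if (j.chrom, j.intron_start, j.intron_end) == (chrom, start, end):
            return j
    return None


if __name__ == "__main__":
    # Toy rows for illustration only; in practice, open your own SJ.out.tab.
    toy = io.StringIO("chr1\t100\t200\t0\t0\t0\t2\t0\t20\n"
                      "chr1\t300\t400\t1\t1\t1\t10\t3\t50\n")
    hit = find_junction(read_sj_out(toy), "chr1", 100, 200)
    print(hit.motif_name if hit else "not reported")  # prints: non-canonical
```

A junction that is absent from the file was either never aligned or removed by the output filters; if it is present, column 5 tells you whether it was classified as non-canonical.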

Cheers Alex

Vita-github commented 4 years ago

Hi Alex,

Indeed, the junction is AG/AG, and because of an alignment error (a shift of two base pairs) the reads contain only a non-canonical, unannotated junction. The real junction spans 29483061-29486027 (it is a consequence of a known mutation, confirmed by Sanger sequencing). The filters I used were: --outFilterMultimapNmax 2 --outFilterMismatchNmax 20 --chimSegmentMin 0. Is there a way to improve alignment/junction detection, or to prevent STAR from filtering out non-canonical motifs?
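Why an AG/AG junction counts as non-canonical can be seen from the motif table in the STAR manual: only GT/AG, GC/AG and AT/AC (and their reverse complements) get motif codes 1-6, and everything else is code 0. The sketch below classifies an intron from its first and last two bases, assuming the input is the intron sequence on the forward genomic strand; `classify_intron` is a hypothetical helper, and fetching the sequence from a FASTA file is left out.

```python
from typing import Tuple

# Donor/acceptor dinucleotides (forward genomic strand) -> STAR motif code.
MOTIF_CODES = {
    ("GT", "AG"): 1, ("CT", "AC"): 2,  # canonical
    ("GC", "AG"): 3, ("CT", "GC"): 4,  # semi-canonical
    ("AT", "AC"): 5, ("GT", "AT"): 6,  # semi-canonical
}


def classify_intron(intron_seq: str) -> Tuple[int, str]:
    """Return (STAR motif code, implied strand) for a forward-strand intron."""
    if len(intron_seq) < 4:
        raise ValueError("intron sequence too short")
    donor, acceptor = intron_seq[:2].upper(), intron_seq[-2:].upper()
    code = MOTIF_CODES.get((donor, acceptor), 0)  # 0 = non-canonical
    if code == 0:
        strand = "."  # a non-canonical motif does not imply a strand
    elif code % 2 == 1:
        strand = "+"  # odd codes are the plus-strand motifs
    else:
        strand = "-"  # even codes are their reverse complements
    return code, strand


print(classify_intron("AG" + "N" * 20 + "AG"))         # prints: (0, '.')
print(classify_intron("gtaagt" + "N" * 20 + "ttcag"))  # prints: (1, '+')
```

Because the motif class decides which column of each --outSJfilter* parameter applies, a shifted alignment that lands on AG/AG is held to the strictest thresholds.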

Best, Vita

alexdobin commented 4 years ago

Hi Vita,

If you are interested in this particular junction (or other junctions whose locations you know), you can add it as "annotated" with --sjdbFileChrStartEnd sj.txt, where sj.txt contains four tab-separated columns per junction: chr, intron_start, intron_end, strand.

You can specify this option at the genome generation step or mapping step.
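A small sketch of producing sj.txt for known junctions. It assumes 1-based, inclusive intron coordinates (the same convention as columns 2-3 of SJ.out.tab), strand given as "+", "-" or ".", and chromosome names that match the genome's. `write_sjdb_file` is a hypothetical helper; the strand in the example is an assumption and should be replaced with the strand of the gene being studied.

```python
import io
from typing import Iterable, TextIO, Tuple

VALID_STRANDS = {"+", "-", "."}


def write_sjdb_file(junctions: Iterable[Tuple[str, int, int, str]],
                    out: TextIO) -> int:
    """Write one 'chr<TAB>intron_start<TAB>intron_end<TAB>strand' line per junction.

    Returns the number of junctions written.
    """
    n = 0
    for chrom, start, end, strand in junctions:
        if not (1 <= start <= end):
            raise ValueError(f"bad intron coordinates: {chrom}:{start}-{end}")
        if strand not in VALID_STRANDS:
            raise ValueError(f"strand must be one of {sorted(VALID_STRANDS)}")
        out.write(f"{chrom}\t{start}\t{end}\t{strand}\n")
        n += 1
    return n


if __name__ == "__main__":
    known = [
        # Coordinates from this thread; the "+" strand is an assumption.
        ("chr17", 29483061, 29486027, "+"),
    ]
    buf = io.StringIO()
    write_sjdb_file(known, buf)
    print(buf.getvalue(), end="")  # save this text as sj.txt
```

Validating coordinates before writing catches swapped start/end values early, before STAR inserts them into the genome or mapping run.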

On the other hand, if you want to discover all novel non-canonical junctions without harsh filters against them, you would need: --outSJfilterOverhangMin 12 12 12 12 --outSJfilterCountUniqueMin 1 1 1 1 --outSJfilterCountTotalMin 1 1 1 1 --outSJfilterDistToOtherSJmin 0 0 0 0
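A rough Python model of how the count and overhang filters combine, written from the parameter descriptions in the STAR manual rather than from STAR's source: each parameter takes four values (non-canonical, GT/AG and CT/AC, GC/AG and CT/GC, AT/AC and GT/AT), annotated junctions are exempt, and a junction needs either enough unique reads or enough total reads. --outSJfilterDistToOtherSJmin, --outSJfilterIntronMaxVsReadN and --outSJfilterReads are left out. `SJFilter`, `RELAXED`, `motif_class` and `passes` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple

Quad = Tuple[int, int, int, int]


@dataclass(frozen=True)
class SJFilter:
    # One value per motif class: (non-canonical, GT/AG+CT/AC, GC/AG+CT/GC, AT/AC+GT/AT)
    overhang_min: Quad = (30, 12, 12, 12)   # --outSJfilterOverhangMin default
    count_unique_min: Quad = (3, 1, 1, 1)   # --outSJfilterCountUniqueMin default
    count_total_min: Quad = (3, 1, 1, 1)    # --outSJfilterCountTotalMin default


# The relaxed settings suggested above.
RELAXED = SJFilter(overhang_min=(12, 12, 12, 12),
                   count_unique_min=(1, 1, 1, 1),
                   count_total_min=(1, 1, 1, 1))


def motif_class(motif: int) -> int:
    """Map SJ.out.tab motif codes 0..6 to the four filter classes 0..3."""
    return (motif + 1) // 2


def passes(f: SJFilter, motif: int, annotated: bool,
           unique_reads: int, multi_reads: int, max_overhang: int) -> bool:
    if annotated:
        return True  # these filters do not apply to annotated junctions
    c = motif_class(motif)
    if f.overhang_min[c] < 0:
        return False  # -1 suppresses output for this motif class
    enough_reads = (unique_reads >= f.count_unique_min[c]
                    or unique_reads + multi_reads >= f.count_total_min[c])
    return enough_reads and max_overhang >= f.overhang_min[c]


# A hypothetical non-canonical junction with 2 unique reads and a 20-nt overhang:
print(passes(SJFilter(), 0, False, 2, 0, 20))  # prints: False
print(passes(RELAXED, 0, False, 2, 0, 20))     # prints: True
```

With the defaults, a non-canonical junction needs at least 3 reads and a 30-nt overhang, while the same junction with a GT/AG motif would pass with 1 read and 12 nt; this asymmetry is what the relaxed values remove.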

You may also want to reduce the penalties for non-canonical and semi-canonical (GC/AG, AT/AC) junctions to 0: --scoreGapNoncan 0 --scoreGapGCAG 0 --scoreGapATAC 0
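Putting the options from this thread together, one way to assemble a mapping command is sketched below. `star_mapping_args` is a hypothetical helper, the genome directory and FASTQ paths are placeholders, and only options already mentioned in the thread (plus the standard --genomeDir and --readFilesIn) are used.

```python
import shlex
from typing import List, Optional, Sequence


def star_mapping_args(genome_dir: str, reads: Sequence[str],
                      sjdb_file: Optional[str] = None) -> List[str]:
    """Assemble a STAR mapping command with the options discussed in this thread."""
    args = ["STAR", "--genomeDir", genome_dir, "--readFilesIn", *reads,
            # Options from the original run
            "--outFilterMultimapNmax", "2",
            "--outFilterMismatchNmax", "20",
            "--chimSegmentMin", "0",
            # Relaxed SJ.out.tab filters (one value per motif class)
            "--outSJfilterOverhangMin", "12", "12", "12", "12",
            "--outSJfilterCountUniqueMin", "1", "1", "1", "1",
            "--outSJfilterCountTotalMin", "1", "1", "1", "1",
            "--outSJfilterDistToOtherSJmin", "0", "0", "0", "0",
            # No extra penalty for non-canonical / semi-canonical motifs
            "--scoreGapNoncan", "0", "--scoreGapGCAG", "0", "--scoreGapATAC", "0"]
    if sjdb_file is not None:
        # Known junctions inserted at the mapping step
        args += ["--sjdbFileChrStartEnd", sjdb_file]
    return args


if __name__ == "__main__":
    # Placeholder paths; run with subprocess.run(cmd, check=True) once they are real.
    cmd = star_mapping_args("genome_dir", ["R1.fastq", "R2.fastq"], "sj.txt")
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems with paths, and makes it easy to toggle the known-junction file on or off.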

Cheers Alex

Vita-github commented 4 years ago

Hi Alex, I was able to extract the junction in question with the suggested filters, thanks a lot for your help! One note that may be useful for someone else: these filters work well for finding new spliceogenic mutations in RNA-seq data without prior DNA sequencing.

Vita