alexdobin / STAR

RNA-seq aligner
MIT License

STAR flags to improve recognition of variant-induced non-canonical splice junctions #1245

Open h-joshi opened 3 years ago

h-joshi commented 3 years ago

Hi Alex, Thanks heaps for such a spectacular tool.

I was reading through a couple of past tickets and came across #240, in particular the end of the post, where you recommend injecting any novel splice junctions manually.

I've attached a screenshot and an example read (below) showing multiple reads whose ends map perfectly to the next exon but are soft-clipped instead.
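To follow up on reads like these, the soft-clipped ends can be recovered from each read's CIGAR string. The sketch below is a minimal, hypothetical helper (`soft_clips` is not part of STAR or samtools); it assumes the read sequence and CIGAR have already been pulled out of the BAM, for example from `samtools view` output. Soft clips (`S`) are still present in the stored sequence, while hard clips (`H`) are not, so hard clips are skipped.

```python
import re

# A CIGAR string is a series of <length><operation> pairs.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clips(seq: str, cigar: str) -> tuple[str, str]:
    """Return the (leading, trailing) soft-clipped bases of a read.

    Hypothetical helper: hard clips (H) consume no bases of SEQ,
    so they are dropped before looking for S at either end.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    core = [(n, op) for n, op in ops if op != "H"]
    lead = core[0][0] if core and core[0][1] == "S" else 0
    trail = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return seq[:lead], seq[len(seq) - trail:]


if __name__ == "__main__":
    # Toy read: 7 aligned bases followed by 3 soft-clipped bases.
    print(soft_clips("ACGTACGTAA", "7M3S"))
```

The clipped tail returned here is the piece that would be re-mapped on its own to see where it lands.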

Even though there is a GT created as a result of a variant, as far as alignment goes, my understanding is that it is a non-canonical junction (CT-AG). Is my understanding right?

Secondly, I thought setting "--scoreGapNoncan" to zero would increase the likelihood of the non-canonical junction being recognised - but that didn't happen. Have I misunderstood this flag?

In general, what flags increase/decrease the likelihood of such a junction being recognised?

Diagnostically, I can see this playing out in two ways:

1) An individual's reads are aligned and the splice junctions are visualised to identify any aberrant splicing (and then to narrow down a potential causative variant). This is usually what we do in the lab, so being able to see any unusual events visually from the get-go is incredibly handy.

2) Predicted splice junctions are introduced manually (as per your note in issue #240). However, this is not always possible: for example, if there is an existing GT-AG deeper in the intron and the variant affects a splicing enhancer, the causative variant is not immediately obvious. In these instances, the ideal workflow is to see the unusual splicing event first and then zoom into that region to identify causative variants.
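For the first workflow, one way to shortlist unusual events before opening a genome browser is to scan STAR's SJ.out.tab for unannotated junctions in a region of interest. The sketch below assumes the nine-column SJ.out.tab layout described in the STAR manual (chromosome, first and last intronic base (1-based), strand, motif code, annotated flag, unique reads, multi-mapping reads, maximum overhang). `parse_sj_out` and `novel_in_window` are hypothetical helper names, and the test rows are made-up coordinates, not data from this issue.

```python
import csv
from dataclasses import dataclass

# Intron motif codes from column 5 of SJ.out.tab.
MOTIFS = {0: "non-canonical", 1: "GT/AG", 2: "CT/AC", 3: "GC/AG",
          4: "CT/GC", 5: "AT/AC", 6: "GT/AT"}


@dataclass
class Junction:
    chrom: str
    intron_start: int   # first intronic base, 1-based
    intron_end: int     # last intronic base, 1-based
    strand: int         # 0 undefined, 1 '+', 2 '-'
    motif: int
    annotated: bool
    unique_reads: int
    multi_reads: int
    max_overhang: int


def parse_sj_out(lines):
    """Yield Junction records from SJ.out.tab lines (tab-separated)."""
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue
        c, s, e, strand, motif, ann, uniq, multi, ovh = row[:9]
        yield Junction(c, int(s), int(e), int(strand), int(motif),
                       ann == "1", int(uniq), int(multi), int(ovh))


def novel_in_window(junctions, chrom, start, end):
    """Unannotated junctions whose intron overlaps [start, end]."""
    return [j for j in junctions
            if j.chrom == chrom and not j.annotated
            and j.intron_start <= end and j.intron_end >= start]
```

In practice the input would be an open file, e.g. `novel_in_window(list(parse_sj_out(open("SJ.out.tab"))), "chr13", start, end)`. Note that a junction filtered out by STAR never reaches SJ.out.tab, so this only helps with junctions that survived filtering.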

Read alignment start position: chr13:51505075 (hg19)

CAGAAGCAGCACCTTACTCTTGATCCAGTCTGACCCTGGCTTGCTTGTGACCTCTGACTTGCCTGACTCACTGTGCTGTGCACCTTACTGCTTGACAAAGCCTGACAGGGGAAGTTTCAGCCCCTTGATCAAGTTGTGGTGGATAACGTG

[Screenshot]

[Screenshot]

alexdobin commented 3 years ago

Hi @h-joshi

> Even though there is a GT created as a result of a variant, as far as alignment goes, my understanding is that it is a non-canonical junction (CT-AG). Is my understanding right?

Correct!
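The reason is that the junction motif is read from the reference genome, not from the sample. The sketch below classifies an intron by its first two and last two reference bases using the motif codes STAR reports in column 5 of SJ.out.tab; `intron_motif` is a hypothetical helper, and the toy sequence is invented purely to show that a reference CT...AG intron is non-canonical (code 0) while the same intron carrying a C>G variant at its first base would read as GT...AG (code 1).

```python
# Motif codes as listed for column 5 of SJ.out.tab (0 = non-canonical).
# Dinucleotides are given on the + strand of the reference.
MOTIF_CODES = {
    ("GT", "AG"): 1, ("CT", "AC"): 2,
    ("GC", "AG"): 3, ("CT", "GC"): 4,
    ("AT", "AC"): 5, ("GT", "AT"): 6,
}


def intron_motif(genome: str, intron_start: int, intron_end: int) -> int:
    """Motif code for a 1-based, inclusive intron on a + strand sequence."""
    donor = genome[intron_start - 1:intron_start + 1].upper()  # first 2 bases
    acceptor = genome[intron_end - 2:intron_end].upper()       # last 2 bases
    return MOTIF_CODES.get((donor, acceptor), 0)


# Toy example: exon AAAA, intron CTTTTTAG (positions 5-12), exon CCCC.
reference = "AAAA" + "CTTTTTAG" + "CCCC"
# Hypothetical sample allele with C>G at the first intronic base.
variant = reference[:4] + "G" + reference[5:]
```

Because the aligner only sees `reference`, the junction is scored with the non-canonical penalty even though the individual's DNA carries a GT.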

> Secondly, I thought setting "--scoreGapNoncan" to zero would increase the likelihood of the non-canonical junction being recognised, but that didn't happen. Have I misunderstood this flag?

This flag indeed improves your chances of finding non-canonical junctions.
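A back-of-the-envelope way to see what this flag changes: a soft-clipped alignment keeps only the bases on one side, while the spliced alignment also gains the overhang bases but pays the junction penalties. The sketch below is a deliberately simplified model, not STAR's actual scoring: it counts +1 per matched base, uses the documented defaults `--scoreGap 0` and `--scoreGapNoncan -8` (the latter applied in addition to `--scoreGap` for non-canonical junctions), and ignores mismatches, the log-scaled genomic-length penalty (`--scoreGenomicLengthLog2scale`) and the seed search. `spliced_beats_softclip` is a hypothetical name.

```python
def spliced_beats_softclip(left_matches: int, right_matches: int,
                           score_gap: int = 0,
                           score_gap_noncan: int = -8) -> bool:
    """Simplified comparison of a spliced vs a soft-clipped alignment.

    left_matches:  matched bases before the non-canonical junction
    right_matches: matched bases of the overhang in the next exon
    """
    softclip_score = left_matches  # overhang is clipped, scores nothing
    spliced_score = (left_matches + right_matches
                     + score_gap + score_gap_noncan)
    return spliced_score > softclip_score


if __name__ == "__main__":
    for overhang in (5, 8, 9, 20):
        print(overhang,
              spliced_beats_softclip(100, overhang),                      # default -8
              spliced_beats_softclip(100, overhang, score_gap_noncan=0))  # set to 0
```

Under this toy model, the default penalty means an overhang has to exceed 8 matched bases before splicing outscores clipping, and setting `--scoreGapNoncan 0` lowers that bar; real alignments involve more terms than this.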

> In general, what flags increase/decrease the likelihood of such a junction being recognised?

There are no other parameters that affect the detection of non-canonical junctions in the BAM output. If you are looking at SJ.out.tab (filtered junctions) or using --outFilterType BySJout, then you would need to adjust the --outSJfilter* parameters.
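Each `--outSJfilter*` parameter takes four values, one per motif category in the order given in the STAR manual: (1) non-canonical, (2) GT/AG and CT/AC, (3) GC/AG and CT/GC, (4) AT/AC and GT/AT. The non-canonical column has the strictest defaults. The sketch below builds command-line arguments that relax only that first column and leave the other three at their defaults; the relaxed numbers are arbitrary illustrative choices, not recommended settings, and `noncanonical_filter_args` is a hypothetical helper.

```python
# STAR defaults for the per-motif junction filters, in manual column order:
# [non-canonical, GT/AG+CT/AC, GC/AG+CT/GC, AT/AC+GT/AT]
DEFAULTS = {
    "--outSJfilterOverhangMin": [30, 12, 12, 12],
    "--outSJfilterCountUniqueMin": [3, 1, 1, 1],
    "--outSJfilterCountTotalMin": [3, 1, 1, 1],
    "--outSJfilterDistToOtherSJmin": [10, 0, 5, 10],
}


def noncanonical_filter_args(overhang_min=12, count_min=1, dist_min=0):
    """Build STAR args relaxing only the non-canonical (first) column.

    The default argument values here are illustrative assumptions.
    """
    overrides = {
        "--outSJfilterOverhangMin": overhang_min,
        "--outSJfilterCountUniqueMin": count_min,
        "--outSJfilterCountTotalMin": count_min,
        "--outSJfilterDistToOtherSJmin": dist_min,
    }
    args = []
    for flag, values in DEFAULTS.items():
        args.append(flag)
        args.extend(str(v) for v in [overrides[flag]] + values[1:])
    return args


if __name__ == "__main__":
    print(" ".join(noncanonical_filter_args()))
```

These filters only matter for SJ.out.tab, or for the BAM when `--outFilterType BySJout` is used; with the default filter type the BAM alignments are not affected by them.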

Are there any reads that map to this junction? If so, you can use --twoPassMode Basic, which will insert this junction after it is detected in the 1st pass.
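A minimal sketch of a two-pass run, built as an argument list so it can be inspected before being passed to `subprocess.run`. The genome directory, read files and output prefix are placeholders, and combining `--twoPassMode Basic` with `--scoreGapNoncan 0` is just one possible setup. Presumably the first-pass junctions come from the first pass's filtered junction output, in which case the `--outSJfilter*` settings would also influence what gets inserted; that is an assumption worth checking against the STAR manual.

```python
import shlex


def star_two_pass_cmd(genome_dir, reads, out_prefix, threads=4, extra=()):
    """Assemble a STAR two-pass mapping command (paths are placeholders)."""
    cmd = [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", *reads,
        "--outFileNamePrefix", out_prefix,
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--twoPassMode", "Basic",   # re-map after inserting 1st-pass junctions
        "--scoreGapNoncan", "0",    # no extra penalty for non-canonical motifs
    ]
    cmd.extend(extra)
    return cmd


if __name__ == "__main__":
    cmd = star_two_pass_cmd("hg19_index/", ["R1.fastq", "R2.fastq"], "sample_")
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```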

To diagnose why the splice was not detected in this case, it would be good to add the expected junction as if it were annotated. Another option is to cut the soft-clipped sequence and try to map it separately, to see where it would map.
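Both options need a small input file. STAR can take extra junctions through `--sjdbFileChrStartEnd`, a tab-separated file of chromosome, first intronic base (1-based), last intronic base (1-based) and strand, supplied at genome generation or at the mapping step for on-the-fly insertion; clipped sequences can be written to FASTA for separate mapping. The helpers below are hypothetical, the strand check is deliberately limited to `+`/`-`, and the coordinates in the test are invented: the real intron boundaries would have to be read off the screenshot.

```python
VALID_STRANDS = {"+", "-"}


def format_sjdb_lines(junctions):
    """Format (chrom, first_intron_base, last_intron_base, strand) tuples
    (1-based, inclusive) as lines for a --sjdbFileChrStartEnd file."""
    lines = []
    for chrom, start, end, strand in junctions:
        if strand not in VALID_STRANDS:
            raise ValueError(f"unexpected strand {strand!r}")
        if not 1 <= start <= end:
            raise ValueError(f"invalid intron {start}-{end}")
        lines.append(f"{chrom}\t{start}\t{end}\t{strand}\n")
    return "".join(lines)


def format_fasta(records):
    """Format (name, sequence) pairs, e.g. soft-clipped tails, as FASTA."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

For example, `Path("expected_sj.tab").write_text(format_sjdb_lines([...]))` followed by adding `--sjdbFileChrStartEnd expected_sj.tab` to the mapping command, and `Path("clipped.fa").write_text(format_fasta(...))` for the separate mapping test.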

Cheers Alex