Hi @clb1000
this is a current limitation of the STAR algorithm: it will not detect a splice with a short overhang and mismatches on both sides, even if the junction is annotated. Note that it still finds the correct alignment for the rest of the read, so this alignment is only "partially incorrect". In the simulation, this happens to be the main source of soft-clipping, so I am working on algorithm enhancements to be released over the summer.
Cheers Alex
Hello Alex, I have encountered what appears to be an incorrect STAR alignment, in a situation that I would expect to be fairly common in typical sequencing data sets. In short, in my hands STAR is not able to produce a spliced alignment (with an 11 bp overhang) because of a single base-pair mismatch in the vicinity of the splice junction in a read.
Here is the read pair. In Read2 I have starred the mismatches relative to hg38. (For hg38, both of the starred C nucleotides are G.)
Regardless of the input parameter value combinations for STAR (and I have tried a great many), here is the SAM output:
The problem is the soft clipping of Read2. Instead of soft-clipping, the read should be bridging the splice junction chr12:56158712-56159586 (just like Read1 does in its CIGAR part "28M875N57M"). In bridging this splice junction, STAR would pick up 10 matches and 1 mismatch, resulting in a better score.
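As a sanity check on the coordinates above, the intron implied by a CIGAR string can be recomputed from the alignment start. The following is a minimal sketch, not STAR code: the function name `introns_from_cigar` is made up for illustration, and the start position 56158684 is an assumption, back-calculated so that "28M875N57M" lands on the junction quoted above (it is not taken from the SAM output).

```python
import re

def introns_from_cigar(pos, cigar):
    """Return 1-based, inclusive (start, end) reference intervals
    for every N (skipped region) operation in a CIGAR string.

    pos   -- 1-based leftmost reference position of the alignment
    cigar -- CIGAR string, e.g. "28M875N57M"
    """
    introns = []
    ref = pos  # next reference position to be consumed
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op == "N":
            introns.append((ref, ref + length - 1))
        # Only these operations consume reference bases;
        # S, H, I and P do not advance the reference position.
        if op in "MDN=X":
            ref += length
    return introns

# ASSUMPTION: hypothetical start position chosen so that the first
# 28 matched bases end immediately before chr12:56158712.
print(introns_from_cigar(56158684, "28M875N57M"))
```

With that assumed start, the 875N operation spans 56158712-56159586, which is 875 bases long and matches the junction named above.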
If I change either one (and just one) of the starred C nucleotides to G, then the STAR output is correct in that the splice junction is bridged:
My questions: 1) Shouldn't the spliced alignment be favored by STAR? 2) Is there a parameter setting that would enable STAR to find the spliced alignment? (I cannot find one).
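The reasoning behind question 1 can be written out as simple arithmetic. This is only a sketch under stated assumptions: it uses a simplified scheme of +1 per matched base and -1 per mismatched base, treats the junction penalty as a free parameter defaulting to 0, and ignores any other terms in STAR's actual scoring. The function name `overhang_gain` is hypothetical.

```python
def overhang_gain(matches, mismatches, junction_penalty=0):
    """Score change from aligning an overhang across a junction
    instead of soft-clipping it (soft-clipped bases contribute 0).

    ASSUMPTION: +1 per match, -1 per mismatch; junction_penalty is
    a non-positive placeholder, not a value read from STAR.
    """
    return matches - mismatches + junction_penalty

# The 11 bp overhang described above: 10 matches and 1 mismatch.
gain = overhang_gain(matches=10, mismatches=1)
print(gain)      # score gained by bridging the junction
print(gain > 0)  # True means the spliced alignment scores higher
```

Under these assumptions the spliced alignment gains 9 over the soft-clipped one, so the soft-clipping cannot be explained by scoring alone; the spliced candidate has to be missed earlier, during the search.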
I am using STAR_2.5.1b. I would include example parameter settings that reproduce the problem, but in my hands every parameter setting I have tried reproduces it, so any will do.
Regards, Christian