DoaneAS / rseqc

Automatically exported from code.google.com/p/rseqc

Unusually Large Percentage of Novel Junctions #27

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Align long (100+ bp) reads with STAR, either Paired End or Single Read
2. Run RSeQC Junction Annotation script on aligned file

What is the expected output? What do you see instead?
In the past, we ran this script on 50bp single-read experiments and observed
relatively low percentages of novel splicing junctions and events. This was
true for both STAR and Tophat alignments. Now I am seeing very high percentages
of novel splicing junctions in my 100bp PE experiments aligned with STAR,
ranging from 30% to 60% across several samples. Aligning these long reads with
Tophat does not produce the same result in RSeQC. I also found that read length
had the largest effect on the reported percentage of novel junctions,
regardless of pairedness. Note that only the junction percentages look
suspicious; novel splicing events remain below 10% in every sample.

What version of the product are you using? On what operating system?
RSeQC v2.3.7 on 2.6.32-358.18.1.el6.x86_64 GNU/Linux 

Please provide any additional information below.
The most unusual part is that my "novel" junctions are really only 1 or 2
bases away from the annotated junctions! Moreover, this only occurs where
shifting the splice site leaves the transcript sequence unchanged. In other
words, if the first base of the spliced-out intron is the same as the first
base of the next exon, the shifted junction is typically reported as a novel
splicing event. The true, annotated location is also reported by the program.
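
To make the ambiguity concrete, here is a minimal sketch in Python. It uses a made-up toy sequence (not real chr6 data) and two hypothetical helpers, `splice` and `equivalent_shifts`, and it is only meant to illustrate why a shifted intron can give the same transcript. It is not how STAR or RSeQC work internally.

```python
def splice(genome: str, intron_start: int, intron_end: int) -> str:
    """Remove the half-open interval [intron_start, intron_end) from genome."""
    return genome[:intron_start] + genome[intron_end:]


def equivalent_shifts(genome: str, intron_start: int, intron_end: int,
                      max_shift: int = 2) -> list:
    """Return the non-zero shifts (in bases) that yield an identical transcript."""
    reference = splice(genome, intron_start, intron_end)
    shifts = []
    for shift in range(-max_shift, max_shift + 1):
        start, end = intron_start + shift, intron_end + shift
        if shift == 0 or start < 0 or end > len(genome):
            continue
        if splice(genome, start, end) == reference:
            shifts.append(shift)
    return shifts


# Toy data (hypothetical): exon1 + intron + exon2.
# The intron starts with "G" and the next exon also starts with "G".
exon1, intron, exon2 = "AAACCT", "GTAAGTTTTCAG", "GCTCCC"
genome = exon1 + intron + exon2

print(equivalent_shifts(genome, len(exon1), len(exon1) + len(intron)))  # [1]
```

Here only a shift of +1 reproduces the transcript, because the base that leaves the intron at its start is the same as the base that enters it at its end.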

For an example, check the following intron in a genome browser (attached): 
chr6:74,227,988-74,228,076. If you splice out chr6:74,227,989-74,228,077 
instead, the transcript sequence is identical, but it is now called a novel 
splice. This is exactly the behavior I am getting in RSeQC. Can anyone describe 
how RSeQC determines a novel splice junction and why these novel junctions are 
not consistent with the STAR splice junction output log?
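
As a diagnostic, something like the following sketch could flag reported junctions that are just a shifted copy of an annotated one. The `classify_junctions` helper, its labels, and the tuple layout are my own assumptions for illustration. This is not RSeQC's classification logic, and the coordinates are simply the two intervals quoted above, taken as written.

```python
def classify_junctions(observed, annotated, tolerance=2):
    """Label each observed (chrom, start, end) junction.

    "annotated": exact match to an annotated junction
    "shifted":   both ends offset by the same 1..tolerance bases
    "unmatched": anything else
    """
    annotated = set(annotated)
    labels = {}
    for chrom, start, end in observed:
        if (chrom, start, end) in annotated:
            labels[(chrom, start, end)] = "annotated"
            continue
        near = any(
            a_chrom == chrom
            and start - a_start == end - a_end
            and 0 < abs(start - a_start) <= tolerance
            for a_chrom, a_start, a_end in annotated
        )
        labels[(chrom, start, end)] = "shifted" if near else "unmatched"
    return labels


# Intervals from the example above; the third junction is hypothetical.
annotated = [("chr6", 74227988, 74228076)]
observed = [
    ("chr6", 74227988, 74228076),
    ("chr6", 74227989, 74228077),
    ("chr6", 74230000, 74231000),  # hypothetical, for contrast
]

for junction, label in classify_junctions(observed, annotated).items():
    print(junction, label)
```

Counting how many "novel" junctions fall into the "shifted" group would show whether these offsets account for the 30% to 60% figure.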

Original issue reported on code.google.com by kroll...@osu.edu on 2 Dec 2013 at 9:15
