Xinglab / rmats-turbo

Other
221 stars 53 forks source link

No intron retention detected in a manually made genome and annotation file #407

Open circR opened 3 months ago

circR commented 3 months ago

Hi, I tried to quantify the intron rentention level for a transposon gene and manually made a genome file and a gtf file. IGV visualization of the bam files in the tmp folder shows clear difference of intron rentetion, but the output file from rmats shows 0. The intron there is 41nt. image

Here's the script I used: rmats.py --s1 S1.txt --s2 S2.txt \ --gtf genome/Tc1/Tc1.test.gtf \ --readLength 150 \ --bi genome/Tc1/STAR_Index \ -t paired \ --od test/ \ --tmp tmp/ \ --variable-read-length \ --allow-clipping \ --novelSS \ --mil 30

The summary.txt shows this: EventType TotalEventsJC TotalEventsJCEC SignificantEventsJC SigEventsJCSample1HigherInclusion SigEventsJCSample2HigherInclusion SignificantEventsJCEC SigEventsJCECSample1HigherInclusion SigEventsJCECSample2HigherInclusion SE 0 0 0 0 0 0 0 0 A5SS 0 0 0 0 0 0 0 0 A3SS 0 0 0 0 0 0 0 0 MXE 0 0 0 0 0 0 0 0 RI 0 0 0 0 0 0 0 0

Here's the "read_outcomes_by_bam.txt" tmp/2024-06-06-19_16_17_795004_bam1_1/Aligned.sortedByCoord.out.bam USED: 965 NOT_PAIRED: 0 NOT_NH_1: 20 NOT_EXPECTED_CIGAR: 49 NOT_EXPECTED_READ_LENGTH: 0 NOT_EXPECTED_STRAND: 0 EXON_NOT_MATCHED_TO_ANNOTATION: 0 JUNCTION_NOT_MATCHED_TO_ANNOTATION: 0 CLIPPED: 0 TOTAL_FOR_BAM: 1034 tmp/2024-06-06-19_16_17_795004_bam2_1/Aligned.sortedByCoord.out.bam USED: 1893 NOT_PAIRED: 0 NOT_NH_1: 16 NOT_EXPECTED_CIGAR: 110 NOT_EXPECTED_READ_LENGTH: 0 NOT_EXPECTED_STRAND: 0 EXON_NOT_MATCHED_TO_ANNOTATION: 0 JUNCTION_NOT_MATCHED_TO_ANNOTATION: 3 CLIPPED: 0 TOTAL_FOR_BAM: 2022

Could you help with this? Thanks a lot.

EricKutschera commented 3 months ago

rMATS usually needs the gtf to have a transcript with the intron included and another transcript with the intron spliced out. This post has some details: https://github.com/Xinglab/rmats-turbo/issues/17

From the IGV screenshot it looks like the gtf might only have a transcript with the intron spliced out (if the top blue line is the entire gene and not a transcript)

circR commented 3 months ago

Hi, Thanks for the answer. There was a transcript line for the full length gene in the gtf file. After I checked the issue #17 , I changed the "transcript" to "exon" and now rmats detects that intron retention. Does that mean I need to create an exon annotation for each possible intron rentention event? For example: image If multiple exons in a gene are retained, how could I let rmats to detect all these events? Will a full-length transcript annotation plus every exon annotation be enough or I have to annotate every possible new exon produced by intron retention? Thank you for your help.

EricKutschera commented 3 months ago

I think you'll need to have an exon annotation for each retained intron event that you want rmats to report. The code needs the "exon" that is actually two exons with the intron retained. In that example it looks like there are 4 exons. You could use a gtf that has each of those 4 exons defined and also 3 additional exons which go from the start of one exon to the end of the next: exon1-exon2, exon2-exon3, exon3-exon4