Xinglab / rmats-turbo

Other
221 stars 53 forks source link

Discrepancy Between rMATS-turbo Results and IGV Visualization #399

Open rezarahman12 opened 5 months ago

rezarahman12 commented 5 months ago

Hi rMATS team, I wanted to discuss a discrepancy I've encountered between the results obtained from rMATS-turbo analysis and what I observed in the Integrated Genome Viewer (IGV) while examining RNA-seq data for the Agap3 gene.

During the analysis, rMATS-turbo identified an interesting exon skipping event within the Agap3 gene, located at chr5:24,704,661-24,705,007. However, upon visual inspection using IGV with mouse genome 33 (mm33), and both Refseq and GENCODE (vM33) GTF annotations, I didn't see the expected exon spreading in the specified coordinates. Instead, it appeared that two adjacent exons were covered by that coordinate (attached screenshot).

Do you have any suggestions on how to further investigate or resolve this issue?

I appreciate your time and help.

Kind regards Reza image

EricKutschera commented 5 months ago

The exons in the rMATS output are based on the --gtf. If --novelSS is used then rMATS can adjust an exon from the --gtf: https://github.com/Xinglab/rmats-turbo/issues/277#issuecomment-1486991997

What version of rMATS did you use? Can you post the command(s) you ran and the row for this event in the output file?

rezarahman12 commented 5 months ago

Hi Eric, Thanks for your prompt reply. Please see my below command- module load rmats-turbo/4.1.2 module load star

run_rmats \ --s1 s1_T.txt \ --s2 s2_Con.txt \ --gtf {path to Mus_musculus.GRCm39.108.gtf} \ --bi {path to star_mm39} \ -t paired \ --readLength 150 \ --novelSS \ --nthread 22 \ --od {path to out folder} \ --tmp {path to out folder} Please see the row details of Agap3 👍 image I appreciate your time and some insights on this. Kind regards Reza

EricKutschera commented 5 months ago

The upstream and downstream exons match the annotation (24697956, 24698124) and (24705121, 24705343). The two annotated exons in between are: (24704656, 24704810) (24704913, 24705003)

The new exon reported by rMATS is (24704662, 24705007) which overlaps those two exons but doesn't share any of the splice sites. Since rMATS was run with --novelSS it can define new novel exons if it sees supporting alignments. In this case there would need to be an alignment with the splice junction (24698124, 24704662) and another alignment with (24705007, 24705121). For both of those slice junctions, one of the splice sites is annotated and the other novel so rMATS can consider it with --novelSS. Also both splice junctions meet the minimum intron length parameter mil=50: https://github.com/Xinglab/rmats-turbo/blob/v4.3.0/rmats.py#L163

After the novel splice sites are detected, rMATS will try to define novel exons. One option the code considers is using the novel splice site as one end of the exon and an annotated splice site as the other end. The code also will check if two novel splice sites can define an exon. In both cases the novel exon needs to meet the maximum exon length parameter mel=500: https://github.com/Xinglab/rmats-turbo/blob/v4.3.0/rMATS_pipeline/rmatspipeline/rmatspipeline.pyx#L1184 https://github.com/Xinglab/rmats-turbo/blob/v4.3.0/rMATS_pipeline/rmatspipeline/rmatspipeline.pyx#L1314

Based on the output row rMATS also found 4 alignments to support the skipping junction. rMATS would then be able to detect and output that event since it had definitions of all 3 exons (1 novel) and had alignments to support all 3 splice junctions