Gene not included in rstat analysis

RobinVanSchendel commented 4 years ago

Below is an example of a transcript structure that shows differential exon usage between WT and geneX using DEXSeq (C. elegans data). I expected this gene's transcripts to also be part of the rmats output, but they do not show up. On closer inspection, the gene already seems to be missing from the fromGTF[rest].txt files. Is that expected behaviour?

EricKutschera commented 4 years ago

Short answer:

That looks like a retained intron (RI) event, which rmats will only report if it is (mostly) annotated in the gtf file. rmats is not designed to detect unannotated RI events. If rmats is not detecting the event, you can help it by adding transcripts for the two isoforms to the gtf. You might be able to get rmats to detect this event with --novelSS, but that flag is intended for adjusting one side of an annotated junction rather than inserting or removing a splice junction.
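As a rough illustration of adding the two isoforms to the gtf, the sketch below builds exon records for a spliced transcript (two exons joined by the junction) and an intron-retaining transcript (one exon spanning both). The helper names, gene/transcript IDs, chromosome, coordinates and the `source` column value are hypothetical placeholders, not anything from rmats; in practice the records should reuse the gene_id already in your gtf, and you may also want matching gene/transcript lines depending on how the rest of the file is structured.

```python
def gtf_exon_line(chrom, start, end, strand, gene_id, transcript_id, source="custom"):
    """Return one tab-separated GTF 'exon' record (1-based, inclusive coordinates)."""
    attributes = f'gene_id "{gene_id}"; transcript_id "{transcript_id}";'
    fields = [chrom, source, "exon", str(start), str(end), ".", strand, ".", attributes]
    return "\t".join(fields)


def ri_isoform_records(chrom, strand, gene_id, left_exon, right_exon,
                       spliced_id, retained_id):
    """Build exon records for a spliced isoform and its intron-retaining counterpart.

    left_exon and right_exon are (start, end) tuples. The retained-intron exon
    starts where the left exon starts and ends where the right exon ends.
    """
    spliced = [
        gtf_exon_line(chrom, left_exon[0], left_exon[1], strand, gene_id, spliced_id),
        gtf_exon_line(chrom, right_exon[0], right_exon[1], strand, gene_id, spliced_id),
    ]
    retained = [
        gtf_exon_line(chrom, left_exon[0], right_exon[1], strand, gene_id, retained_id),
    ]
    return spliced + retained


if __name__ == "__main__":
    # Hypothetical coordinates and IDs, for illustration only.
    for line in ri_isoform_records("I", "+", "geneX", (1000, 1200), (1500, 1700),
                                   "geneX.spliced", "geneX.retained"):
        print(line)
```

The key design point is that both isoforms share the outer boundaries exactly; only the presence or absence of the junction differs.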

More details:

rmats requires these definitions to detect an RI event:

the exon to the left of the arrow
the exon to the right of the arrow
an exon which starts at the left end of the left exon and ends at the right end of the right exon
a splice junction from the left exon to the right exon
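The four requirements above can be sketched as a simple consistency check. This is not rmats code; the `Exon` type, the function name, and the convention of writing a junction as (last base of the left exon, first base of the right exon) are assumptions made for illustration.

```python
from typing import NamedTuple, Set, Tuple


class Exon(NamedTuple):
    start: int  # 1-based, inclusive, as in a GTF file
    end: int


def is_simple_ri(left: Exon, right: Exon, spanning: Exon,
                 junctions: Set[Tuple[int, int]]) -> bool:
    """Return True if the pieces needed for an RI event line up exactly.

    Hypothetical junction convention: (last base of the left exon,
    first base of the right exon).
    """
    if not left.end < right.start:
        return False  # the left exon must lie entirely before the right exon
    # The spanning exon must share the outer boundaries of both exons exactly.
    spans_both = spanning.start == left.start and spanning.end == right.end
    # There must be a junction connecting the left exon to the right exon.
    has_junction = (left.end, right.start) in junctions
    return spans_both and has_junction


left, right = Exon(100, 200), Exon(300, 400)
print(is_simple_ri(left, right, Exon(100, 400), {(200, 300)}))  # True
print(is_simple_ri(left, right, Exon(120, 400), {(200, 300)}))  # False: start differs
```

The second call shows why even a small difference in one outer coordinate prevents the event from being formed.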

If those three exons are defined for this gene in the gtf, and there is a transcript for the gene in the gtf that includes that junction, then the event will only be in fromGTF.RI.txt.

rmats can detect some unannotated (novel) events by combining information from the gtf with reads from the BAMs.

If those three exons are defined, but there is no transcript with the junction, then rmats can detect the junction if there is a read to support it. In that case, the event includes a "novelJunction" and the event will be in both fromGTF.novelJunction.RI.txt and fromGTF.RI.txt.

If rmats is run with --novelSS, then rmats will define novel exons when a read contains a junction where only one end matches an exon defined in the gtf. Essentially, rmats will define a novel exon by adjusting one side of an exon defined in the gtf. If the event required --novelSS in order to be detected, then the event will be in both fromGTF.novelSpliceSite.RI.txt and fromGTF.RI.txt.
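The three cases described above map onto output files as follows. This is only a restatement of the rules in this thread as a lookup, not rmats's implementation; the enum and function names are hypothetical, and events that would need both a novel junction and a novel splice site are not covered because the thread does not describe them.

```python
from enum import Enum
from typing import List


class RIOrigin(Enum):
    ANNOTATED = "annotated"                  # three exons and the junction are all in the gtf
    NOVEL_JUNCTION = "novel_junction"        # exons in the gtf, junction supported only by reads
    NOVEL_SPLICE_SITE = "novel_splice_site"  # needed --novelSS to define an adjusted exon


def ri_output_files(origin: RIOrigin) -> List[str]:
    """Map how an RI event was found to the fromGTF files it appears in."""
    files = ["fromGTF.RI.txt"]  # every detected RI event is listed here
    if origin is RIOrigin.NOVEL_JUNCTION:
        files.append("fromGTF.novelJunction.RI.txt")
    elif origin is RIOrigin.NOVEL_SPLICE_SITE:
        files.append("fromGTF.novelSpliceSite.RI.txt")
    return files


print(ri_output_files(RIOrigin.NOVEL_JUNCTION))
# ['fromGTF.RI.txt', 'fromGTF.novelJunction.RI.txt']
```

So an event missing from fromGTF.RI.txt was not detected by any of the three routes.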

RobinVanSchendel commented 4 years ago

Thanks for your answer. I dug a little deeper into the gtf file and I think I have found the problem. However, I am still curious whether rmats should handle this or whether the gtf file needs to be adjusted. The exons that are defined are:

the exon to the left (exon 2 in tran 1)
the exon to the right (exon 1 in tran 1)
the exon that starts at the left of the left exon and ends at the right of the right exon (exon 1 in tran 2)

However, the 5'UTR is part of exon 1, and as you can see in the graphic, exon 1 (in tran 1) extends slightly further than exon 1 (in tran 2). So the problem seems to be that the UTR is included in the defined exon. Is that supposed to be the case, given that UTRs are non-coding?

EricKutschera commented 4 years ago

You are right that rmats will not detect the event because the start of exon 1 is different in the two transcripts. rmats can only detect "simple" intron retention events. In this case it is a "complex" event because there are two changes for the exons that define the event (different start coordinate, and whether the intron is spliced out).

You could update the gtf so that the two transcripts have the same start coordinate, and then rmats will detect the event. In your case, it sounds like removing the 5'UTR from the exon definitions in the gtf will give the two transcripts the same start coordinate. I don't think handling differences in the UTR is something that rmats should do automatically, but there may be an existing tool for removing the UTRs from exons in a gtf, which users could choose to run before running rmats.
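No specific tool is named here, so the following is only a minimal sketch of the idea, assuming a gtf whose coding transcripts carry `CDS` records alongside their `exon` records. For each transcript it trims exons back to the first coding base (the lowest CDS start on the + strand, the highest CDS end on the - strand) and drops exons that are entirely 5'UTR. The function name is hypothetical. Transcripts without CDS records (for example, a non-coding retained-intron isoform) are left unchanged, so depending on the annotation the two start coordinates may still differ after trimming. Only `exon` lines are rewritten; any `UTR` or `transcript` lines pass through untouched.

```python
import re
from typing import Dict, List, Tuple

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def trim_five_prime_utr(gtf_lines: List[str]) -> List[str]:
    """Trim 5'UTR sequence off exon records, using CDS records to locate the start codon side."""
    lines = [line.rstrip("\n") for line in gtf_lines]

    # First pass: coding extent (min CDS start, max CDS end, strand) per transcript.
    cds_bounds: Dict[str, Tuple[int, int, str]] = {}
    for line in lines:
        fields = line.split("\t")
        if line.startswith("#") or len(fields) < 9 or fields[2] != "CDS":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if not match:
            continue
        tid, start, end, strand = match.group(1), int(fields[3]), int(fields[4]), fields[6]
        if tid in cds_bounds:
            old_start, old_end, _ = cds_bounds[tid]
            start, end = min(start, old_start), max(end, old_end)
        cds_bounds[tid] = (start, end, strand)

    # Second pass: rewrite or drop exon records of coding transcripts.
    out = []
    for line in lines:
        fields = line.split("\t")
        if line.startswith("#") or len(fields) < 9 or fields[2] != "exon":
            out.append(line)
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if not match or match.group(1) not in cds_bounds:
            out.append(line)  # non-coding transcript: nothing to trim
            continue
        cds_start, cds_end, strand = cds_bounds[match.group(1)]
        start, end = int(fields[3]), int(fields[4])
        if strand == "+":
            if end < cds_start:
                continue  # exon lies entirely in the 5'UTR
            start = max(start, cds_start)
        else:
            if start > cds_end:
                continue  # on the - strand the 5'UTR is at the high-coordinate end
            end = min(end, cds_end)
        fields[3], fields[4] = str(start), str(end)
        out.append("\t".join(fields))
    return out
```

Only the 5' side is trimmed; 3'UTR sequence is kept, since the difference discussed here is at the start of exon 1.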

Xinglab / rmats-turbo, issue #17