Xinglab / rmats2sashimiplot

GNU General Public License v2.0

Analyzing pair analysis data #129

Open tanya-lasagne opened 6 months ago

tanya-lasagne commented 6 months ago

Hello,

I've previously used Sashimi to visualize unpaired analysis data using the [AS_Event].MATS.JC.txt files. However, for paired analysis, the output files include fromGTF.[AS_Event].txt or JC.raw.input.[AS_Event].txt. I'm currently interested in skipped exon events. Can I use these files to visualize rMATS data, and if so, which one would be appropriate?

Thank you!

EricKutschera commented 6 months ago

The -e argument to rmats2sashimiplot should be a file like SE.MATS.JC.txt. The paired stats model should produce the MATS.JC.txt output files just like the default statistical model. If you have the fromGTF and JC.raw.input files then you could run --task stat to generate the MATS.JC.txt files: https://github.com/Xinglab/rmats-turbo/tree/v4.3.0?tab=readme-ov-file#running-the-statistical-model-separately

tanya-lasagne commented 5 months ago

1) Thanks Eric! I believe I figured it out. Would you mind clarifying whether the IncLevel value below is indicating that, out of all the transcripts for that SE event detected in the sample, approximately 16% contain the included (or retained) exon and 84% exclude the exon?

2) Also, I kept getting 'nan' errors but realized that Sashimi expects the number of input bam files to match the number of values in the IncLevel columns. (I'm not sure if this is a problem, but I ran rMATS with 17 bam files in one group (control) versus 54 bam files in the other (treatment). Let me know what you think.)

[Screenshot: Screen Shot 2024-05-09 at 6 58 28 PM]

EricKutschera commented 5 months ago

IncLevel 0.16 does indicate that about 16% of the transcripts include the exon. The IncLevel values shown in the sashimiplot are taken from the rMATS output file. The IncLevel calculation is essentially: (IJC_SAMPLE_1/IncFormLen) / ((IJC_SAMPLE_1/IncFormLen) + (SJC_SAMPLE_1/SkipFormLen)). In other words, IncLevel is the proportion of reads that support the inclusion isoform, with an effective length normalization to account for reads being more likely to come from a longer transcript than from a shorter one: https://github.com/Xinglab/rmats-turbo/issues/349#issuecomment-1869725478
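
To make the formula above concrete, here is a minimal Python sketch of the same calculation. The function name `inc_level` and the example counts and lengths are made up for illustration; they are not taken from rMATS code or from the screenshot. Returning NaN when both counts are zero is a choice of this sketch, not necessarily how rMATS reports that case.

```python
def inc_level(ijc, sjc, inc_form_len, skip_form_len):
    """Length-normalized inclusion level, following the formula quoted above.

    ijc / sjc: inclusion and skipping junction counts for one sample.
    inc_form_len / skip_form_len: effective lengths of the two isoforms.
    """
    inc = ijc / inc_form_len      # normalized support for the inclusion isoform
    skip = sjc / skip_form_len    # normalized support for the skipping isoform
    total = inc + skip
    if total == 0:
        # No supporting reads at all: the ratio is undefined (sketch's choice).
        return float("nan")
    return inc / total


# Hypothetical numbers: 19 inclusion reads, 50 skipping reads,
# effective lengths 200 (inclusion) and 100 (skipping).
print(round(inc_level(19, 50, 200, 100), 2))
```

Note that the raw read ratio in this example would be 19/69, about 0.28, but after length normalization the inclusion level is about 0.16, because the longer inclusion form is expected to collect more reads per transcript.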

Which reads count toward the inclusion or skipping isoform is shown in the diagram from the README: https://github.com/Xinglab/rmats-turbo/tree/v4.3.0?tab=readme-ov-file#output

The number of bam files is expected to be the same as the --b1 and --b2 used with rMATS. If you only want to plot results for a subset of files then you can use --group-info https://github.com/Xinglab/rmats2sashimiplot/tree/v3.0.0?tab=readme-ov-file#grouping
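
One way to catch a mismatch before plotting is to count the comma-separated values in the IncLevel1 and IncLevel2 columns and compare them with the number of bam files in each group. The sketch below is only an illustration under stated assumptions: the helper name `find_count_mismatches` is hypothetical, and `SAMPLE` is a made-up, heavily reduced stand-in for a tab-separated SE.MATS.JC.txt file (the real file has many more columns).

```python
import csv
import io

# Made-up, reduced stand-in for SE.MATS.JC.txt (tab-separated).
SAMPLE = (
    "ID\tIncLevel1\tIncLevel2\n"
    "1\t0.16,0.20\t0.55,0.61,NA\n"
)


def find_count_mismatches(mats_text, n_b1, n_b2):
    """Return (ID, n1, n2) for rows whose IncLevel value counts differ
    from the expected number of bam files in group 1 and group 2."""
    reader = csv.DictReader(io.StringIO(mats_text), delimiter="\t")
    mismatches = []
    for row in reader:
        n1 = len(row["IncLevel1"].split(","))  # one value per group 1 bam
        n2 = len(row["IncLevel2"].split(","))  # one value per group 2 bam
        if (n1, n2) != (n_b1, n_b2):
            mismatches.append((row["ID"], n1, n2))
    return mismatches


# Expecting 2 bam files in group 1 and 3 in group 2: no mismatches.
print(find_count_mismatches(SAMPLE, 2, 3))
# Expecting 2 and 2: the row is reported with the counts actually found.
print(find_count_mismatches(SAMPLE, 2, 2))
```

For the run described in this thread, the expected counts would be 17 and 54.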