Open Oliverfeudj opened 1 month ago
By default rmats_long.py outputs a separate .tsv for each of the most significant isoforms showing the differences of that isoform to the most significant isoform in that gene with a delta proportion in the opposite direction.
If you run rmats_long.py with --compare-all-within-gene then when it calls classify_isoform_differences.py it won't use --second-transcript-id and each .tsv will have the differences of the most significant isoform to all other isoforms in the gene.
I added a branch which changes classify_isoform_differences.py to do a pairwise comparison of isoforms in a gene when run without --second-transcript-id: https://github.com/Xinglab/rMATS-long/commit/bc2bdfb59972ad7090047024b006d9c7128b7b44
Doing a pairwise comparison can take a long time for genes with many isoforms
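To give a rough sense of why this grows quickly: a gene with n isoforms has n(n-1)/2 unordered isoform pairs. The sketch below only counts pairs. The function name and transcript IDs are hypothetical, and it assumes each unordered pair is compared once, which may not match what classify_isoform_differences.py actually does.

```python
from itertools import combinations


def count_isoform_pairs(transcript_ids):
    """Return the number of unordered isoform pairs within one gene."""
    return sum(1 for _ in combinations(transcript_ids, 2))


# Hypothetical gene sizes, to show the quadratic growth in comparisons.
for n in (5, 10, 50):
    ids = [f"transcript_{i}" for i in range(n)]
    print(n, count_isoform_pairs(ids))
```

Going from 10 to 50 isoforms multiplies the number of pairs by roughly 27, so a handful of genes with many isoforms can dominate the run time.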
Hi @EricKutschera
I have used --compare-all-within-gene when running rmats_long.py and I have a question regarding the output. Here's the command line I used:
python ./rMATS-long/scripts/rmats_long.py --abundance espresso_data/HEK293T-all/HEK293T-all_N2_R0_abundance.esp --updated-gtf espresso_data/HEK293T-all/HEK293T-all_N2_R0_updated.gtf --gencode-gtf ${gtf} --group-1 rMATS-long_data/input/HEK293T-all_group1.txt --group-2 rMATS-long_data/input/HEK293T-all_group2.txt --group-1-name WT --group-2-name KO --out-dir ${output_path} --plot-file-type .png --compare-all-within-gene
In differential_transcripts.tsv, there are 10 lines containing ENSG00000004455.17 (a gene I picked at random):
grep ENSG00000004455.17 rMATS-long_data/differential_transcripts.tsv
ENSG00000004455.17 ENST00000354858.11 0.281545888814435 1 0.595689772079638 1 0.0064 0.0129 0.0081 0.0066 0.0073 0.0047 0.0091 0.0062 0.0029
ENSG00000004455.17 ENST00000373449.7 56.7319289594579 1 4.99459681293473e-14 2.34608698795577e-10 0.3385 0.354 0.4015 0.529 0.6199 0.6675 0.3647 0.6055 -0.2408
ENSG00000004455.17 ENST00000467905.5 1.41173459743368 1 0.234768653534498 1 0.0128 0.0102 0.0079 0.0 0.0036 0.0093 0.0103 0.0043 0.006
ENSG00000004455.17 ENST00000480134.5 0.462403563942644 1 0.496502801619222 1 0.1079 0.0688 0.0312 0.0548 0.0578 0.0794 0.0693 0.064 0.0053
ENSG00000004455.17 ENST00000487289.1 12.9953227365513 1 0.000312270022764174 0.0371344396057979 0.0222 0.0203 0.0391 0.0032 0.0 0.0 0.0272 0.0011 0.0261
ENSG00000004455.17 ENST00000548033.5 4.71189004554435 1 0.0299547012700919 0.555825341913299 0.0 0.0025 0.0 0.0064 0.0108 0.0187 0.0008 0.01 -0.0111
ENSG00000004455.17 ENST00000550338.5 1.87958598265686 1 0.170381210967297 1 0.019 0.0228 0.0469 0.0128 0.0144 0.014 0.0296 0.0137 0.0159
ENSG00000004455.17 ENST00000629371.2 3.9830445464795 1 0.0459604191091984 0.679867100930521 0.0 0.0 0.0 0.0032 0.0109 0.0094 0.0 0.0079 -0.0079
ENSG00000004455.17 ENST00000672715.1 49.5163324181758 1 1.96727117889871e-12 7.39261163606559e-09 0.4766 0.4792 0.4491 0.3296 0.2418 0.1822 0.4683 0.2512 0.2171
ENSG00000004455.17 ESPRESSO:chr1:376:3 1.94993508482912 1 0.162593846125627 1 0.0165 0.0241 0.0163 0.0544 0.0335 0.0147 0.019 0.0342 -0.0152
The structure figure ENSG00000004455.17_structure.png only includes 5 transcripts. Why does it show only 5 rather than all of the transcripts listed in differential_transcripts.tsv?
In addition, could you please explain how the number next to each event type (for example, exon skipping 75) is calculated?
Thank you so much for your help.
visualize_isoforms.py has a parameter --max-transcripts which defaults to 5: https://github.com/Xinglab/rMATS-long/blob/v1.0.0/scripts/visualize_isoforms.py#L82
If you want to plot more than 5 then you'll need to add more colors here: https://github.com/Xinglab/rMATS-long/blob/v1.0.0/scripts/visualize_isoforms.py#L17
Here's the code to get the number for each event type: https://github.com/Xinglab/rMATS-long/blob/592cb3268d16aa6bea3a6b79aedceac128563e3b/scripts/rmats_long.py#L517
It looks at the isoform_differences files and checks each pair of transcripts to see what splicing events were found between those two transcripts. If there is only 1 event for a pair of transcripts, then the count for that event type increases by one. If there are multiple events for a pair, then the pair is counted as combinatorial.
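The counting rule described above can be sketched roughly as follows. This is not the code from rmats_long.py: the function name, the input layout (a dict from transcript pair to a list of event names), and the example event labels are assumptions made for illustration. Pairs with no events are skipped here, which is also an assumption.

```python
from collections import Counter


def count_event_types(events_by_pair):
    """Count one category per transcript pair.

    events_by_pair maps a (transcript_a, transcript_b) tuple to the list of
    splicing event names found between those two transcripts.
    """
    counts = Counter()
    for events in events_by_pair.values():
        if len(events) == 1:
            # A single event: count it under that event type.
            counts[events[0]] += 1
        elif len(events) > 1:
            # Several events between the same pair: count the pair once.
            counts["combinatorial"] += 1
    return counts


# Hypothetical transcript pairs and event labels.
example = {
    ("tx_a", "tx_b"): ["exon skipping"],
    ("tx_a", "tx_c"): ["exon skipping", "intron retention"],
    ("tx_a", "tx_d"): ["exon skipping"],
}
print(count_event_types(example))
```

In this example the two single-event pairs add 2 to exon skipping, and the pair with two events adds 1 to combinatorial, so a label such as "exon skipping 75" would mean 75 transcript pairs differ by exactly one exon skipping event.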
Hello @EricKutschera,
is there a way to run classify_isoform_differences.py for all isoforms at once, rather than individually for each isoform, so that I end up with all isoforms in a single .tsv file instead of a separate .tsv for each isoform? Thank you in advance for your reply.
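One possible workaround, assuming the per-isoform .tsv files all share the same header row, is to concatenate them after the run. The sketch below is not part of rMATS-long. The function name, the added source_file column, and the file pattern in the usage comment are hypothetical and would need to be adapted to the actual output file names.

```python
import csv
import glob


def merge_tsv_files(pattern, out_path):
    """Concatenate TSV files matching pattern into one file.

    The header is written once and a source_file column records which
    input file each row came from. Returns the number of matched files.
    """
    # Collect the inputs before the output file is created.
    paths = sorted(glob.glob(pattern))
    header_written = False
    with open(out_path, "w", newline="") as out_handle:
        writer = csv.writer(out_handle, delimiter="\t")
        for path in paths:
            with open(path, newline="") as in_handle:
                reader = csv.reader(in_handle, delimiter="\t")
                header = next(reader, None)
                if header is None:
                    continue  # skip empty files
                if not header_written:
                    writer.writerow(["source_file"] + header)
                    header_written = True
                for row in reader:
                    writer.writerow([path] + row)
    return len(paths)


# Hypothetical usage; adjust the pattern to the real file names:
# merge_tsv_files("out_dir/*isoform_differences*.tsv", "all_differences.tsv")
```

Writing the merged file somewhere the pattern does not match avoids picking it up as an input on a later run.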