Open jingxian555 opened 1 year ago
Yes, you can filter on FDR to find significant differential alternative splicing events between the two groups. This post mentions some potential filters for other columns in the output https://github.com/Xinglab/rmats-turbo/issues/183#issuecomment-1054318031
Thank you for your suggestions. Here is the code of python for advanced filtering, Is my understanding correct? And are these filtering criteria necessary? After filtering based on these criteria, there are only 30 significant differential alternative splicing events remaining. Can I just filter based on FDR < 0.05 to find significant differential alternative splicing events between the two groups? Thanks!
fout = open('significant.SE.MATS.JC.txt', 'w')
with open('SE.MATS.JC.txt','r') as fin:
line = fin.readline()
for line in fin:
line = line.strip()
arr = line.split('\t')
IJC_SAMPLE_1 = arr[12]
SJC_SAMPLE_1 = arr[13]
IJC_SAMPLE_2 = arr[14]
SJC_SAMPLE_2 = arr[15]
FDR = arr[-4]
IncLevel1 = arr[-3]
IncLevel2 = arr[-2]
IJC_SAMPLE_1_l = [int(IJC1) for IJC1 in IJC_SAMPLE_1.split(',')]
IJC_SAMPLE_1_avg = sum(IJC_SAMPLE_1_l)/len(IJC_SAMPLE_1_l)
SJC_SAMPLE_1_l = [int(SJC1) for SJC1 in SJC_SAMPLE_1.split(',')]
SJC_SAMPLE_1_avg = sum(SJC_SAMPLE_1_l)/len(SJC_SAMPLE_1_l)
IJC_SAMPLE_2_l = [int(IJC2) for IJC2 in IJC_SAMPLE_2.split(',')]
IJC_SAMPLE_2_avg = sum(IJC_SAMPLE_2_l)/len(IJC_SAMPLE_2_l)
SJC_SAMPLE_2_l = [int(SJC2) for SJC2 in SJC_SAMPLE_2.split(',')]
SJC_SAMPLE_2_avg = sum(SJC_SAMPLE_2_l)/len(SJC_SAMPLE_2_l)
IncLevel1_l = [float(I1) for I1 in IncLevel1.split(',') if I1 != 'NA']
IncLevel1_avg = sum(IncLevel1_l)/len(IncLevel1_l)
IncLevel1_range = max(IncLevel1_l) - min(IncLevel1_l)
IncLevel2_l = [float(I2) for I2 in IncLevel2.split(',') if I2 != 'NA']
IncLevel2_avg = sum(IncLevel2_l)/len(IncLevel2_l)
IncLevel2_range = max(IncLevel2_l) - min(IncLevel2_l)
if (IJC_SAMPLE_1_avg + SJC_SAMPLE_1_avg >= 10) and (IJC_SAMPLE_2_avg + SJC_SAMPLE_2_avg >= 10) and (float(FDR) < 0.05) and (IncLevel1_range > 0.05) and (0.05 < IncLevel1_avg < 0.95) and (IncLevel2_range > 0.05) and (0.05 < IncLevel2_avg < 0.95):
fout.write(line + '\n')
fout.close()
Yes, you can filter just based on FDR
For the code, I think the checks for the inclusion levels mentioned in that other post were intended to be done with all samples. Instead of requiring the PSI value to differ within a sample group like IncLevel1_range > 0.05
, I think the original intention was to check that the PSI value had some variation when looking at all samples
I apologize, but I'm having difficulty understanding your request. Can you just change the code? Thank you so much!
I saw in the article that evaluate splicing defects by comparing the percentage spliced-in (PSI) index of AS events (P < 0.05 and delta PSI > 3%) , Is delta PSI the same as abs(IncLevelDifference)?Thank you~
I'm not sure what article you're referring to, but IncLevelDifference is a difference in PSI values. Taking the absolute value of IncLevelDifference and calling that delta PSI is reasonable
For the code change
fout = open('significant.SE.MATS.JC.txt', 'w')
with open('SE.MATS.JC.txt','r') as fin:
line = fin.readline()
for line in fin:
line = line.strip()
arr = line.split('\t')
IJC_SAMPLE_1 = arr[12]
SJC_SAMPLE_1 = arr[13]
IJC_SAMPLE_2 = arr[14]
SJC_SAMPLE_2 = arr[15]
FDR = arr[-4]
IncLevel1 = arr[-3]
IncLevel2 = arr[-2]
IJC_SAMPLE_1_l = [int(IJC1) for IJC1 in IJC_SAMPLE_1.split(',')]
IJC_SAMPLE_1_avg = sum(IJC_SAMPLE_1_l)/len(IJC_SAMPLE_1_l)
SJC_SAMPLE_1_l = [int(SJC1) for SJC1 in SJC_SAMPLE_1.split(',')]
SJC_SAMPLE_1_avg = sum(SJC_SAMPLE_1_l)/len(SJC_SAMPLE_1_l)
IJC_SAMPLE_2_l = [int(IJC2) for IJC2 in IJC_SAMPLE_2.split(',')]
IJC_SAMPLE_2_avg = sum(IJC_SAMPLE_2_l)/len(IJC_SAMPLE_2_l)
SJC_SAMPLE_2_l = [int(SJC2) for SJC2 in SJC_SAMPLE_2.split(',')]
SJC_SAMPLE_2_avg = sum(SJC_SAMPLE_2_l)/len(SJC_SAMPLE_2_l)
IncLevel1_l = [float(I1) for I1 in IncLevel1.split(',') if I1 != 'NA']
IncLevel2_l = [float(I2) for I2 in IncLevel2.split(',') if I2 != 'NA']
all_IncLevel = IncLevel1_l + IncLevel2_l
IncLevel_avg = sum(all_IncLevel)/len(all_IncLevel)
IncLevel_range = max(all_IncLevel) - min(all_IncLevel)
if (IJC_SAMPLE_1_avg + SJC_SAMPLE_1_avg >= 10) and (IJC_SAMPLE_2_avg + SJC_SAMPLE_2_avg >= 10) and (float(FDR) < 0.05) and (IncLevel_range > 0.05) and (0.05 < IncLevel_avg < 0.95):
fout.write(line + '\n')
fout.close()
Thank you for your previous advice; it was very helpful. Now, I have a new question. Is the '--b1' parameter for the BAM file of the experimental group? For example, in this "SE.MATS.JC.txt" file, does IncLevelDifference > 0 indicate a higher probability of SE events occurring in group b1 compared to b2? Thank you!
The executed script is as follows: python /home/sunpf/my_data/software/miniconda3/envs/new_rmats/rMATS/rmats.py --b1 u8.txt --b2 WT.txt --gtf Arabidopsis_thaliana.TAIR10.57.gtf -t paired --readLength 150 --novelSS --od out --tmp tmp
From the README: https://github.com/Xinglab/rmats-turbo/tree/v4.2.0#output
IncLevelDifference: average(IncLevel1) - average(IncLevel2)
IncLevelDifference > 0 indicates that IncLevel1 > IncLevel2. Since IncLevel1 corresponds to --b1 and IncLevel2 to --b2, that would mean a higher inclusion level in --b1 as compared to --b2. For SE events a higher inclusion level means that the exon is included more often
Talking about events occurring in one group or the other can be confusing as mentioned in this post: https://github.com/Xinglab/rmats2sashimiplot/issues/68#issuecomment-910283252
Thank you. Is rMATS used for alternative splicing analysis with uniquely mapped reads in BAM files?
Yes, rMATS only uses uniquely mapped reads. rMATS will write a section to stdout saying how many reads were filtered out for different reasons. In that section NOT_NH_1
means the read was filtered out because it was not uniquely mapped. Here's a related post: https://github.com/Xinglab/rmats-turbo/issues/293
Thank you!
RI.MATS.JC.txt:
The rmats command you posted before included --novelSS
. With that option rmats can detect events with splice sites that are not in the --gtf
: https://github.com/Xinglab/rmats-turbo/issues/277#issuecomment-1486991997
rmats2sashimiplot doesn't show counts for novel junctions: https://github.com/Xinglab/rmats2sashimiplot/blob/v3.0.0/src/MISO/misopy/sashimi_plot/plot_utils/plot_gene.py#L148 This post discusses differences in read counts between rmats and rmats2sashimiplot https://github.com/Xinglab/rmats2sashimiplot/issues/33#issuecomment-656720560
Thanks! I removed the --novelSS from the rmats command and filtered significant RI using these filters:
The significant RI events I found all follow a pattern. For example, in the image below, the red boxes indicate significant RI. However, these significant RI regions belong to exonic regions in another transcript. That is, there are overlapping regions of exons and introns in different transcripts. Is this due to differences in exonic regions causing differences in intron retention?
The executed script is as follows: python /home/sunpf/my_data/software/miniconda3/envs/new_rmats/rMATS/rmats.py --b1 u8.txt --b2 WT.txt --gtf Arabidopsis_thaliana.TAIR10.57.gtf -t paired --readLength 150 --od out --tmp tmp
rMATS relies on the --gtf for RI events. Unless --novelSS is used, all of the RI events reported by rMATS should have the intron region overlap an annotated transcript in an exon region. This post has some details: https://github.com/Xinglab/rmats-turbo/issues/17
Thanks for your tools! There are 2 sample groups with 3 BAM files per group,but I am counfused with the results, how to filter for differential alternative splicing between the experimental group and the control group? I'm looking at this result file [AS_Event].MATS.JC.txt, can I directly filter differential alternative splicing based on FDR? Thanks!
The executed script is as follows: python /home/sunpf/my_data/software/miniconda3/envs/new_rmats/rMATS/rmats.py --b1 u8.txt --b2 WT.txt --gtf Arabidopsis_thaliana.TAIR10.57.gtf -t paired --readLength 150 --novelSS --od out --tmp tmp