Xinglab / rmats-turbo

Other
228 stars 55 forks source link

zero significant results without errors #340

Open ELMMA-FUNNY opened 1 year ago

ELMMA-FUNNY commented 1 year ago

Hello,I stalled rmats 4.2.0 successfully and use it with my data; However,when I cat summary.txt, there is zero significant results without errors.

EventType EventTypeDescription TotalEventsJC TotalEventsJCEC SignificantEventsJC SigEventsJCSample1HigherInclusion SigEventsJCSample2HigherInclusion SignificantEventsJCEC SigEventsJCECSample1HigherInclusion SigEventsJCECSample2HigherInclusion SE skipped exon 18997 19101 0 0 0 0 0 0 A5SS alternative 5' splice sites 1750 1757 0 0 0 0 0 0 A3SS alternative 3' splice sites 3021 3024 0 0 0 0 0 0 MXE mutually exclusive exons 1648 1655 0 0 0 0 0 0 RI retained intron 2801 2865 0 0 0 0 0 0

my nohup.out is: gtf: 14.582498788833618 There are 55414 distinct gene ID in the gtf file There are 142435 distinct transcript ID in the gtf file There are 34375 one-transcript genes in the gtf file There are 842144 exons in the gtf file There are 26924 one-exon transcripts in the gtf file There are 21849 one-transcript genes with only one exon in the transcript Average number of transcripts per gene is 2.570379 Average number of exons per transcript is 5.912479 Average number of exons per transcript excluding one-exon tx is 7.057510 Average number of gene per geneGroup is 7.451096 statistic: 0.022632122039794922

read outcome totals across all BAMs USED: 223518900 NOT_PAIRED: 0 NOT_NH_1: 187718067 NOT_EXPECTED_CIGAR: 3597864 NOT_EXPECTED_READ_LENGTH: 0 NOT_EXPECTED_STRAND: 0 EXON_NOT_MATCHED_TO_ANNOTATION: 4899587 JUNCTION_NOT_MATCHED_TO_ANNOTATION: 1508894 CLIPPED: 46964838 total: 468208150 outcomes by BAM written to: ./tmp_output/2023-11-12-00_15_53_754913_read_outcomes_by_bam.txt

novel: 597.2351396083832 The splicing graph and candidate read have been saved into ./tmp_output/2023-11-12-00_15_53754913*.rmats save: 3.061922550201416 loadsg: 0.07759261131286621

========== Done processing each gene from dictionary to compile AS events Found 31022 exon skipping events Found 2035 exon MX events Found 9246 alt SS events There are 5835 alt 3 SS events and 3411 alt 5 SS events. Found 4188 RI events

ase: 1.350733995437622 count: 6.015069484710693 Processing count files. Done processing count files.

Could you help me? Thanks,Elmma.

EricKutschera commented 1 year ago

The read outcome totals look ok with about 47% USED. About 40% were NOT_NH_1 (multimapped) and 10% CLIPPED. I don't have a suggestion for the multimapping, but you could try running rMATS with --allow-clipping

From the summary file there were plenty of detected events, but they weren't significant. It might be that one of the samples or sample groups has very few read counts. Can you post the ./tmp_output/2023-11-12-00_15_53_754913_read_outcomes_by_bam.txt and a few lines of the SE.MATS.JC.txt output file

ELMMA-FUNNY commented 1 year ago

Thanks for your suggestion, When I cat 2023-11-12-00_15_53_754913_read_outcomes_by_bam.txt /home/jiayi01/chenhui_project/alternative_splicing/star_bam/WT1Aligned.sortedByCoord.out.bam USED: 60633101 NOT_PAIRED: 0 NOT_NH_1: 43766508 NOT_EXPECTED_CIGAR: 983583 NOT_EXPECTED_READ_LENGTH: 0 NOT_EXPECTED_STRAND: 0 EXON_NOT_MATCHED_TO_ANNOTATION: 1372611 JUNCTION_NOT_MATCHED_TO_ANNOTATION: 358629 CLIPPED: 11075296 TOTAL_FOR_BAM: 118189728 /home/jiayi01/chenhui_project/alternative_splicing/star_bam/WT2Aligned.sortedByCoord.out.bam USED: 52752762 NOT_PAIRED: 0 NOT_NH_1: 48546224 NOT_EXPECTED_CIGAR: 732163 NOT_EXPECTED_READ_LENGTH: 0 NOT_EXPECTED_STRAND: 0 EXON_NOT_MATCHED_TO_ANNOTATION: 1002917 JUNCTION_NOT_MATCHED_TO_ANNOTATION: 319875 CLIPPED: 10437113 TOTAL_FOR_BAM: 113791054 /home/jiayi01/chenhui_project/alternative_splicing/star_bam/KO1Aligned.sortedByCoord.out.bam USED: 58478200 NOT_PAIRED: 0 NOT_NH_1: 51698340 NOT_EXPECTED_CIGAR: 1081372 NOT_EXPECTED_READ_LENGTH: 0 NOT_EXPECTED_STRAND: 0 EXON_NOT_MATCHED_TO_ANNOTATION: 1461927 JUNCTION_NOT_MATCHED_TO_ANNOTATION: 425003 CLIPPED: 12607460 TOTAL_FOR_BAM: 125752302 /home/jiayi01/chenhui_project/alternative_splicing/star_bam/KO2Aligned.sortedByCoord.out.bam USED: 51654837 NOT_PAIRED: 0 NOT_NH_1: 43706995 NOT_EXPECTED_CIGAR: 800746 NOT_EXPECTED_READ_LENGTH: 0 NOT_EXPECTED_STRAND: 0 EXON_NOT_MATCHED_TO_ANNOTATION: 1062132 JUNCTION_NOT_MATCHED_TO_ANNOTATION: 405387 CLIPPED: 12844969 TOTAL_FOR_BAM: 110475066

And a few lines of the SE.MATS.JC.txt : ID GeneID geneSymbol chr strand exonStart_0base exonEnd upstreamES upstreamEE downstreamES downstreamEE ID IJC_SAMPLE_1 SJC_SAMPLE_1 IJC_SAMPLE_2 SJC_SAMPLE_2 IncFormLen SkipFormLen PValue FDR IncLevel1 IncLevel2 IncLevelDifference 0 "ENSMUSG00000095041.8" "ENSMUSG00000095041" chrJH584304.1 - 53535 54479 52189 52367 54720 54867 0 736,398 2,0 414,178 0,0 298 149 0.205426654983658 0.384469360850537 0.995,1.0 1.0,1.0 -0.002 1 "ENSMUSG00000095041.8" "ENSMUSG00000095041" chrJH584304.1 - 56985 57151 55111 55701 58563 58835 1 145,57 18,10 20,10 4,8 298 149 0.141263263762805 0.311820905614514 0.801,0.74 0.714,0.385 0.221 2 "ENSMUSG00000095041.8" "ENSMUSG00000095041" chrJH584304.1 - 56985 57151 55111 55701 59591 59667 2 85,32 10,4 11,1 3,1 298 149 0.124246190376017 0.291801585043524 0.81,0.8 0.647,0.333 0.315 3 "ENSMUSG00000095041.8" "ENSMUSG00000095041" chrJH584304.1 - 58563 58835 55111 55701 59591 59667 3 90,62 10,4 83,41 3,1 298 149 0.0543319372196526 0.238010519605316 0.818,0.886 0.933,0.953 -0.091 4 "ENSMUSG00000095041.8" "ENSMUSG00000095041" chrJH584304.1 - 58563 58835 56985 57151 59591 59667 4 133,77 1,0 88,42 0,0 298 149 0.345133709006654 0.506346949341067 0.985,1.0 1.0,1.0 -0.008 252 "ENSMUSG00000031355.17" "Arhgap6" chrX + 168078037 168078127 168064956 168065054 168081416 168081465 252 1,3 4,0 0,1 0,0 239 149 0.999202115572178 0.99952114562185 0.135,1.0 NA,1.0 -0.432 253 "ENSMUSG00000031355.17" "Arhgap6" chrX + 168081416 168081465 168064956 168065054 168083058 168083333 253 5,0 0,0 0,0 0,2 198 149 NA NA 1.0,NA NA,0.0 1.0 254 "ENSMUSG00000031358.18" "Msl3" chrX - 167453215 167453307 167452485 167452611 167454138 167454239 254 30,32 0,0 32,32 1,0 241 149 0.054693708101929 0.238010519605316 1.0,1.0 0.952,1.0 0.024 255 "ENSMUSG00000031358.18" "Msl3" chrX - 167455258 167455341 167454790 167454886 167456723 167456894 255 4,4 5,14 5,8 18,16 232 149 0.733490403991095 0.857312231189741 0.339,0.155 0.151,0.243 0.05

EricKutschera commented 1 year ago

All of the samples have similar read outcome totals which all seem ok. The read counts from the few lines of SE.MATS.JC.txt also seem ok. Each sample has inclusion and skipping counts for most of the events. I don't see anything wrong with the data. It just looks like the data doesn't include any significant splicing differences. For example:

3 "ENSMUSG00000095041.8" "ENSMUSG00000095041" chrJH584304.1 - 58563 58835 55111 55701 59591 59667 3 90,62 10,4 83,41 3,1 298 149 0.0543319372196526 0.238010519605316 0.818,0.886 0.933,0.953 -0.091

That event has plenty of reads to support the inclusion isoform (90,62, 83,41) and a few to support the skipping isoform (10,4 and 3,1), but the delta PSI is only -0.091 and the pvalue (0.054) and FDR (0.238) are not significant

ELMMA-FUNNY commented 12 months ago

Thanks for your suggestion,maybe I should look for another software to make duplicating splicing.

kbqt14 commented 6 months ago

Good evening, following the conversation. I have a query. Why at the end of executing the commands you get 1 different output than what is seen in the summary.txt file? I use the example provided by ELMMA-FUNNY at the end of executing the command there are 4188 RI events and in the summary.txt file there are 2801 RI events

examplesummary
EricKutschera commented 6 months ago

The number printed to stdout is the count of all events that rMATS detected:Found 4188 RI events

The numbers in the summary.txt file are after a filter is applied and the statistical test run: https://github.com/Xinglab/rmats-turbo/blob/v4.3.0/rmats.py#L336

Here's a related post: https://github.com/Xinglab/rmats-turbo/issues/265#issuecomment-1927162591

kbqt14 commented 6 months ago

Thank you very much for your quick reply @EricKutschera . Finally I would like to take this space to consult the following:

in the alignment to obtain the .BAM file, I used the following commands hisat2 -x $GENOME --rna-strandness F --trim5 0 --trim3 0 --phred33 -p $threads -U $sample

I have specified that it is from a forward stranded library.

However, at the end of the alignment and when I proceed to count the strand sense of the reads I get the following: samtools view WT_T0_1.bam | grep "XS:A:-" | wc -l 25456452 counts for (-) strand samtools view WT_T0_1.bam | grep "XS:A:+" | wc -l 21123743 counts for (+) strand

So I would like to know which rmats turbo option would be the right one in my case? --libType fr-firststrand In this case I have read outcome totals across all BAMs USED: 18860342 NOT_PAIRED: 0 NOT_NH_1: 60705966 NOT_EXPECTED_CIGAR: 729587 NOT_EXPECTED_READ_LENGTH: 0 NOT_EXPECTED_STRAND: 0 EXON_NOT_MATCHED_TO_ANNOTATION: 105435862 JUNCTION_NOT_MATCHED_TO_ANNOTATION: 3849189 CLIPPED: 0 total: 189580946 outcomes by BAM written to: ./test_WTT0vsWTT8_tmpf/2024-05-10-10_02_29_574001_read_outcomes_by_bam.txt

novel: 64.85731625556946 The splicing graph and candidate read have been saved into ./test_WTT0vsWTT8_tmpf/2024-05-10-10_02_29574001*.rmats save: 0.03405117988586426 loadsg: 0.0004885196685791016

========== Done processing each gene from dictionary to compile AS events Found 5 exon skipping events Found 0 exon MX events Found 18 alt SS events There are 10 alt 3 SS events and 8 alt 5 SS events. Found 10 RI events

or use the following? --libType fr-unstranded In this case I have read outcome totals across all BAMs USED: 127268545 NOT_PAIRED: 0 NOT_NH_1: 60705966 NOT_EXPECTED_CIGAR: 729587 NOT_EXPECTED_READ_LENGTH: 0 NOT_EXPECTED_STRAND: 0 EXON_NOT_MATCHED_TO_ANNOTATION: 447371 JUNCTION_NOT_MATCHED_TO_ANNOTATION: 429477 CLIPPED: 0 total: 189580946 outcomes by BAM written to: ./test_WTT0vsWTT8_tmpu/2024-05-10-10_00_29_647596_read_outcomes_by_bam.txt

novel: 73.78664708137512 The splicing graph and candidate read have been saved into ./test_WTT0vsWTT8_tmpu/2024-05-10-10_00_29647596*.rmats save: 0.3331418037414551 loadsg: 0.0171053409576416

========== Done processing each gene from dictionary to compile AS events Found 304 exon skipping events Found 17 exon MX events Found 1193 alt SS events There are 547 alt 3 SS events and 646 alt 5 SS events. Found 367 RI events

I have reviewed the google group posts but I am not sure which is the correct option in my case.

EricKutschera commented 6 months ago

https://daehwankimlab.github.io/hisat2/manual/

TopHat has a similar option, --library-type option, where fr-firststrand corresponds to R and RF; fr-secondstrand corresponds to F and FR.

https://github.com/Xinglab/rmats-turbo/tree/v4.3.0?tab=readme-ov-file#tips

--libType options are based on TopHat --library-type

Based on the documentation it looks like --libType fr-secondstrand should be used with --rna-strandness F

That seems to agree with your results where --libType fr-firststrand led to only about 10% of your alignments being USED, compared to about 67% with fr-unstranded which skips the strand checks