Xinglab / rmats-turbo


rMATS FDR and Pvalue issue #143

Closed OnkarMulay closed 3 years ago

OnkarMulay commented 3 years ago

I ran rMATS on 200 samples with read depths ranging from 2 million to 15 million. In the [AS_Event].MATS.JC.txt and [AS_Event].MATS.JCEC.txt output files, all genes have an FDR and p-value of 1.

Are my low-read-depth samples corrupting the results for the good samples as well? What could cause every FDR and p-value to be 1?

EricKutschera commented 3 years ago

It's hard to say what caused the FDR and P-values without more details. Can you post the command line you ran?

If you are running rmats v4.1.1 then there is a file in the --tmp directory named {datetime}_read_outcomes_by_bam.txt. That file records how many reads were filtered out for specific reasons and it may indicate what the issue is. Can you post the output for 1 BAM from that file as well?
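For anyone following along, here is a minimal Python sketch for locating that file and printing its first lines so they can be pasted into the issue. The only detail taken from this thread is the `{datetime}_read_outcomes_by_bam.txt` name pattern; the helper names `find_read_outcome_files` and `head` are made up for illustration, and nothing is assumed about the file's internal layout.

```python
import glob
import os


def find_read_outcome_files(tmp_dir):
    """Return paths matching *_read_outcomes_by_bam.txt in tmp_dir, sorted by name."""
    pattern = os.path.join(tmp_dir, "*_read_outcomes_by_bam.txt")
    return sorted(glob.glob(pattern))


def head(path, max_lines=20):
    """Return the first max_lines lines of a text file, without trailing newlines."""
    lines = []
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i >= max_lines:
                break
            lines.append(line.rstrip("\n"))
    return lines
```

Calling `find_read_outcome_files` on the directory passed to `--tmp` and then `head` on each returned path should give enough output to share for one BAM.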

OnkarMulay commented 3 years ago

Thanks for the follow-up. I am sharing the script below, but note that I am running v4.1.0.

#!/bin/bash
#PBS -l walltime=48:00:00
#PBS -q normal
#PBS -l jobfs=1gb
#PBS -l mem=60gb
#PBS -l ncpus=24
#PBS -P zq45
#PBS -l storage=scratch/zq45+gdata/zq45

python3 /scratch/zq45/om4416/rMATS-2/rmats-turbo-4.1.0/rmats.py --s1 /scratch/zq45/om4416/File1.txt --s2 /scratch/zq45/om4416/File2.txt --gtf /scratch/zq45/om4416/HomoSapiens/Homo_sapiens.GRCh37.72.gtf --bi /scratch/zq45/om4416/Starindexv2_72 --variable-read-length -t paired --readLength 250 --nthread 4 --od /scratch/zq45/om4416/rMtas_Onkar --tmp /scratch/zq45/om4416/tmp_output_Onkar

To ask again: can this happen because I included samples with poor read depth? Has that corrupted the results for the good samples as well?

Let me know what else I should provide.

Thanks a lot 👍

EricKutschera commented 3 years ago

Samples with low read depth, which leads to low supporting read counts for the splicing events, could result in p-values of 1. If you take a look at one of the output files like SE.MATS.JC.txt you can see the supporting read counts in columns like IJC_SAMPLE_1. If some of the samples almost always have zero counts, then those specific samples may be the problem. If most counts are zero across all samples, then there may be some other issue.
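A rough way to run that check over a whole output file is sketched below in Python. It assumes the table is tab-separated and that each `IJC_SAMPLE_<n>` / `SJC_SAMPLE_<n>` cell holds one comma-separated count per sample, as in the example rows later in this thread. The function name `zero_count_fraction` is invented for this sketch, and it does not handle empty or `NA` count cells.

```python
import csv


def zero_count_fraction(lines, group):
    """Per-sample fraction of events with no supporting reads (IJC + SJC == 0).

    lines: iterable of tab-separated lines from an rMATS output table,
           header line first (an open file handle works)
    group: 1 or 2, selecting the IJC_SAMPLE_<group>/SJC_SAMPLE_<group> columns
    """
    reader = csv.DictReader(lines, delimiter="\t")
    zero_events = None
    n_events = 0
    for row in reader:
        ijc = [int(x) for x in row[f"IJC_SAMPLE_{group}"].split(",")]
        sjc = [int(x) for x in row[f"SJC_SAMPLE_{group}"].split(",")]
        if zero_events is None:
            zero_events = [0] * len(ijc)
        # Count the event against every sample that has no reads at all for it.
        for i, (inc, skip) in enumerate(zip(ijc, sjc)):
            if inc + skip == 0:
                zero_events[i] += 1
        n_events += 1
    if n_events == 0:
        return []
    return [z / n_events for z in zero_events]
```

For example, `with open("SE.MATS.JC.txt", newline="") as fh: print(zero_count_fraction(fh, 1))` prints one fraction per sample in group 1. A sample whose fraction is close to 1 is one that "almost always has zero counts" in the sense described above.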

The script and command look fine to me. If you are able to install v4.1.1, the extra output may help determine whether the reads are failing one of the filters in rmats.

OnkarMulay commented 3 years ago

Oh, OK, thanks. But will the bad samples also affect the results for the samples with high read depth?

EricKutschera commented 3 years ago

Adding in bad samples can result in a worse p-value than with just the high read depth samples. Here is some example output from the rmats statistical model. In the first row, there is only a single sample in each group, with non-zero inclusion and exclusion reads, and the p-value is 0.11. In the second row, 2 more samples with zero read counts were added to each sample group and the p-value became 1. In the third row, the zero read counts are replaced with non-zero counts and the p-value became about 0.002.

ID  IJC_SAMPLE_1  SJC_SAMPLE_1  IJC_SAMPLE_2  SJC_SAMPLE_2  IncFormLen  SkipFormLen  PValue
1   4             2             1             3             200         100          0.11473810052
2   4,0,0         2,0,0         1,0,0         3,0,0         200         100          1
3   4,4,4         2,2,2         1,1,1         3,3,3         200         100          0.00177197988363
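
To see what the zero-count samples contribute, it can help to compute the length-normalized inclusion level for each sample in the table above. The Python sketch below uses the formula (IJC / IncFormLen) / (IJC / IncFormLen + SJC / SkipFormLen), which is my understanding of how the inclusion level is defined for rMATS output. The function name `inclusion_level` is invented here, and this does not reproduce the p-values, which come from the rmats statistical model itself.

```python
def inclusion_level(ijc, sjc, inc_form_len, skip_form_len):
    """Length-normalized inclusion level; None when there are no supporting reads."""
    inc = ijc / inc_form_len
    skip = sjc / skip_form_len
    if inc + skip == 0:
        # No inclusion or skipping reads: the ratio is undefined for this sample.
        return None
    return inc / (inc + skip)


# Row 1 of the table: one sample per group.
print(inclusion_level(4, 2, 200, 100))  # group 1
print(inclusion_level(1, 3, 200, 100))  # group 2
# The samples added in row 2 have zero counts on both sides.
print(inclusion_level(0, 0, 200, 100))
```

Under that formula the row 1 samples give 0.5 for group 1 and roughly 0.14 for group 2, while the zero-count samples added in row 2 have no defined inclusion level at all, so they add replicates without adding any evidence about the difference between groups.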

OnkarMulay commented 3 years ago

Ok, that's what was needed 👍 Thank you