Closed OnkarMulay closed 3 years ago
It's hard to say what caused the FDR and P-values without more details. Can you post the command line you ran?
If you are running rmats v4.1.1 then there is a file in the --tmp directory named {datetime}_read_outcomes_by_bam.txt. That file records how many reads were filtered out for specific reasons and it may indicate what the issue is. Can you post the output for 1 BAM from that file as well?
Thanks for the follow-up. I am sharing the script below, but note that I am running v4.1.0:
python3 /scratch/zq45/om4416/rMATS-2/rmats-turbo-4.1.0/rmats.py --s1 /scratch/zq45/om4416/File1.txt --s2 /scratch/zq45/om4416/File2.txt --gtf /scratch/zq45/om4416/HomoSapiens/Homo_sapiens.GRCh37.72.gtf --bi /scratch/zq45/om4416/Starindexv2_72 --variable-read-length -t paired --readLength 250 --nthread 4 --od /scratch/zq45/om4416/rMtas_Onkar --tmp /scratch/zq45/om4416/tmp_output_Onkar
I'll ask this again: can this happen because I included samples with bad read depths in the run? Has it corrupted the results for the good samples as well?
Or let me know what else I should provide here.
Thanks a lot 👍
Samples with low read depths, which lead to low supporting read counts for the splicing events, could result in p-values of 1. If you take a look at one of the output files like SE.MATS.JC.txt you can see the supporting read counts in the columns like IJC_SAMPLE_1. If some of the samples almost always have zero counts then those specific samples may be the problem. If most counts are zero across all samples then there may be some other issue.
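The per-sample check described above could be sketched in Python roughly as follows. This is only an illustration, not part of rmats: the function name `zero_fraction_per_replicate` is hypothetical, and it assumes the output file is tab-separated with the count columns holding comma-separated integers, one per replicate, as in the example rows later in this thread.

```python
import csv
import io

# Count columns named in the rmats output discussed in this thread.
COUNT_COLUMNS = ("IJC_SAMPLE_1", "SJC_SAMPLE_1", "IJC_SAMPLE_2", "SJC_SAMPLE_2")


def zero_fraction_per_replicate(handle):
    """For each count column, return the fraction of events with a zero
    count, listed per replicate (hypothetical helper, not an rmats API)."""
    reader = csv.DictReader(handle, delimiter="\t")
    zeros = {col: [] for col in COUNT_COLUMNS}
    n_events = 0
    for row in reader:
        n_events += 1
        for col in COUNT_COLUMNS:
            # Assumption: every value is a comma-separated list of integers.
            counts = [int(c) for c in row[col].split(",")]
            if not zeros[col]:
                zeros[col] = [0] * len(counts)
            for i, count in enumerate(counts):
                if count == 0:
                    zeros[col][i] += 1
    if n_events == 0:
        return {}
    return {col: [z / n_events for z in zs] for col, zs in zeros.items()}


# Small in-memory example with three replicates per group.
demo = (
    "ID\tIJC_SAMPLE_1\tSJC_SAMPLE_1\tIJC_SAMPLE_2\tSJC_SAMPLE_2\n"
    "2\t4,0,0\t2,0,0\t1,0,0\t3,0,0\n"
    "3\t4,4,4\t2,2,2\t1,1,1\t3,3,3\n"
)
fractions = zero_fraction_per_replicate(io.StringIO(demo))
for column, values in fractions.items():
    print(column, values)
```

For a real run, the same function could be called on an opened file, for example `open("SE.MATS.JC.txt", newline="")`. Replicates whose fraction is close to 1 in every column would be the candidates for the low read depth samples mentioned above.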
The script and command look fine to me. If you are able to install v4.1.1, the extra output may help determine whether the reads are failing one of the filters in rmats.
Oh OK, thanks. But will the bad samples also affect the samples with high read depth?
Adding in bad samples can result in a worse p-value than with just the high read depth samples. Here is some example output from the rmats statistical model. In row 1, there is only a single sample in each group, there are non-zero inclusion and exclusion reads, and the p-value is 0.11. In row 2, two more samples with zero read counts were added to each sample group and the p-value became 1. In row 3, the zero read counts are replaced with non-zero counts and the p-value became about 0.002.
ID IJC_SAMPLE_1 SJC_SAMPLE_1 IJC_SAMPLE_2 SJC_SAMPLE_2 IncFormLen SkipFormLen PValue
1 4 2 1 3 200 100 0.11473810052
2 4,0,0 2,0,0 1,0,0 3,0,0 200 100 1
3 4,4,4 2,2,2 1,1,1 3,3,3 200 100 0.00177197988363
Ok, that's what was needed 👍 Thank you
I have run rMATS on 200 samples with read depths ranging from 2 million to 15 million. In the [AS_Event].JC/JCEC.txt output files, all genes have an FDR and p-value of 1.
Are my poor read depth samples corrupting information for other good samples as well? What has happened that I see all FDR and P-values as 1?