Xinglab / rmats-turbo

"--cstat" parameter #317

Open soumitrakp opened 1 year ago

soumitrakp commented 1 year ago

Thanks a lot for developing such a useful tool.

I have a question about the "--cstat" parameter of rmats.py. I tried four values of the parameter (0.05, 0.01, 0.001 and 0.0001) and got the following summary stats (I manually collected the SignificantEventsJC columns from the four summary.txt files):

EventType  TotalEventsJC  TotalEventsJCEC  SignificantEventsJC_0.05  SignificantEventsJC_0.01  SignificantEventsJC_0.001  SignificantEventsJC_0.0001
SE         2791           2975             4                         21                        31                         40
A5SS       1703           1742             1                         10                        12                         13
A3SS       1746           1763             2                         12                        15                         15
MXE        399            1092             1                         1                         3                          3
RI         1048           1076             5                         12                        20                         21

I expected that making the FDR cutoff more stringent (from 0.05 to 0.01 and so on) would yield fewer significant events. However, I am seeing the opposite. Could you please explain how the parameter works?

Thanks and let me know if you need more information.

EricKutschera commented 1 year ago

From the README: https://github.com/Xinglab/rmats-turbo/tree/v4.1.2#all-arguments

--cstat CSTAT The cutoff splicing difference. The cutoff used in the null hypothesis test for differential splicing. The default is 0.0001 for 0.01% difference. Valid: 0 <= cutoff < 1. Does not apply to the paired stats model

The --cstat parameter influences the p-value calculation. The null hypothesis for the p-value is that the difference in percent spliced in (delta PSI) between the two sample groups is less than cstat. The smallest cstat value that you used was 0.0001. With such a small cstat it is relatively easy to reject the null hypothesis, so the p-values should be more significant (smaller). With a higher cstat like 0.05 it is more difficult to reject the null hypothesis, so the p-values should be larger.
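
Written out, the test described above looks roughly like the following. This is only a sketch: the symbols are introduced here for illustration ($\psi_1$ and $\psi_2$ for the PSI of the two sample groups, $c$ for the --cstat value), and the absolute value and the non-strict inequality are assumptions based on the README calling it a "cutoff splicing difference" that may be 0.

```latex
H_0:\ |\psi_1 - \psi_2| \le c
\qquad \text{vs.} \qquad
H_1:\ |\psi_1 - \psi_2| > c
```

Raising $c$ enlarges the set of differences covered by $H_0$, so the same data give weaker evidence against it. That matches the table above, where the counts shrink as --cstat goes from 0.0001 up to 0.05.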

The significant events column in the summary file counts events with FDR <= 0.05, where 0.05 is the default value of --p-cutoff in the script that creates the summary file (https://github.com/Xinglab/rmats-turbo/blob/v4.1.2/rMATS_P/summary.py#L19). The FDR cutoff set by --p-cutoff is separate from --cstat.
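
To make the distinction concrete, here is a minimal sketch of that kind of counting. It is not the code from summary.py. It assumes a tab-separated per-event output file with an FDR column; the sample rows and the function name `count_significant` are made up for illustration.

```python
import csv
import io

# Made-up rows shaped like a per-event rMATS output file (tab-separated).
# Real files have many more columns; only FDR is used here.
SAMPLE = (
    "ID\tGeneID\tPValue\tFDR\tIncLevelDifference\n"
    "1\tgeneA\t0.0001\t0.004\t0.25\n"
    "2\tgeneB\t0.0200\t0.060\t-0.10\n"
    "3\tgeneC\t0.0009\t0.030\t0.15\n"
    "4\tgeneD\t0.5000\t0.800\t0.01\n"
)


def count_significant(handle, fdr_cutoff=0.05):
    """Count rows whose FDR is <= fdr_cutoff (hypothetical helper)."""
    reader = csv.DictReader(handle, delimiter="\t")
    count = 0
    for row in reader:
        try:
            fdr = float(row["FDR"])
        except ValueError:
            continue  # skip rows whose FDR is not numeric, e.g. "NA"
        if fdr <= fdr_cutoff:
            count += 1
    return count


# The FDR cutoff only filters p-values that were already computed;
# it plays no part in how those p-values are calculated.
for cutoff in (0.05, 0.01):
    print(cutoff, count_significant(io.StringIO(SAMPLE), cutoff))
```

In this sketch, tightening the FDR cutoff can only lower the count, which is the behaviour expected in the original question. Changing --cstat instead changes the p-values (and so the FDR values) themselves.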

soumitrakp commented 1 year ago

Thanks, Eric, for your quick and to-the-point answer. Sorry that I confused the "--cstat" parameter with the p-value cutoff.

Any suggestion for a good value for the "--cstat" parameter? Something like 0.001, somewhere in the middle between 0.05 and 0.0001?

Best regards.

EricKutschera commented 1 year ago

This post has some discussion about setting --cstat: https://groups.google.com/g/rmats-user-group/c/RIGKPwXK9eI/m/mkGEoARaAgAJ