comprna / SUPPA

SUPPA: Fast quantification of splicing and differential splicing
MIT License

Meaning of the pvalue column after diffSplice #28

Closed mikelove closed 5 years ago

mikelove commented 6 years ago

hi,

What is the meaning of the pvalue column after running diffSplice? Is this a BH "adjusted p-value" (similar to a q-value with pi0 = 1)? How is alpha used?

EduEyras commented 6 years ago

Hi Mike,

If you use -m empirical, the p-value is an empirical p-value calculated by comparing the observed deltaPSI with the distribution of deltaPSIs between replicates for events with similar expression. You can use -gc to perform the correction per gene: the p-values from all events of the same gene are corrected for multiple testing. This is a BH correction (if I recall correctly), and you can control the alpha as well with -al. This correction is less harsh than a BH correction over all p-values, which, on the other hand, would not be necessary, since not all events are compared against the same distribution.
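To make the empirical p-value idea concrete, here is a minimal sketch and not SUPPA's actual code. It assumes the null distribution is built from the |deltaPSI| between replicates of the k events closest in expression; the helper names, the choice of k, and the add-one smoothing are all assumptions made for illustration.

```python
from bisect import bisect_left


def nearest_by_expression(target_expr, background, k):
    """Pick the between-replicate |dPSI| of the k events closest in expression.

    background: list of (expression, abs_dpsi_between_replicates) pairs.
    The neighbour selection rule is an assumption for this sketch.
    """
    ranked = sorted(background, key=lambda pair: abs(pair[0] - target_expr))
    return [abs_dpsi for _, abs_dpsi in ranked[:k]]


def empirical_pvalue(observed_dpsi, null_abs_dpsi):
    """Fraction of null |dPSI| values at least as large as the observed |dPSI|.

    Add-one smoothing (an assumption) keeps the p-value away from exactly 0.
    """
    null_sorted = sorted(null_abs_dpsi)
    n = len(null_sorted)
    n_at_least = n - bisect_left(null_sorted, abs(observed_dpsi))
    return (n_at_least + 1) / (n + 1)


# Hypothetical background: (expression, |dPSI| between replicates)
background = [(1.0, 0.30), (1.2, 0.25), (5.0, 0.05),
              (5.5, 0.02), (6.0, 0.04), (50.0, 0.01)]

null = nearest_by_expression(5.2, background, k=3)
p_large = empirical_pvalue(0.20, null)   # large change between conditions
p_small = empirical_pvalue(-0.03, null)  # change within replicate variability
```

The point of the sketch is that each event is compared against its own local null, so the resulting p-values do not all come from the same distribution.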

If you use -m classical, the p-value is a Mann-Whitney test p-value from comparing the distribution of PSI values for the same event between two groups of samples. It can also be paired.
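For the classical mode, the unpaired test can be sketched as follows. This is a self-contained illustration of a two-sided exact Mann-Whitney test on the PSI values of one event in two groups; it is not taken from SUPPA, the function names are hypothetical, and in practice a library routine such as scipy.stats.mannwhitneyu would be the usual choice. The enumeration is only feasible for small numbers of replicates.

```python
from itertools import combinations


def rank_with_ties(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def mann_whitney_exact(x, y):
    """Two-sided exact p-value, enumerating every way to label n1 samples."""
    n1, n2 = len(x), len(y)
    ranks = rank_with_ties(list(x) + list(y))
    mean_u = n1 * n2 / 2

    def u_stat(indices):
        return sum(ranks[i] for i in indices) - n1 * (n1 + 1) / 2

    observed = abs(u_stat(range(n1)) - mean_u)
    total = extreme = 0
    for indices in combinations(range(n1 + n2), n1):
        total += 1
        # Count labelings at least as far from the mean as the observed one.
        if abs(u_stat(indices) - mean_u) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical PSI values for one event, three replicates per condition
psi_cond1 = [0.10, 0.15, 0.20]
psi_cond2 = [0.60, 0.70, 0.80]
p_value = mann_whitney_exact(psi_cond1, psi_cond2)
```

With three replicates per group, the smallest achievable two-sided p-value is 2/20, which is worth keeping in mind when comparing the classical and empirical modes.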

Let me know if this helps. Any suggestion will be most welcome.

Best

E.

On Thu, May 17, 2018 at 8:08 PM, Mike Love wrote:

> hi,
>
> What is the meaning of the pvalue column after running diffSplice? Is this a BH "adjusted p-value" (similar to a q-value with pi0 = 1)? How is alpha used?


-- Dr E Eyras

ICREA Research Professor Universitat Pompeu Fabra PRBB, Dr Aiguader 88 Tel: +34 93 316 0502 E08003 Barcelona, Spain Fax: +34 93 316 0550

http://scholar.google.com/citations?user=LiojlGoAAAAJ http://www.researcherid.com/rid/L-1053-2014 http://regulatorygenomics.upf.edu/

mikelove commented 6 years ago

Ah, that's good to know. So there is no correction across genes, just within. Maybe put this statement in the docs, because users shouldn't expect FDR control across genes.

A few more suggestions: where you have "family wise error rate" in the docs, I think you should put something like "within-gene false discovery rate", because BH doesn't provide FWER control.

I think that you can drop alpha as an argument. It's not used by multipletests unless method='fdr_twostage':

https://github.com/statsmodels/statsmodels/blob/942fa69230944705f8072e9c338120455b7015a0/statsmodels/stats/multitest.py#L108-L109
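The points above (correction within genes only, and alpha not changing the adjusted p-values) can be illustrated with a small pure-Python sketch. It is a hand-written Benjamini-Hochberg adjustment used only for illustration, not the statsmodels implementation and not SUPPA's code; the event tuples and function names are hypothetical.

```python
from collections import defaultdict


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values; note that no alpha is needed."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def bh_adjust_per_gene(events):
    """events: list of (gene_id, event_id, pvalue) -> {event_id: adjusted p}."""
    by_gene = defaultdict(list)
    for gene, event, p in events:
        by_gene[gene].append((event, p))
    result = {}
    for items in by_gene.values():
        adjusted = bh_adjust([p for _, p in items])
        for (event, _), adj in zip(items, adjusted):
            result[event] = adj
    return result


def reject(adjusted, alpha):
    """alpha only enters here, as a threshold on already adjusted p-values."""
    return {event: adj <= alpha for event, adj in adjusted.items()}


events = [("geneA", "A1", 0.01), ("geneA", "A2", 0.04),
          ("geneA", "A3", 0.30), ("geneB", "B1", 0.02)]

within = bh_adjust_per_gene(events)
across = dict(zip([e for _, e, _ in events],
                  bh_adjust([p for _, _, p in events])))
```

In this toy example B1 is the only event in its gene, so its within-gene adjusted p-value equals the raw p-value, while correcting across all four events would roughly double it. That is the sense in which the per-gene correction gives no FDR control across genes.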

I'm going to try out computing p-values per isoform, and then do my own aggregation. Interesting to know about the method differences also, I'll try out empirical vs classical.

EduEyras commented 6 years ago

Hi Mike,

> Ah, that's good to know. So there is no correction across genes, just within. Maybe put this statement in the docs, because users shouldn't expect FDR control across genes.

Thanks. We'll make this more clear.

> A few more suggestions: where you have "family wise error rate" in the docs, I think you should put something like "within-gene false discovery rate", because BH doesn't provide FWER control.
>
> I think that you can drop alpha as an argument. It's not used by multipletests unless method='fdr_twostage':
>
> https://github.com/statsmodels/statsmodels/blob/942fa69230944705f8072e9c338120455b7015a0/statsmodels/stats/multitest.py#L108-L109

Thanks for pointing this out.

> I'm going to try out computing p-values per isoform, and then do my own aggregation. Interesting to know about the method differences also, I'll try out empirical vs classical.

Let us know how it goes. I will be interested if you find a better way to handle the significance at isoform level. We generally see different behaviours between events and isoforms. Generally it is because PSI values seem less stable at isoform level across reps, so the uncertainty is higher.

cheers

E.
