biocore / evident

BSD 3-Clause "New" or "Revised" License

Add PERMANOVA effect size support #26

Closed gibsramen closed 1 year ago

gibsramen commented 1 year ago

Now allows calculation of PERMANOVA effect sizes (omega-squared) for MultivariateDataHandler instances. Math according to https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4514928/ (cite in paper)
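For reference, a minimal sketch of the omega-squared calculation, assuming the standard one-way form computed from a sums-of-squares partition (which I believe is what the linked paper uses). The function name and arguments are illustrative only and are not Evident's API.

```python
def omega_squared(ss_total, ss_within, n_samples, n_groups):
    """One-way omega-squared from a sums-of-squares partition.

    Assumed form (hypothetical helper, not Evident's implementation):
        (SS_between - (a - 1) * MS_within) / (SS_total + MS_within)
    where a is the number of groups and MS_within = SS_within / (N - a).
    """
    ss_between = ss_total - ss_within
    ms_within = ss_within / (n_samples - n_groups)
    return (ss_between - (n_groups - 1) * ms_within) / (ss_total + ms_within)
```

Unlike R-squared, this estimate can come out slightly negative when the groups explain essentially none of the variation.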

At the moment this does not support a power analysis based on this effect size, because the effect size is not F-distributed (same reason why PERMANOVA uses pseudo-F). @wasade do you think this is a critical missing component? Evident is set up to calculate power analytically and this will likely require some sort of involved simulation procedure.

See https://www.frontiersin.org/articles/10.3389/fmicb.2021.796025/full as well

wasade commented 1 year ago

I think a power analysis using PERMANOVA would be great. How difficult do you think it'll be?

gibsramen commented 1 year ago

I think it is doable. Will start working up a solution and hopefully get an update soon.

gibsramen commented 1 year ago

Ok so I implemented a bootstrap power analysis for PERMANOVA since there's no real numerical option. Basically I do two rounds of bootstrapping assuming N observations with groups equally distributed.

(1) Sample randomly with replacement without regard to class labels & calculate effect size
(2) Sample randomly with replacement within class labels & calculate effect size

In this way, (1) is the null hypothesis and (2) is the alternative hypothesis. These are each done for some number of permutations, P. Then we just use the definition of power to determine the percent chance, out of P, that the null hypothesis is rejected when the alternative is true. Result looks something like this:

image
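A rough sketch of the two bootstrap rounds described above, not the code in this PR. It assumes a square distance matrix given as nested lists, balanced groups, and that the null hypothesis is rejected when the alternative effect size exceeds the (1 - alpha) quantile of the null effect sizes. All function and parameter names here are hypothetical.

```python
import random
from itertools import combinations


def omega_squared(dist, idx, labels):
    """Omega-squared for sampled rows `idx` of `dist`, grouped by `labels`."""
    groups = {}
    for pos, lab in enumerate(labels):
        groups.setdefault(lab, []).append(idx[pos])

    def ss(members):
        # Sum of squared pairwise distances divided by the group size.
        return sum(dist[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)

    ss_total = ss(idx)
    ss_within = sum(ss(m) for m in groups.values())
    ms_within = ss_within / (len(idx) - len(groups))
    denom = ss_total + ms_within
    if denom == 0:  # every resampled observation was identical
        return 0.0
    return (ss_total - ss_within - (len(groups) - 1) * ms_within) / denom


def bootstrap_power(dist, labels, total_obs, alpha=0.05, n_boot=999, seed=0):
    """Estimate power at `total_obs` samples split evenly across groups."""
    rng = random.Random(seed)
    by_group = {}
    for i, lab in enumerate(labels):
        by_group.setdefault(lab, []).append(i)
    names = sorted(by_group)
    per_group = total_obs // len(names)
    if per_group < 2:
        raise ValueError("need at least 2 observations per group")
    boot_labels = [g for g in names for _ in range(per_group)]
    everyone = list(range(len(labels)))

    null, alt = [], []
    for _ in range(n_boot):
        # (1) Null: resample with replacement, ignoring class labels.
        pooled = rng.choices(everyone, k=len(boot_labels))
        null.append(omega_squared(dist, pooled, boot_labels))
        # (2) Alternative: resample with replacement within each class.
        within = [i for g in names for i in rng.choices(by_group[g], k=per_group)]
        alt.append(omega_squared(dist, within, boot_labels))

    # Assumed rejection rule: exceed the approximate (1 - alpha) null quantile.
    null.sort()
    critical = null[min(int((1 - alpha) * n_boot), n_boot - 1)]
    return sum(e > critical for e in alt) / n_boot
```

Calling `bootstrap_power` over a range of `total_obs` values gives one power curve. A larger `n_boot` reduces the Monte Carlo noise in each point.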

@wasade @antgonza does this make sense?

wasade commented 1 year ago

I think this does. In the plot, is Total Observations the total number of samples or the class size (or does this assume balanced classes)?

antgonza commented 1 year ago

Thank you @gibsramen; out of curiosity, how many samples and classes are in the original data in the plot? Also, what were its original PERMANOVA values?

gibsramen commented 1 year ago

Total observations is total number of samples assuming balanced classes (same as the rest of Evident)

gibsramen commented 1 year ago

@antgonza

image

antgonza commented 1 year ago

Thank you @gibsramen; is it me or does it look like it saturates power when you get to the same number of samples as you have in the original data? Also, could you add 2-3 lines to the plot using different alpha values? Just to check my previous observation.

gibsramen commented 1 year ago

Yeah I can look at that.

gibsramen commented 1 year ago

Curves are kinda wonky since

1) We're doing random permutations
2) Permutations are done with replacement (in contrast to regular PERMANOVA)

but I think the trend generally holds.

image

antgonza commented 1 year ago

I think it makes sense but it might be nice to make them smoother, maybe by adding a parameter to do each step x times (like 10) and plot the average ... what do you think?

gibsramen commented 1 year ago

I upped the number of permutations from 999 -> 4999 which should be equivalent to doing your suggestion 5 times I think.

image

antgonza commented 1 year ago

Nice! Thank you!

wasade commented 1 year ago

Agree, that looks really awesome

gibsramen commented 1 year ago

Cool! If one of you could review the PR that would be super.

wasade commented 1 year ago

Can do but before starting, what was the motivation to partially reimplement PERMANOVA?

gibsramen commented 1 year ago

The implementation of PERMANOVA from scikit-bio doesn't return the sums-of-squares values needed to calculate the effect size.

https://github.com/biocore/scikit-bio/blob/107fc8fba3df69ceea6dd3ef357212eecc1f9be3/skbio/stats/distance/_permanova.py#L130
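To illustrate which quantities are needed, here is a minimal sketch of the PERMANOVA sums-of-squares partition computed directly from a distance matrix, following the usual definitions (sum of squared pairwise distances divided by the number of samples, overall and per group). It is a standalone illustration with hypothetical function names, not the code in this PR or in scikit-bio.

```python
from itertools import combinations


def permanova_sums_of_squares(dist, labels):
    """Return (ss_total, ss_within, ss_between) for a square distance matrix."""
    n = len(labels)
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n

    # Collect the sample indices belonging to each group.
    groups = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, []).append(idx)

    ss_within = sum(
        sum(dist[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)
        for members in groups.values()
    )
    return ss_total, ss_within, ss_total - ss_within


def pseudo_f(dist, labels):
    """Pseudo-F statistic built from the same partition."""
    n, a = len(labels), len(set(labels))
    _, ss_within, ss_between = permanova_sums_of_squares(dist, labels)
    return (ss_between / (a - 1)) / (ss_within / (n - a))
```

The pseudo-F only needs the ratio of the between and within terms, whereas the omega-squared effect size also needs the total, so the individual values have to be exposed.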

wasade commented 1 year ago

Great, thanks!