MoBiodiv / mobr

Tools for analyzing changes in diversity across scales
Other
23 stars 18 forks source link

Calculating and extracting beta diversity #227

Closed T-Engel closed 5 months ago

T-Engel commented 5 years ago

Hi all,

I find it a bit tricky to calculate the beta diversities using mobr (beta_sn, beta_S, beta_S_PIE).

Currently, the beta diversities are calculated in the function get_mob_stats() at the sample scale, but to my knowledge, it is not possible to get them as one value per gamma (i.e. Whittakers multiplicative beta). Take the following example for beta_S:

Suppose we have 3 (alpha) samples with 5, 8, and 2 species which make up 10 unique species at the gamma scale. Then, Whittaker's multiplicative beta_S would be S_gamma/mean(S_alpha) = 10/mean(5,8,2)=10/5=2 I think we don't do this in mobr. Instead, we calculate beta diversity individually for all i samples as S_beta_i =S_gamma/S_alpha_i. In my example, this gives us the values 10/5 = 2, 10/8 = 1.25 and 10/2 = 5. Note that the mean [i.e. (2+1.25+5)/3=2.75] is overestimating the value that we calculated previously. In summary:

mean(gamma/alpha_i) != gamma/mean(alpha_i), where alpha_i is the diversity of the ith sample and gamma is the total group scale diversity.

Why is it that mobr uses the approach on the left side of the equation and most other literature that I find seems to use the approach on the right? To be fair, we don't actually calculate the mean of all the i beta values but present them as a boxplot. Yet, after reading the two papers I think users are expecting to get one beta diversity per group and we don't really deliver that. I had to create my own betadiversity funtion that calculates beta_sn, beta_S, beta_S_PIE using mean alpha diversities like on the right handside of the equation. Do you have any thoughts on this?

Thank you! :)

dmcglinn commented 5 years ago

Hey @T-Engel thanks for bringing these issues up. This is a great question and I think you are correct that in most studies if someone calculates Whittaker's beta they are referring to beta = gamma / mean(alpha). I think it could probably be shown that our beta = mean(gamma / alpha_i) must be lower or equal to Whittaker's beta due to Jensen's inequality because beta is a convex function of alpha. This should certainly be made more clear in the documentation and we may even want to offer a separate more traditional estimate of Whittaker's beta as well (I'm not sure).

I think when @FelixMay was designing this function we had a discussion about this and my feeling is that we wanted a sense of variation in beta diversities so that we could better connect this to the variation in each site-specific individual rarefaction curves. If we rely on the traditional measure of beta as gamma / mean(alpha) then we only have one value for each treatment to compare (this is similar to the case when we compare gamma).

Although as you noted most studies of Whittaker's beta compute a different quanity than we have provided, there still does exist a strong precident for the kind of analysis we have provided. This is because many studies use mean pairwise dissimilarity as an index of species turnover. Koleff et al. (2003) showed that gamma / mean(alpha) for two sites is equal to

2(a + b + c) / (2a + b + c)

where a is the number of shared species, and b & c are the number of species unique to each sample. This formulation is helpful because it demonstrates that pairwise whittaker's beta this is very similar to the Sorensen and Jaccard indices of turnover. Therefore we're treating our calculation of Whittaker's beta more like a pairwise turnover index which are quite common.

For your own purposes why did you feel like you wanted gamma / mean(alpha) ? Was this to compare to published values of beta diversity?

I wonder if there is more statistical power to use the randomization test using the F as the test statistic for gamma / alpha_i or if it is more powerful to use D_bar to compare gamma / mean(alpha) - which is how we carry out the gamma diversity test? This might be the most objective way to decide which metric is best to promote with our package.

dmcglinn commented 5 years ago

Chao et al. 2014 (pg. 51 equ 6 and pg 63 first paragraph) touch on a related issue as well. Although they are not directly discussing how to best compute beta diversity they are trying to formulate rarefaction curves for the Hill diversities. They base the curve calculations not off averaging across all values of D given m samples but rather by first computing the average community abundance vector for m samples then computing D based on that expected species frequency. They provide some reasons why this is a good idea in the Discussion (pg63),

In our formulation of a diversity accumulation curve, we define the expected diversity of a finite sample of size m as the Hill numbers based on the expected abundance frequency counts E[f_k(m)]; k = 1, ..., m. Our proposed theoretical formula is given in Eq. 6. An alternative definition would be the average Hill numbers over many samples of size m taken from the entire assemblage. Although the two approaches generally yield very close numerical values, our approach has two main advantages. We have shown (see summaries in Tables 1 and 2) that accurate estimators via estimation of frequency counts can be obtained for our approach. However, it is difficult to accurately estimate the alternative formula of the expected Hill numbers; usually algorithmic methods are needed. Another advantage is that all transformations between diversity measures are valid for any size m under our formulation. For example, Hill number of order 2 for any sample size m is exactly the inverse of the Simpson concentrations for the same size when all are based on the same expected frequencies. This is important because all diversity measures give consistent comparisons. If we use the alternative approach, then such transformations will not be exactly valid, and different measures may produce different comparative results. As proved in Propositions D1 in Appendix D, the two approaches are identical for species richness, and the same conclusion is valid for expected sample coverage.