MoBiodiv / mobr

Tools for analyzing changes in diversity across scales
Other
23 stars 18 forks source link

developing tests for rarefaction #203

Closed dmcglinn closed 4 years ago

dmcglinn commented 6 years ago

@caroliver is working on a testing suite for the package https://github.com/caroliver/mobr/tree/auto_testing .

One of the most important tests we need is for the function rarefaction. This paper by Hurlburt 1976 has values that can be compared against:

Hurlbert (1971) - the nonconcept of sp div (a crit of alt params).pdf

Specifically see Table 2 where we are interested in comparing against the column labeled E(S_n). So for example the value in Table 2 and the following should be identical for the first two SADs that he provides.

sad1 = rep(10, 100)
rarefaction(sad1, 'indiv', 100)

sad2 = c(76, rep(50, 5), rep(20, 27-7), rep(5, 77-27), rep(1, 101-77))
rarefaction(sad2, 'indiv', 100)

There are other SADs listed in the table to compare against as well. This will only provide a test for the individual based SAD.

We will need to develop other tests for when rarefaction(..., method = 'samp') and rarefaction(..., method = 'spat'). For sample-based random rarefaction (i.e., method = 'samp') I think we could compare against the output generated by vegan::specaccum. Another way to check 'samp' would be to take a community and shuffle it each time computing the 'spatial' method of sampling. This may not be the best because it tests one function on another one of our functions but it is a good way to double check. Something like this would work:

data(inv_comm)
inv_comm_sub = inv_comm[1:50, ]
S_samp = rarefaction(inv_comm_sub, 'samp')
S_perm = apply(replicate(100, rarefaction(inv_comm_sub[sample(nrow(inv_comm_sub)), ], 'spat',  coords = 1:50, latlong = FALSE)), 1, mean)

The problem here is that this test is a bit slow so we'll want to optimize the method here if we go with this option.

The best way to check the spatial rarefaction would seem to be to create a simple 4 or 5 plot community and to by hand come up with the spatial curve to be compared against the R calculation.

caroliver commented 6 years ago

@dmcglinn I have completed and uploaded 5 expectations for rarefaction testing located in the 'test_rarefaction.R' file within the auto_testing branch. These 5 expectations cover all individual based SAD's provided by the Hurlbert paper in Table 2. I am now beginning work on the sample-based random rarefaction.

Quick Link to test_rarefaction.R: https://github.com/caroliver/mobr/blob/auto_testing/tests/testthat/test_rarefaction.R

dmcglinn commented 6 years ago

very cool! nice progress!

caroliver commented 6 years ago

@dmcglinn since some tests have been completed for both individual and sample based rarefaction I would like to move forward and begin working on tests for spatial rarefaction. I have searched online for the best way of calculating this by hand but have not found anything useful at this point. Are there any article you know of that you could recommend? Thanks!

dmcglinn commented 6 years ago

spatial rarefaction is trickier the best option in this case is to make an artificial site-by-species matrix to compare hand calculations with. Something like this could work:

comms = cbind(
        c(1, 1, 1, 0, 0, 0),
        c(0, 1, 1, 1, 0, 0),
        c(0, 0, 1, 1, 1, 0),
        c(0, 0, 0, 0, 1, 1),
        c(0, 0, 1, 0, 1, 0),
        c(0, 0, 0, 0, 1, 1),
        c(0, 0, 0, 0, 0, 1))

results = replicate(50,
                    rarefaction(comms, 'spat', coords = 1:6,
                                latlong=F))
avg = apply(results, 1, mean)
stdev = apply(results, 1, sd)
avg
       1        2        3        4        5        6 
2.666667 3.960000 5.000000 5.570000 6.500000 7.000000 
stdev
        1         2         3         4         5         6 
0.0000000 0.1861411 0.0000000 0.1784603 0.0000000 0.0000000 

This result is also revealing because it demonstrates that the spatial rarefaction method is not exact (this is because of ties for which site is the nearest neighbor) you can see that generates variance in this community at the 2nd and 4th accumulated plots. Therefore in this case you would need to compare those S values with a larger range of uncertainty.

dmcglinn commented 6 years ago

Maybe the simplist way to do this in general is to compute the 95% quantile around the spatial rarefaction results and ask if the hand calculated values fall within or equal to this range of values. So the following would compute the 95% quantile for S that could be compared against.

S_qt = apply(results, 1, quantile, c(0.025, .975))
> S_qt
             1        2 3        4   5 6
2.5%  2.666667 3.666667 5 5.333333 6.5 7
97.5% 2.666667 4.166667 5 5.833333 6.5 7
dmcglinn commented 6 years ago

Please ignore my previous two comments because if we simply ignore any point where there is uncertainty then we don't have to compute averages or quantiles for a clean and simple test.