MoBiodiv / mobr

Tools for analyzing changes in diversity across scales
Other
23 stars 18 forks source link

Include alpha scale individual-based rarefaction curves? #209

Open T-Engel opened 6 years ago

T-Engel commented 6 years ago

I noticed that it is a little tricky for users to calculate individual-based rarefaction curves at the site level. By that, I mean the basic functionality of the vegan function rarefy(), where you input a site-by-species matrix and a vector of desired sample sizes and what you get is a site-by-S_rare matrix for all sites and sample sizes that you plugged in.

I think there is currently no straightforward way to do this in mobr. As our rarefaction(method='indiv') function is designed for the allscales analysis, it automatically pools all sites to draw the individual-based rarefaction curve at the gamma level. However, users who supply a site-by-species-matrix to rarefaction(method='indiv') might also wish to compute the curves at the alpha/site level, as they know it from vegan or iNext. Currently, it seems that the only way to get close to this is using calc_biodiv(index = "S_n"). But here, it looks like the function doesn't work if you supply a vector of efforts as opposed to a single value. Although it calculates all the right values, they appear to be assigned to the wrong sites.

Here is some code that shows you what I mean

#load data and select only the first 3 sites as an example
data(inv_comm)
inv_comm<-inv_comm[1:3,]

# rarefy with vegan for 5 and 10 individuals
vegan::rarefy(inv_comm, c(5,10))

# trying to achieve the same with mobr
calc_biodiv(inv_comm,groups =1:dim(inv_comm)[1], index = "S_n",effort = c(5,10))

And here you can compare the outputs. Vegan uses a wide format where rows correspond to a sites, mobr uses a long format where the group column codes for the sites.

vegan: image

mobr: image

As you see, the values got mixed up.

I suggest the following fixes:

  1. Fix calc_biodiv(index = "S_n") to properly match values to sites.
  2. Update documentation of rarefaction() so that it explicitly says that for the individual-based rarefaction curve the sites will be pooled to a gamma. Furthermore, we should give directions as to how to calculate the alpha rarefaction curves by referring to vegan::rarefy(), iNext::iNext(), or the debugged calc_biodiv(index = "S_n").

Any opinions? Do you think this is important?

dmcglinn commented 6 years ago

Hey @T-Engel thanks for pointing this difference in usage out between the packages. It is possible to do this using apply(inv_comm, 1, rarefaction, 'indiv', c(5, 10)) that is not the most elegant solution but it is pretty easy still. I feel like there is a good reason for why we consolidate the site-by-species matrix down to a single SAD but I can't think of why right now let me look closer at the rarefaction function.

T-Engel commented 6 years ago

Hi @dmcglinn, Thanks for the response. Yes, using apply works. I think the plot_rarefaction function uses a similar approach when the pooled argument is FALSE. Would it make sense to include the pooled argument in rarefaction as well?

dmcglinn commented 6 years ago

That may be a good suggestion. One question we're confronted with is what do we want to return to the user if they don't specify effort should we go from 1 to the minimum rowsum (i.e., plot n)? Alternatively we would rarefy each plot to its specific n and then return a list rather than a matrix. Any thoughts here on what would be more useful? The plot_rarefaction function computes each curve to the plot specific n so copying that behavior may be nice here.

T-Engel commented 6 years ago

I agree that it would be nice to have the functions behave in the same way (i.e. output up to the plot-specific n). But I guess it will be a little tricky to change the output of rarefaction to a list because several other functions depend on it being a matrix, don't they? So maybe we could keep it as a matrix and insert NA where ever the effort is higher than the plot n - unless the user sets extrapolate TRUE, in which case we would provide the extrapolated inext values. Would that work?

dmcglinn commented 6 months ago

This is not as easy as it probably should be but we are a little closer to a simple implementation with calc_comm_div

library(mobr)
#load data and select only the first 3 sites as an example
data(inv_comm)
inv_comm<-inv_comm[1:3,]

# rarefy with vegan for 5 and 10 individuals
vegan::rarefy(inv_comm, c(5,10))

# trying to achieve the same with mobr
calc_comm_biodiv(inv_comm, scales = 'alpha', index = "S_n", effort = c(5,10))

#as before it is stills simpler to use: 
apply(inv_comm, 1, rarefaction, 'IBR', c(5,10))