dmcglinn closed this issue 5 months ago.
Quick follow-up question on beta_C. I'm working with the invasion dataset, which has some quirks but is always my go-to test dataset, in part because it is a bit quirky. Anyway, here is the target coverage for the two treatments after removing any sites that have zero individuals:
uninvaded invaded
0.8262747 0.2112928
So the recommendation would be to take the smallest coverage value, 0.21, as the target coverage for calculating beta_C, but unfortunately this coverage corresponds to less than one individual in the uninvaded treatment, so an NA is returned.
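A possible explanation for the NA, sketched in base R under the assumption that beta_C relies on the interpolated coverage estimator of Chao & Jost (2012): the lowest coverage a sample of whole individuals can have is at m = 1, where coverage equals the probability that two individuals drawn without replacement belong to the same species. `coverage_at_m()` is a hypothetical helper written for illustration, not a betaC function, and the abundances are toy values.

```r
# Interpolated sample coverage (Chao & Jost 2012) for a subsample of m
# individuals drawn without replacement from an abundance vector x (1 <= m < n).
# coverage_at_m() is a hypothetical helper, not part of betaC or mobr.
coverage_at_m <- function(x, m) {
  x <- x[x > 0]
  n <- sum(x)
  stopifnot(m >= 1, m < n)
  # lchoose() avoids overflow; lchoose(k, m) is -Inf (choose = 0) when k < m
  1 - sum(x / n * exp(lchoose(n - x, m) - lchoose(n - 1, m)))
}

# Toy abundance vector (not the invasion data)
x <- c(5, 3, 1, 1)
c_min <- coverage_at_m(x, 1)
c_min  # 26/90, about 0.289

# Same value from the closed form at m = 1: sum(x * (x - 1)) / (n * (n - 1))
sum(x * (x - 1)) / (sum(x) * (sum(x) - 1))
```

Under this assumption, any target coverage below `coverage_at_m(x, 1)` corresponds to fewer than one individual, which would fit the NA returned for C = 0.3 in the uninvaded treatment.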
> betaC::beta_C(comm_tmp[groups_tmp == grp, ],
+ .7)
[1] 0.9967836
attr(,"C")
[1] 0.7
attr(,"N")
[1] 6
> betaC::beta_C(comm_tmp[groups_tmp == grp, ],
+ .5)
[1] 0.9957572
attr(,"C")
[1] 0.5
attr(,"N")
[1] 2
> betaC::beta_C(comm_tmp[groups_tmp == grp, ],
+ .3)
[1] NA
attr(,"C")
[1] 0.3
attr(,"N")
[1] 1
One thing that doesn't make sense is when I do the inverse N calculation from coverage:
sapply(group_levels, function(grp)
+ betaC::invChat(comm_tmp[groups_tmp == grp, ],
+ target_coverage))
uninvaded invaded
18.89982 52.04609
This suggests that at that coverage I should really be at a higher number of individuals. For BCI, the same calculation gives the appropriate result for 55% coverage. @T-Engel, any recommendations for how to proceed? Thanks!
This all sounds good to me, Dan. I like the 'study' scale, but that is perhaps due to my current 'synthesis' focus, though it seems applicable even when you are analysing data from a single study. One use case where the permutation is inappropriate is for observational studies, where there are potential confounding or other covariates that should be included in any statistical test of the 'treatment' effect, e.g., our Mediterranean MPA study.
Another thought here, following on from Thore and Jon's reminder that we really want this function to be calculating metrics in a flexible way, I wonder if a change of name to get_mob_metrics (or similar, but not 'stats') is worthwhile?
Hey Dan, I just talked to Thore at length and I think he has an idea of what the current concerns/issues are. In essence, we want the function to be able to provide all of the 'metrics' of interest, even from a single matrix. Then, the driver can do whatever they want with those analyses. The permutation test (and much of the structure) is based on the supposition that there are two treatments to be compared, but this is often not the case.
Regardless, I personally don't like 'study level', because this is agnostic to scale. One option is
plot scale = alpha
site scale = gamma
oops, sorry. I just read more carefully what you were asking
I like:
sample or plot for alpha
site or group for gamma
study or pool for 'uber'
Yes, the betaC code should be integrated into mobr. The betaC package I wrote rather serves the purpose of a research compendium for the paper.
Concerning the invasion dataset, the problem is that removing empty samples is not enough. There are some samples that have only very low abundances (1,2,3,4 individuals). I would probably remove those.
library(mobr)
library(betaC)
library(tidyverse)
data("inv_comm")
data("inv_plot_attr")
# data as a nested data frame, excluding samples with low abundances
inv = cbind(inv_comm, inv_plot_attr) %>%
  group_by(group) %>%
  nest(mat = starts_with("sp"),
       sample_coords = c(x, y)) %>%
  mutate(
    # drop samples (rows) with 4 or fewer individuals
    mat = map(mat, function(x) x[rowSums(x) > 4, ]),
    # drop species (columns) that no longer occur in the remaining samples
    mat = map(mat, function(x) x[, colSums(x) > 0]),
    # number of samples retained per group
    samples = map_dbl(mat, nrow)
  )
inv
> inv
# A tibble: 2 x 4
# Groups: group [2]
group mat sample_coords samples
<fct> <list> <list> <dbl>
1 uninvaded <tibble [50 x 83]> <tibble [50 x 2]> 50
2 invaded <tibble [39 x 65]> <tibble [50 x 2]> 39
As a result, the number of samples is unbalanced between groups. Therefore, I use the resampling procedure betaC::beta_stand() to calculate betaC based on 39 samples (the size of the smaller group).
samples_stand = min(inv$samples)
# calculate the target coverage of each group, standardized to samples_stand samples
inv <- inv %>%
  mutate(C_target = map(mat, beta_stand, setsize = samples_stand,
                        func = list("C_target")),
         C_target = map_dbl(C_target, function(x) x[1, 1]))
inv
> inv
# A tibble: 2 x 5
# Groups: group [2]
group mat sample_coords samples C_target
<fct> <list> <list> <dbl> <dbl>
1 uninvaded <tibble [50 x 83]> <tibble [50 x 2]> 50 0.831
2 invaded <tibble [39 x 65]> <tibble [50 x 2]> 39 0.542
Now calculate betaC for a coverage of 0.542
C_target_min = min(inv$C_target)
# betaC at the common (smallest) target coverage, keeping every resample
inv <- inv %>%
  mutate(betaC = map(mat, beta_stand,
                     setsize = samples_stand,
                     func = list("beta_C"),
                     args = list(C = C_target_min),
                     summarise = FALSE))
Now we plot it:
inv %>%
unnest(betaC) %>%
ggplot(aes(x=group,y=betaC))+
geom_boxplot()
So the invaded site is more spatially aggregated than the uninvaded one.
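The idea behind standardizing both groups to the same number of samples can be sketched in base R: repeatedly draw `setsize` rows without replacement and compute a statistic on each draw. `resample_stat()` and `beta_S()` are hypothetical helpers for illustration, not the internals of betaC::beta_stand(), and the community matrix is simulated.

```r
# Hypothetical helper: compute `stat` on `n_rep` random subsets of `setsize`
# rows (samples) drawn without replacement from a site-by-species matrix.
resample_stat <- function(comm, setsize, stat, n_rep = 100) {
  stopifnot(setsize <= nrow(comm))
  replicate(n_rep, {
    rows <- sample(nrow(comm), setsize)
    stat(comm[rows, , drop = FALSE])
  })
}

# Example statistic: Whittaker's multiplicative beta (gamma S / mean alpha S)
beta_S <- function(comm) {
  gamma_S <- sum(colSums(comm) > 0)
  alpha_S <- mean(rowSums(comm > 0))
  gamma_S / alpha_S
}

set.seed(1)
# Simulated community: 50 samples x 20 species of Poisson counts
comm <- matrix(rpois(50 * 20, lambda = 0.5), nrow = 50)
vals <- resample_stat(comm, setsize = 39, stat = beta_S)
summary(vals)
```

Standardizing to the smaller group's sample count keeps the larger group from getting a higher gamma simply because more samples were pooled.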
I think the problem you're experiencing with invChat happens because you're supplying a matrix. The function expects a vector of abundances, so in your example you probably need to apply colSums() to your matrix first.
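A minimal sketch of that fix: pool the site-by-species matrix into one abundance vector with colSums(), then find the smallest number of individuals whose interpolated coverage reaches the target. `coverage_at_m()` and `inv_coverage()` are hypothetical helpers (Chao & Jost 2012 interpolation only, no extrapolation); they are not the betaC implementation of invChat(), which may return non-integer values.

```r
# Interpolated coverage for m individuals from abundance vector x (1 <= m < n)
coverage_at_m <- function(x, m) {
  x <- x[x > 0]
  n <- sum(x)
  stopifnot(m >= 1, m < n)
  1 - sum(x / n * exp(lchoose(n - x, m) - lchoose(n - 1, m)))
}

# Hypothetical inverse: smallest whole m with coverage >= target, or NA when
# the target lies beyond the interpolation range (extrapolation not handled)
inv_coverage <- function(x, target) {
  x <- x[x > 0]
  n <- sum(x)
  for (m in seq_len(n - 1)) {
    if (coverage_at_m(x, m) >= target) return(m)
  }
  NA_integer_
}

# Toy site-by-species matrix: 3 samples (rows) x 4 species (columns)
comm <- matrix(c(3, 1, 0, 0,
                 2, 2, 1, 0,
                 0, 0, 0, 1), nrow = 3, byrow = TRUE)
pooled <- colSums(comm)  # one abundance vector for the whole group
inv_coverage(pooled, 0.5)
```

Passing the matrix itself instead of `pooled` would treat every cell as a separate "species", which is one way an inverse-coverage calculation can produce misleading sample sizes.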
Thanks everyone for the very helpful input!
I've just pushed an additional commit to the dev branch: https://github.com/MoBiodiv/mobr/commit/316776f4960cdb04b7229ae5749471d0a18e733c which I think takes us a step closer to where we want to be. Now you can write the following:
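library(mobr)
library(betaC)
library(tidyverse)  # needed for %>%, nest() and map()
data("inv_comm")
data("inv_plot_attr")
inv = cbind(inv_comm, inv_plot_attr) %>%
  group_by(group) %>%
  nest(mat = starts_with("sp"),
       sample_coords = c(x, y)) %>%
  mutate(
    # drop samples with 4 or fewer individuals
    mat = map(mat, function(x) x[rowSums(x) > 4, ]),
    # drop species absent from the remaining samples
    mat = map(mat, function(x) x[, colSums(x) > 0]),
    samples = map_dbl(mat, nrow)
  )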
# compute normal div stat
index = 'S'
div_groups <- inv %>% mutate(div = map(mat, calc_comm_div, index = index))
div_groups
head(div_groups$div[[1]])
scale index sample_size effort coverage value
1 alpha S 1 NA NA 12
2 alpha S 1 NA NA 7
3 alpha S 1 NA NA 11
4 alpha S 1 NA NA 11
5 alpha S 1 NA NA 5
6 alpha S 1 NA NA 5
tail(div_groups$div[[1]])
scale index sample_size effort coverage value
48 alpha S 1 NA NA 9.000000
49 alpha S 1 NA NA 10.000000
50 alpha S 1 NA NA 22.000000
51 gamma S 50 NA NA 83.000000
52 beta beta_S 50 NA NA 8.185404
53 beta beta_C 50 40 0.8262747 1.309023
abund_study = matrix(colSums(inv_comm), ncol = ncol(inv_comm))
div_study = calc_comm_div(abund_study, index = index)
div_study
scale index sample_size effort coverage value
1 alpha S 1 NA NA 111
2 gamma S 1 NA NA 111
3 beta beta_S 1 NA NA 1
4 beta beta_C 1 9668 0.9967475 1
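The alpha, gamma and beta_S rows above appear to follow Whittaker's multiplicative partition, beta_S = gamma S / mean(alpha S). A minimal base-R sketch of those rows, assuming that definition; `richness_by_scale()` is hypothetical and not mobr's calc_comm_div().

```r
# Hypothetical sketch (not mobr's calc_comm_div()): richness at the alpha and
# gamma scales plus Whittaker's multiplicative beta_S = gamma S / mean(alpha S).
# comm: site-by-species abundance matrix, one row per alpha sample.
richness_by_scale <- function(comm) {
  alpha_S <- rowSums(comm > 0)        # species per sample
  gamma_S <- sum(colSums(comm) > 0)   # species in the pooled group
  k <- nrow(comm)
  data.frame(
    scale = c(rep("alpha", k), "gamma", "beta"),
    index = c(rep("S", k + 1), "beta_S"),
    sample_size = c(rep(1, k), k, k),
    value = c(unname(alpha_S), gamma_S, gamma_S / mean(alpha_S))
  )
}

# Toy matrix: alpha S = 2, 3, 1; gamma S = 4; beta_S = 4 / 2 = 2
comm <- matrix(c(3, 1, 0, 0,
                 2, 2, 1, 0,
                 0, 0, 0, 1), nrow = 3, byrow = TRUE)
richness_by_scale(comm)

# One pooled row: alpha equals gamma, so beta_S is 1
richness_by_scale(matrix(colSums(comm), nrow = 1))
```

The single-row case reproduces the pattern of the study-scale output above (alpha = gamma = 111, beta_S = 1), which is why pooling everything into one row gives no beta information.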
The key changes are:
calc_div
https://github.com/MoBiodiv/mobr/blob/316776f4960cdb04b7229ae5749471d0a18e733c/R/mobr_boxplots.R#L161
calc_biodiv
from the nested list data structure that @T-Engel has proposed and I like
betaC
functions to compute beta coverage
I think it would be a good idea to report three attributes for every diversity statistic: sample size, effort, and coverage.
Right now you can see that sometimes these are reported correctly and sometimes there are placeholders, but this needs improvement.
To do from here:
get_mob_stats
to work with this improved workflow
plot_rarefaction
to include a study-scale curve
Good job, Dan! I like that you got rid of calc_biodiv() and created two other functions instead. So calc_div() works on a single vector, and calc_comm_div() works on a matrix where each row is an alpha sample and the whole matrix is the gamma of that group.
I also like that calc_comm_div() can give you all three scales. Maybe it's worth thinking about adding a "scales" argument to specify which ones should be returned. I'm not sure it is a good idea for betaC to be returned by default when index is "S". That can be misleading, I think, especially because you have now coded it so that the target coverage is determined directly from the matrix, rather than from a user-defined target coverage that would be constant across all sites. Maybe "beta_C" should just become its own index?
I think it's a good idea to report sample size, effort, and coverage with each estimate. In beta_C I only spit out coverage and number of individuals as attributes.
Hey Thore, yes, calc_div and calc_comm_div are as you described.
I went ahead and quickly added a scales argument as you suggested; I think that's smart: https://github.com/MoBiodiv/mobr/commit/101eb15d4797412aeb32bb97b293d2b757735e3a
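One common R idiom for such a `scales` argument is `match.arg(..., several.ok = TRUE)`, which accepts any subset of the allowed values and defaults to all of them. A minimal sketch; `filter_scales()` is hypothetical and does not reflect the actual signature in the linked commit.

```r
# Hypothetical sketch of a `scales` argument (not the actual mobr signature):
# keep only the rows of a diversity table whose scale was requested.
filter_scales <- function(div_df, scales = c("alpha", "gamma", "beta")) {
  # several.ok = TRUE lets users request any subset; the default keeps all three
  scales <- match.arg(scales, several.ok = TRUE)
  div_df[div_df$scale %in% scales, , drop = FALSE]
}

# Toy diversity table in the same long format as the calc_comm_div() output
div_df <- data.frame(scale = c("alpha", "alpha", "gamma", "beta"),
                     index = c("S", "S", "S", "beta_S"),
                     value = c(2, 3, 4, 1.6))
filter_scales(div_df, scales = "beta")
filter_scales(div_df)
```

An unrecognized value such as `scales = "delta"` makes match.arg() stop with an informative error instead of silently returning nothing.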
One potentially useful way to avoid having a separate function for beta_C is to add a diversity index called S_C that we encode in the function calc_div. This could either accept a target coverage or, if none is supplied, compute a target coverage to use. Then beta_C would be returned when someone ran the following:
calc_comm_div(inv_comm, index = 'S_C', scales = 'beta')
let me know what you think.
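A rough sketch of how an S_C index could work on a single abundance vector, under simplifying assumptions: coverage is interpolated only (no extrapolation beyond the observed n), the target coverage must be supplied (the "compute a default target" branch is left out), and richness at the coverage-matched number of individuals comes from classic individual-based rarefaction (Hurlbert 1971). All function names are hypothetical; this is not mobr's calc_div().

```r
# Interpolated coverage for m individuals (Chao & Jost 2012), 1 <= m < n
coverage_at_m <- function(x, m) {
  x <- x[x > 0]
  n <- sum(x)
  stopifnot(m >= 1, m < n)
  1 - sum(x / n * exp(lchoose(n - x, m) - lchoose(n - 1, m)))
}

# Expected richness in a random subsample of m individuals (Hurlbert 1971)
rarefied_S <- function(x, m) {
  x <- x[x > 0]
  n <- sum(x)
  sum(1 - exp(lchoose(n - x, m) - lchoose(n, m)))
}

# Hypothetical S_C: rarefied richness at the smallest m reaching `target`;
# NA when the target would require extrapolation
S_C <- function(x, target) {
  x <- x[x > 0]
  ms <- seq_len(sum(x) - 1)
  covs <- vapply(ms, function(m) coverage_at_m(x, m), numeric(1))
  if (!any(covs >= target)) return(NA_real_)
  m <- ms[which(covs >= target)[1]]
  rarefied_S(x, m)
}

x <- c(5, 3, 1, 1)  # toy abundances
S_C(x, target = 0.5)
```

For this toy vector the 0.5 target is first reached at m = 3 individuals, where expected richness is 2.225. How gamma and alpha S_C values would then be combined into beta_C is left open here.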
Hey folks, I just pushed a script to the dev branch examining some grasshopper data from Konza prairie: https://github.com/MoBiodiv/mobr/commit/a82d5e057b36e99dbae8a57d9ceb5a32ba305265 Can you please take a look? In particular, I need a tip on how to use the nested data structure that you like for generating graphics via ggplot2 (@T-Engel or @sablowes). I'm also not sure I carried out the nesting appropriately here: https://github.com/MoBiodiv/mobr/blob/dev/scripts/case_studies.R#L110
Here is a preliminary look at the patch-burn treatment vs. control pastures for the study, across 9 years of data where 4 replicates are sampled each year in each treatment group. Note that this is a subset of a much bigger dataset, reduced to just two pastures: the pasture ID (i.e., Watershed) is a mess and I'm waiting to hear back about it, but this is good enough for now. This very simple look at the IBRs suggests that the patch-burn treatment does not increase diversity as expected; in fact, it actually decreases it a bit.
I've been working on updating the mobr code for computing the beta diversity metrics. I'll post a link to my prototype code in a bit. Here are the main changes I'm in the process of implementing; let me know if you have suggestions or questions. The main function for carrying out diversity metric estimation is get_mob_stats. I want that function to:
Some quick questions I have:
get_mob_stats was often not appropriate or was inappropriately interpreted by users. Can you please elaborate so we can update the documentation with these notes and modify the default behavior appropriately? I still feel that there are certain contexts in which a random sample swap is appropriate for testing the hypothesis that the diversity indices differ between treatments. It's a pretty standard non-parametric procedure for doing this.
beta_C package functions into mobr, or are you wanting to keep that as a separate R package?