MoBiodiv / mobr

Tools for analyzing changes in diversity across scales
Other
23 stars 18 forks source link

Warning needed: Computing and plotting issues when the design is unbalanced #243

Closed AlbanSagouis closed 5 years ago

AlbanSagouis commented 5 years ago

While trying mobr, it is only after I reached the delta stats computation and found NAs, and then talked with Shane and Jon that I realised that datasets had to be balanced. Early warnings would help.

dmcglinn commented 5 years ago

Thanks @AlbanSagouis for filing this bug report. Would you be able to provide a little of the R code that resulted in the error? If I understand you correctly the function get_delta_stats gave you an error when some of the treatments had fewer replicates that others, is that correct? When you say you found NA values where did you find those in your data or in the resulting output? Thanks for your help to improve the package!

AlbanSagouis commented 5 years ago

Hi Dan,

I tried to create a simple reproducible example using one of the datasets used as mobr examples (inv_comm, fire_comm and tank_comm) but I could not reproduce the issue.

Dataset description

The dataset we used has 953 sites located in two mountain areas (one is invaded). The Mill mountain was sampled in 292 sites while the Stegall mountain was sampled in 661 sites. The community matrix has 27 species. The matrix countains no NA. The environment table countains three columns: group, x and y, and no NA.

Issue description

I created the mob_in object without error. >grass_mob_in <- make_mob_in(comm=abu, plot_attr=env, coord_names = c("x", "y"), latlong=F)

Computing mobr stats sent a warning

>grass_stats <- get_mob_stats(grass_mob_in, 'group', n_perm=10)
Warning message:
In get_mob_stats(grass_mob_in, "group", n_perm = 10) :
The number of individuals for rarefaction analysis is too low and is set to the minimum of 5 individuals.

And finally computing mobr stat deltaS that also sent warning (5):

>grass_deltaS <- get_delta_stats(mob_in=grass_mob_in, group_var='group', ref_group = 'Stegall', n_perm=20)
Warning messages:
1: In S_rescaled - S_raw :
  longer object length is not a multiple of shorter object length

Now looking at grass_deltaS$N, there are 3962 rows, the first 2922 look normal but the last 1040 row have only NA values for columns "ddeltaS_null_low", "ddeltaS_null_median" and "ddeltaS_null_high". The other components of grass_deltaS look as expected. Plotting of ddelta S is impossible

>plot(grass_deltaS, 'Stegall', 'Mill', display = 'ddelta S', las=1)
Error in FUN(X[[i]], ...) : 
  only defined on a data frame with all numeric variables

Logically, the overlap plot for N does not look good. >overlap_effects(grass_deltaS, 'Mill') The N curve 'jumps'.

image

Solving the issue

I don't know what really causes get_delta_stats to bug but the bug disappeared when I balanced the number of sites between the two areas (292 sites). No more NAs in grass_deltaS and a normal overlap plot.

image

The ddelta S plot was still not possible:

Error in FUN(X[[i]], ...) : 
  only defined on a data frame with all numeric variables
sablowes commented 5 years ago

Thanks Alban. Hi Dan :-)

The issue is unbalanced designs. Can there be some warnings (or perhaps errors) added to prevent statistics from unbalanced designs being naively reported.

For the discrete analyses, it would be great if the get_mob_stats() function had a warning at least when the function is called using unbalanced data. Currently, a naive user is provided with statistics to report for a test of treatment effects at the gamma-scale that are problematic when the design is unbalanced.

And for the multi-scale analyses, any unbalance has the potential to blow up the estimates of the N-effect as illustrated by Albans' example.

dmcglinn commented 5 years ago

Dear @AlbanSagouis I think I have a patch that allows for unbalanced designs now. If you have a chance would you mind testing it out? To install this patch use the following R code devtools::install_github("mobiodiv/mobr#244")

Here's an example demonstrating that is works using a demo dataset

library(mobr)
data(inv_comm)
data(inv_plot_attr)
inv_mob = make_mob_in(inv_comm, inv_plot_attr, c('x', 'y'))
table(inv_mob$env)
inv_mob2 = subset(inv_mob, subset = 25:100, type = 'integer')
table(inv_mob2$env)

inv_test = get_delta_stats(inv_mob, "group", type = 'discrete',
                            n_perm = 10)
inv_test2 = get_delta_stats(inv_mob2, "group", type = 'discrete',
                      n_perm = 10)

plot(inv_test, stat = 'b1')
plot(inv_test2, stat = 'b1')

Note that although this modification to the code will allow you to work with unbalanced designs that it still may not be a good idea if that treatment with more samples covers a broader spatial extent. I still need to add a warnings in that alert the user that an unbalanced design is being used and then also use the spatial coordinates to quickly compare spatial extents and alter the user of large differences.

AlbanSagouis commented 5 years ago

Dear @dmcglinn

I ran your patch on the example data set and it worked great. I agree that adding a warning would be helpful and informative. I've been trying to run the patch on our data set, the one I described in my previous message, but couldn't get the get_delta_stats() function to run on either a balanced or unbalanced data set but I believe it is not connected to the issue discussed here.

Thanks for addressing this issue,

Alban

dmcglinn commented 5 years ago

Hey @AlbanSagouis thanks for testing that little bit of code out. Can you please post the error messages you are seeing when you run get_delta_stats()? Thanks!

AlbanSagouis commented 5 years ago

Hi Dan,

Problem solved, I was having issues with the env_var argument you added. Maybe the get_delta_stats() function could have the same behaviour as it used to when env_var is NULL? Only the bottom row of plots would be available in that case.

This is what I get with a continuous env_var: image