Update:
One problem is still left but everything else looks beautiful! The following figures show:
- Two groups have the exact same set of parameters
- Two groups differing in the spread of the SAD
- Differing in N
- Differing in level of aggregation
What's been fixed:
What's left to be solved:
Here's another half-baked idea for the aggregation null model. @dmcglinn if you'd want to comment on it that would be much appreciated; but if the description is too scattered for you to understand, I'll just play with it in the next few days and see if anything pans out.
Starting from what we are hoping to achieve, we'd want the null model to have the following features:
Now think about this in the context of the plot-by-species matrix. Suppose we have a k x S matrix M (k = k1 + k2 plots, S species). The first k1 rows belong to Group 1, and the next k2 rows belong to Group 2. For each column (species) i, we'd want to keep the sums of M[1:k1, i] and M[(k1+1):(k1+k2), i] intact (these are the total abundances of species i in Group 1 and Group 2, respectively). This means that our only option would be to shuffle the conspecific individuals among plots within each group.
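To make that constraint concrete, here is a minimal sketch (a hypothetical helper, not code from mobr) of a shuffle that keeps each species' group totals fixed while randomly re-placing individuals among the plots within each group:
# randomly redistribute each species' individuals among the plots of its group,
# so the column sums of M[1:k1, i] and M[(k1+1):(k1+k2), i] stay unchanged
shuffle_within_groups = function(M, k1) {
    k = nrow(M)
    for (rows in list(1:k1, (k1 + 1):k)) {
        for (i in 1:ncol(M)) {
            total_i = sum(M[rows, i])
            # random placement: each individual picks a plot in its group at random
            M[rows, i] = rmultinom(1, total_i, rep(1, length(rows)))
        }
    }
    M
}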
Our first null model, where both groups are forced to have zero aggregation (Poisson distribution of individuals among plots), was a natural choice that did fulfill the above requirement. But we've seen that it's too stringent. So (finally, after the long digression) here is my new proposal:
The above steps complete one permutation for the null model. It will take me some time to code it and it may take time to run, but I think it will do what we have hoped (though the experience from yesterday suggests that my intuition is more often wrong than right...).
Hey @rueuntal thanks for posting an update to our conversations. I like the direction of your new idea. Specifically, I completely agree (now) with the 3 key features the null model must have. In the context of the other null models we employ, it now makes sense why we need to compare each group-specific pattern with a pooled, spatial-aggregation-preserving null. I can see why trying to preserve the SSAD, or at least the binary pattern of patchiness, at a reference group's state could be a first step. I wanted to suggest a slightly different approach that is a bit more parametric, which I don't like, but it could give us a little more of what we want without having to specify a reference group to use for the spatial patterns.
The basic idea is to characterize the spatial pattern of within-species aggregation for the entire study site (i.e., ignore groups) by computing species-specific variograms (sensu Wagner 200? in Ecology). This is just the sum of squared differences in the abundance of species i between pairs of plots as a function of spatial lag. We fit one of the standard models to this curve (e.g., exponential), then use the R package RandomFields to simulate random realizations of landscapes with these spatial parameters. Drop our plots down on this landscape (in the case of a grid sampling design we take them all). We'll have to set a threshold for whether a species is present or not so that each realization results in the correct number of individuals for that species. I haven't thought of a good way to do this for abundances yet. Then we move on to the next species. For species that only occur in one group we still do the procedure, but only using the info from that group and then only placing these new abundances back in that same group. Let me know what you think. Here is a little bit of demo code:
library(RandomFields)
library(sp)      # for SpatialPointsDataFrame
library(vegan)
data(mite)
data(mite.xy)
mite_binary = as.data.frame((mite > 0) * 1)
mite_sp = SpatialPointsDataFrame(mite.xy, mite_binary)
mod = RMexp(var=NA, scale=NA)
# let's model sp 4 as a demo
mod_fit = RFfit(mod, data=mite_sp[ , 4])
plot(mod_fit)
# let's simulate 4 random landscapes
sims = RFsimulate(mod_fit, x=coordinates(mite_sp), n=4)
plot(sims)
# this was a binary landscape so these are number of presences
sp_n = sum(mite_binary[ , 4])
sp_n
# raise the threshold in each simulated landscape until the number of
# "present" plots drops to approximately the observed number of presences
thres = apply(sims@data, 2, min)
for(i in seq_along(thres)) {
    cts = sum(sims@data[ , i] > thres[i])
    while(sp_n <= cts){
        thres[i] = thres[i] + .05
        cts = sum(sims@data[ , i] > thres[i])
    }
    sims@data[ , i] = (sims@data[ , i] > thres[i]) * 1
}
plot(sims)
Also I should mention that the above demo does not exactly nail the number of species presences, but this could be improved by changing the increment by which the threshold is increased (in my code it's 0.05). For small numbers of plots, though, some runs of the sim will likely have to be dropped because it will be impossible to get close to the true number of presences.
Also, I worked up some code for handling abundances, which I think is ultimately what we want. Good news: this should be faster and it's easier to understand. It does involve rounding, which is annoying though:
sp_n = sum(mite[ , 4])
sp_n
# shift each simulated landscape to be non-negative, then rescale and round
# so that the column sums match the observed abundance of the species
sims_pos = apply(sims@data, 2, function(x) x - min(x))
sims@data = as.data.frame(apply(sims_pos, 2,
                                function(x) round(x / (sum(x) / sp_n))))
plot(sims)
@FelixMay do you have any thoughts about using these spatial simulators as a component of our aggregation null model? In some ways using a parameterized cluster model to generate the null model may be better than a random field simulation, because in the cluster process discrete individuals are the things being distributed, not continuous values that need to be rescaled into individuals. However, it seems that this problem of a fixed N could come up again if the sampling regime was not a grid.
@rueuntal here is a little bit of code I started working on to carry out the algo you described above. It's not much but it may save you some time if you still think this is a worthwhile avenue. I'll probably try to work on this some more in the morning, but I wanted to share it in case you're working on this now.
samp_ssad = function(comm, groups){
    # rank species columns within each group by group-level abundance
    ords = apply(aggregate(comm, list(groups), sum)[ , -1], 1,
                 order, decreasing=TRUE)
    group_levels = unique(groups)
    comm_rank = comm
    #sapply(seq_along(group_levels), function(x)
    #       comm[groups == group_levels[x], ords[ , x]],
    #       simplify='array')
    for(i in seq_along(group_levels)) {
        row_bool = groups == group_levels[i]
        comm_rank[row_bool, ] = comm[row_bool, ords[ , i]]
    }
    group_sad = aggregate(comm_rank, list(groups), sum)[ , -1]
    comm_perm = comm_rank
    for (sp in 1:ncol(comm_rank)){
        # flip a coin to decide which group gets its plots permuted for this species
        coin = ifelse(runif(1) < 0.5, 1, 2)
        row_bool = groups == group_levels[coin]
        if (all(group_sad[ , sp] > 0)) {
            # the original snippet left this sample() call unfinished; shuffling the
            # chosen group's plot-level abundances is one plausible completion
            comm_perm[row_bool, sp] = sample(comm_rank[row_bool, sp])
        }
    }
    return(comm_perm)   # was return(comm_out), which is undefined
}
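A quick sanity check of the stub (just an illustrative call; the mite data and the arbitrary two-group split are stand-ins, not our simulated communities):
library(vegan)
data(mite)   # 70 plots x 35 species
groups = rep(c("g1", "g2"), each = nrow(mite) / 2)   # arbitrary two-group split
comm_null = samp_ssad(mite, groups)
dim(comm_null)   # same plot-by-species dimensions as the input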
Thanks a lot! Feel free to push on (or push on your own algorithm) though; my priority will probably be on Gordon in the next couple of days. But I'll be back at work on MOBR in August.
I think I now understand why the "current" (not exactly sure if it's still current, but it is the one being called in the main function) null model for aggregation, swap_binary_species(), has the wonky behavior of not centering around zero.
@dmcglinn and I have come to the conclusion that when we look at aggregation (the difference between the spatial accumulation curve and the swap curve), abundances don't matter any more; what matters is presence/absence. In the swap curve (created by randomly distributing conspecific individuals across plots within a group), Pr_swap(absence in a plot) is approximately the Poisson probability of n_i = 0 given the total n within the group. In the spatial accumulation curve, Pr_spatial(absence) is higher (approximately negative binomial?) than Pr_swap(absence) because of within-site aggregation. The effect of aggregation on the accumulation of species is then the difference between these two probabilities - in a loose sense, not by direct deduction.
swap_binary_species() converts the abundances into presence/absence, and for each species swaps P/A among all plots across groups. What this does is make Pr_spatial(absence) equal among groups for each species. However, when we deduct this null spatial accumulation curve from the swap curve for each group, the difference in Pr_swap(absence) still exists, due to differences in species abundances across groups (e.g., if group 1 has 10 individuals of sp1 and group 2 has 50, the probability of sp1 being absent from a plot is of course higher in group 1 than in group 2). That's why we see the null expectation for aggregation not centering around zero when there's an effect of the SAD or N.
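To put rough numbers on that last example, here's a back-of-the-envelope sketch (the plot count k is an arbitrary made-up value; this only illustrates the argument and is not part of mobr):
# under random placement, each of n conspecific individuals lands in a given
# plot with probability 1/k, so Pr(absent from that plot) = (1 - 1/k)^n,
# which is roughly the Poisson probability dpois(0, n/k)
k = 20                                   # hypothetical number of plots per group
pr_absent = function(n, k) (1 - 1/k)^n
pr_absent(10, k)   # ~0.60 for the group with 10 individuals of sp1
pr_absent(50, k)   # ~0.08 for the group with 50 individuals of sp1
# within-group aggregation inflates Pr(absence) relative to random placement,
# e.g. a negative binomial with the same mean but strong clumping:
dpois(0, 50 / k)                       # random placement
dnbinom(0, mu = 50 / k, size = 0.5)    # aggregated, much higher Pr(absence)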
Let me know if the above argument doesn't make sense, or if you see a solution (hopefully without eliminating the difference in abundance across groups)! In the meantime I'll keep thinking about it.
Thanks, I agree with this logic. It may not be possible to completely separate effects of aggregation from the effects of the SAD. However, the aggregation effects are constrained within certain boundaries by the shape of the SAD, so I bet there are some patterns that can only be generated by differences in the SAD. I will keep thinking about a solution.
Yea that makes sense to me! Thanks for laying it out so clearly.
Results for sensitivity test to date ( #77 ), % detected difference:
- Identical communities: 19% detected differences in SAD, 0 in N
- Different N: 25% detected differences in SAD, 100% in N
- Different SAD: 94% detected differences in SAD, 0 in N
- Different aggregation: 21% detected differences in SAD, 0 in N
I have checked commit 74cfe7618 and the same biased results with Type I error around 25% are being returned as reported above by @rueuntal
[1] "const , detected differences in SAD, N, aggregation: 0.245 NaN NaN"
[1] "N , detected differences in SAD, N, aggregation: 0.273 NaN NaN"
[1] "SAD , detected differences in SAD, N, aggregation: 0.917666666666667 NaN NaN"
[1] "agg , detected differences in SAD, N, aggregation: 0.180333333333333 NaN NaN"
@dmcglinn - bummer. By the way why NA's for the other two tests?
I only ran the SAD test for speed.
Got it.
commit 82dcfa6563 (July 15) had the same biased results. Do you think I should move earlier or later?
[1] "const , detected differences in SAD, N, aggregation: 0.226333333333333 NaN NaN"
[1] "N , detected differences in SAD, N, aggregation: 0.269 NaN NaN"
[1] "SAD , detected differences in SAD, N, aggregation: 0.949333333333333 NaN NaN"
[1] "agg , detected differences in SAD, N, aggregation: 0.193 NaN NaN"
If I have to guess I'd say later, since we spent most of our time tweaking the spatial one after July 15 and didn't make too many changes to the other two algorithmically.
I was thinking that maybe we got good results b/c there was a bug in my code back then which you later fixed. But now even the old code is not working... this is bizarre.
do you think that the results you posted here were from the 4cur branch? I'm looking into this possibility right now with commit 042c1e148
I think so, in master the algorithm for N is different (we were using the sample-based rarefaction curves). Though in retrospect, that shouldn't affect the algo for SAD.
I'm just going to use this thread to post the results for a bunch of different commits.
For 042c1e1 (July 15, 4cur branch) the results are still biased
[1] "const , detected differences in SAD, N, aggregation: 0.245333333333333 NaN NaN"
[1] "N , detected differences in SAD, N, aggregation: 0.303333333333333 NaN NaN"
[1] "SAD , detected differences in SAD, N, aggregation: 0.935666666666667 NaN NaN"
[1] "agg , detected differences in SAD, N, aggregation: 0.207 NaN NaN"
For 5d8158bbb (July 16, 4cur branch) the results are still biased
[1] "const , detected differences in SAD, N, aggregation: 0.224666666666667 NaN NaN"
[1] "N , detected differences in SAD, N, aggregation: 0.285666666666667 NaN NaN"
[1] "SAD , detected differences in SAD, N, aggregation: 0.942 NaN NaN"
[1] "agg , detected differences in SAD, N, aggregation: 0.210333333333333 NaN NaN"
Hey @dmcglinn - thanks a lot for carrying out the tests! Bummer... I really should have kept a better record of what was changed and their outputs back then.
So the function for the SAD test looks right to you? For the rest of today I'll dig back to see if we were using the same pars for the test, as well as to test out other old algos.
I haven't looked closely at the SAD null test. I can try to spend some time on it tonight though. There are still commits to check, so all hope is not lost. It is also possible that something changed on the sensitivity side of things; we know that code has been rearranged a bit since this summer.
Ok well I do have one small insight into why the current form of the SAD test may not be working. In our code we have the following starting at https://github.com/MoBiodiv/mobr/blob/4cur/R/mobr.R#L567
# Null test
overall_sad_lumped = as.numeric(colSums(group_sad))
meta_freq = SpecDist(overall_sad_lumped)$probability
...
sad_perm = sapply(rowSums(group_sad), function(x)
    data.frame(table(factor(sample(1:length(meta_freq), x, replace = T,
                                   prob = meta_freq),
                            levels = 1:length(meta_freq))))[, 2])
It was my understanding that the function SpecDist was computing an Anne Chao predicted meta-community SAD (this was to avoid issues of scaling the empirical SAD), but in reality this function simply returns the ranked proportions of each species in the pooled SAD. As an example take a look at the following:
SpecDist(c(500,100,100,100,100,100))
rank method probability
1 1 detected 0.5
2 2 detected 0.1
3 3 detected 0.1
4 4 detected 0.1
5 5 detected 0.1
6 6 detected 0.1
Interestingly, though, this function does not just return the observed frequencies if the total counts are small:
SpecDist(c(5,1,1,1,1,1))
rank method probability
1 1 detected 4.665767e-01
7 2 undetected 3.829763e-01
2 3 detected 3.008892e-02
3 4 detected 3.008892e-02
4 5 detected 3.008892e-02
5 6 detected 3.008892e-02
6 7 detected 3.008892e-02
8 8 undetected 2.409641e-06
9 9 undetected 1.516117e-11
10 10 undetected 9.539222e-17
11 11 undetected 6.001963e-22
12 12 undetected 3.776363e-27
13 13 undetected 2.376042e-32
14 14 undetected 1.494977e-37
15 15 undetected 9.406212e-43
In this case it estimates that the community is much richer than was observed.
Great, thanks for the insight! So SpecDist tries to estimate the frequencies of all species in the metacommunity, including both detected and undetected. If the observed N is already high, it may claim that all species are detected.
In that case, what might be the problem could be that (I think) we were using a different set of parameters for the test, with lower N. Last time I checked (rather hastily before our last meeting) that didn't seem to be the problem; but I'll look into it more closely tonight.
Tested using the old parameterization: S = 100, N = 400, CV = 1, sigma = 0.2. Still biased: the null case, when all parameters are the same, has type I error ~0.21.
If I have to guess, I'd guess that even though we try to project to the metacommunity using SpecDist, it's still too similar to (or largely bounded by) the two groups.
I can confirm and elaborate on the details for you on what the Chao algorithm is doing.
First, it estimates the number of undetected species in the full assemblage as f1^2 / (2 * f2), where f1 = the number of singletons (species represented in the sample by exactly 1 individual) and f2 = the number of doubletons (species represented in the sample by exactly 2 individuals).
Thus, if the sample has 8 species with abundances of 200, 100, 10, 2, 1, 1, 1, 1, there are an additional 4 undetected species. However, with the same relative frequencies, if the sample has 2000, 1000, 100, 20, 10, 10, 10, 10 individuals, the calculation says there are no undetected species.
This approach says that, if the sampling is so thorough that all species are represented by at least 2 individuals, there are no undetected species left to find. If you think about how sampling works, you will realize that it takes a huge effort to reach this point (somewhere between 5x and 20x the sampling effort we usually see in ecological studies).
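A bare-bones sketch of that first step, just for orientation (the actual SpecDist / Chao code applies bias and sample-size corrections, so its numbers can differ from this simple form):
# classic Chao1-style count of undetected species from singletons and doubletons
chao_undetected = function(abund) {
    f1 = sum(abund == 1)   # singletons
    f2 = sum(abund == 2)   # doubletons
    if (f1 == 0) return(0) # every species seen at least twice: nothing left to find
    # the published estimator switches to a bias-corrected form when f2 = 0;
    # only the basic f1^2 / (2 * f2) case is sketched here
    f1^2 / (2 * f2)
}
chao_undetected(c(2000, 1000, 100, 20, 10, 10, 10, 10))   # no singletons -> 0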
As for the relative abundance, the relative abundance of these undetected species is small, and will usually, but not always, be less than the relative abundance of the rarest species.
The program also corrects for the inherent bias in the estimate of relative abundances for species that were detected. For common species, their relative abundance changes little, if at all. For rare species, the program shifts their relative abundances downward, making them more rare than they would appear from the sample. The corresponding probability mass is then shifted to the undetected species.
So, to get back to the behavior of your algorithm. If the sample data have no singletons or doubletons, the Chao correction does not do anything. If there are singletons and doubletons (or just singletons), the program will estimate the number of undetected species, and then re-adjust the relative abundance of both detected and undetected species.
That is basically what you have both been inferring, but I hope this confirmation and detail helps!
..oops, didn't mean to close the comment!
@ngotelli - thanks for the detailed explanation! That does explain why sometimes the inferred frequencies are exactly the same as the observed.
@dmcglinn and I talked briefly over skype this morning, and we decided that the first thing to do is to conduct a sanity check - instead of using SpecDist to infer the frequencies, let's feed the known frequencies used in the simulation to the null model. In this way the null samples should be generated using exactly the same parameters as the test samples, and the type I error should by definition be around 5%. Does that make sense?
Here is the result. Using a reference group with parameters S = 100, N = 400, CV = 2, sigma = 0.2, the type I error for both identical parameters and changing N is about 33%.
So it does seem that there must be a bug somewhere. (Paradoxically) what a relief! I'll carefully read over our sensitivity analysis to see if I can find what's going on.
By the way, this is what I changed in mobr.R:
# the SpecDist-based line that the sanity check replaces:
meta_freq = SpecDist(comp_sad_lumped)$probability
# frequencies generated directly from the simulation's known lognormal parameters:
S.pool = 100
mean.abund = 100
cv.abund = 2
sd.abund <- mean.abund * cv.abund
sigma1 <- sqrt(log(sd.abund^2 / mean.abund^2 + 1))
mu1 <- log(mean.abund) - sigma1^2 / 2
abund.pool <- rlnorm(S.pool, meanlog = mu1, sdlog = sigma1)
meta_freq <- sort(abund.pool / sum(abund.pool), decreasing = T)
Found the problem! :) It was not a bug in our code, but how we implemented @FelixMay 's MoBspatial simulations. Specifically, when we generate SADs for more than one group with the same parameters for the underlying lognormal distribution, we want the frequencies of species in the metacommunity to be exactly the same, not just two random samples from the same distribution. @ngotelli identified this problem and I got it fixed during the working group, but it probably got reintroduced when I pulled MoBspatial from Git. Given that we don't want to make those changes to MoBspatial (it makes sense for our purpose but not for general use), we'd probably want to write independent test code at some point.
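For clarity, here is a generic sketch of the distinction (plain R, not MoBspatial's actual interface): fix one set of metacommunity frequencies and sample every group's SAD from it, rather than drawing a fresh lognormal SAD per group.
S_pool = 100
N_group = 1000
# what we want: one fixed metacommunity, both groups sampled from it
meta_freq = rlnorm(S_pool, meanlog = 0, sdlog = 1)
meta_freq = meta_freq / sum(meta_freq)
sad_group1 = table(factor(sample(1:S_pool, N_group, replace = TRUE, prob = meta_freq),
                          levels = 1:S_pool))
sad_group2 = table(factor(sample(1:S_pool, N_group, replace = TRUE, prob = meta_freq),
                          levels = 1:S_pool))
# what caused the bias: each group built from its own independent lognormal draw,
# i.e. prob = rlnorm(S_pool, ...) re-generated separately for each group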
Type I error is back on target now for the SAD test: 0.055 for equal parameters and 0.047 for changing N, with a detection rate (power) of 0.959 for a changing SAD.
I will now run the full sensitivity analysis with the other components added back in.
Bravo!
what great news! way to go @rueuntal!!
Ref group: S = 100, N = 1000, cv = 2, sigma = 0.2
- Constant: N = 1001
- Changing N: N = 400
- Changing SAD: cv = 1
- Changing aggregation: sigma = 1
100 runs for each comparison.
[1] "const , detected differences in SAD, N, aggregation: 0.0353333333333333 0 0.106875"
[1] "N , detected differences in SAD, N, aggregation: 0.0326666666666667 1 0.128125"
[1] "SAD , detected differences in SAD, N, aggregation: 0.960666666666667 0 0.1275"
[1] "agg , detected differences in SAD, N, aggregation: 0.041 0 0.34125"
So, test for SAD is great. Test for N may be over-sensitive - it's either 0 or 1. Would that be a problem? Test for aggregation doesn't look good (just as before). We probably have to admit that currently we can't say anything definitive about it?
Correction: we can say with confidence what the magnitude of the effect of aggregation is, but we can't properly test our claim about its significance.
Thanks for these results, they look great! I don't think it is a problem that the N test is so sensitive; that seems like a good thing, but it is a little odd. I wonder if for the paper we should change the number of runs to 1000 to try to get a bit more resolution for the N effect. There may also be interactions in our sensitivity analysis with the parameter space that you are testing, so it will be important for us to define some reasonable ranges of values to examine for S, N, and degree of aggregation.
I agree - for our paper we'd want to do the test not on a single value but for a series of cases, as well as cases where two out of three factors change simultaneously. Also the change of N from 1000 to 400 may be too drastic - the effect didn't show up when it was 1000 vs 1001, thank goodness.
Looks like we are getting close (for real this time)! I'm blocking today and tomorrow for some other tasks, but can start to work on cleaning up the code (remove functions no longer needed, tweak plotting, etc.) on Saturday. @dmcglinn do you think we are ready to merge 4cur back to master?
Ok sounds good. Merging 4cur back to master sounds good. Then it will be a matter of working on the docs and asking @FelixMay to help generate a tutorial in markdown. Do you think the sensitivity analysis code should be part of the project? On the one hand I can see it as being useful for running code tests; we may want to think about adding continuous integration into this project (I've never done this but I've heard it's not too bad).
My specific question with the sensitivity code is if it should be part of the R package (of course it is going to be a part of the project).
Will it then serve the purpose of showing the users the credibility of the specific parameterization that they are working with?
I wasn't thinking about it that way so much because I don't see many users diving into the weeds to understand how to use or interpret the sensitivity results. Instead I was thinking they may be really useful for us when pushing new updates to the package. If the sensitivity results break (i.e., type I error goes way up by more than a reasonably defined amount) then it's a sign that we introduced a bug into the code.
Makes sense. This question probably reflects my lack of understanding of how R packages work - what are the pros and cons of having the sensitivity analysis in the package, versus hosting it here on GitHub for future reference?
Hi Dan hi Xiao, Thanks for all your work and it is great to see the progress. I know that there is a special way to implement tests for code in R packages, but I have never used it so far. Here is a description for that:
http://r-pkgs.had.co.nz/tests.html
Maybe that is a useful way to include the sensitivity analysis in the package? Let me know when the code is ready to create a vignette with a tutorial. Do we already have a good example data set for this purpose?
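For what it's worth, a bare sketch of what such a test could look like under the testthat convention (run_sad_sensitivity is a hypothetical placeholder for whatever wrapper we build around the simulation, not an existing mobr function):
# tests/testthat/test-sensitivity.R
library(testthat)

test_that("SAD test keeps type I error near the nominal 5%", {
    # hypothetical wrapper: simulate two groups with identical parameters,
    # run the mobr SAD test repeatedly, and return the rejection rate
    type1 = run_sad_sensitivity(effect = "none", n_runs = 100)
    expect_lt(type1, 0.10)
})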
Thanks @FelixMay ! That's what I was thinking too: placing the sensitivity analysis code in the test folder. I've used the tests a little bit and they're relatively straightforward to implement. The one question will be how best to build the dependency on MoBspatial, because based on @rueuntal 's description it sounds like sampling from the SAD needs to be changed for our purposes but should likely not be changed generally in the MoBspatial package. I suppose one option would be to build in the type of sampling from the SAD (random or fixed) as an argument to MoBspatial's simulation functions.
Thanks @FelixMay & @dmcglinn ! Including test in the package sounds pretty cool. Shall we give it a try then?
As for MoBspatial, I think we can remove its dependency from the test code for now, because we essentially have no way to tell how good our test for aggregation is based on the simulation. Since our tests are hierarchical (i.e., when we test for SAD or N the effect of aggregation is already removed), I can re-write the test so that there is no aggregation to start with. And in our test, we'd also only test our ability to find an effect of SAD and/or N. How does that sound?
Hi Xiao, Sounds fine with me. If I can help with any coding that does not require deeper knowledge of the mobr code let me know. Of course feel free to use any source code from MoBspatial and directly include it in the test if required.
Felix
@rueuntal your suggestion does sound reasonable, but the aggregation testing isn't completely useless, is it? The results suggest that it has a type I error 10% of the time and a type II error 70% of the time, which makes me think that it is actually a pretty conservative statistic. In other words, I feel like we do gain some information from going through the aggregation sensitivity exercise, despite the fact that we cannot be completely confident in our method at the end of the day. Can you remind me why you and Brian decided that it's ok that our spatial null model doesn't work? I think I remember it's because 1) our approach of complete spatial randomness is a simple intuitive solution and 2) to carry out a proper test one would need to simulate multi-scale patterns of aggregation rather than choosing an anchor scale and distributing clusters after that. I guess my question is, can you remind me why choosing an anchor scale is a bad thing - I know that all spatial simulation approaches we currently have share that limitation.
@dmcglinn you are right about the scale-dependence. Brian's impression (and I agree) is that there is no such thing as "the same level of aggregation", unless two groups are exactly the same (same N, same SAD, same spatial distribution for each species) at every scale. Since our approach looks at aggregation at multiple scales, it is almost destined to find differences even when the groups are simulated with the same aggregation parameter (which is what we saw in the first 3 tests).
I wouldn't say our test is conservative either, as it only detected 34% of the difference when the parameters did differ - which of course could also have resulted from parameters being relatively close; @FelixMay probably has more insights for this (would sigma = 0.2 and sigma = 1 be different enough?).
So my feeling is that our test could be right, because it makes intuitive sense (though I think Brian believes this more than I do); but there's no way for us to tell if that's the case. What do you think?
Hi Xiao, What sigma means depends on the total size of the system, which is by default 1 x 1 units. In this case sigma = 0.2 is rather weak clumping, because single clusters can easily span the entire landscape, and sigma = 1 is not really clumped anymore, because the average distance between mother and offspring is equal to the landscape extent.
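For intuition, here is a generic Thomas-process-style sketch in plain R (not MoBspatial's own simulator) showing how sigma relates to the 1 x 1 landscape:
# scatter offspring around uniformly placed cluster centers ("mothers") with
# isotropic Gaussian displacements of standard deviation sigma, on the unit square
sim_thomas_sketch = function(n_mothers = 5, n_offspring = 200, sigma = 0.05) {
    mothers = cbind(runif(n_mothers), runif(n_mothers))
    parent = sample(n_mothers, n_offspring, replace = TRUE)
    xy = mothers[parent, ] + matrix(rnorm(2 * n_offspring, sd = sigma), ncol = 2)
    xy %% 1   # wrap onto the unit square
}
par(mfrow = c(1, 2))
plot(sim_thomas_sketch(sigma = 0.05), xlim = c(0, 1), ylim = c(0, 1),
     asp = 1, main = "sigma = 0.05: tight clusters")
plot(sim_thomas_sketch(sigma = 1), xlim = c(0, 1), ylim = c(0, 1),
     asp = 1, main = "sigma = 1: effectively random")
With sigma on the order of the landscape extent, the wrapped offspring end up essentially uniform, which matches the point above.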
How do you guys get the tags with @username in these messages? Do you type them or does GitHub do this automatically?
Felix
Hi @FelixMay - thanks for the insight! I will run the simulations with more extreme parameters then. Would you say sigma = 0.05 vs sigma = 1 are enough contrast?
I'm not sure if the tag works through email, but if you comment through GitHub, @username works just as on Twitter (and GitHub would auto-complete for you). :)
For the new 3-curve (or 4-curve) approach.