Closed Rekyt closed 5 years ago
Another solution would be to let the user compute the immigration probability themselves and then provide them to coalesc()
I have bypassed the initial checks of coalesc when it is called from coalesc_abc, because they are useless in this case. It may allow (a little) quicker computation. The other suggested options will be also considered.
I have updated the code with a different way to calculate immigration probabilities with environnemental filtering.
The following code shows substantial gain of calculation time (about divided by 6 for a single trait case, and divided by more than 100 with two traits)
ind_pool_traits_tr1 <- data.frame(tr=runif(100000))
# Two traits
ind_pool_traits_tr2 <- data.frame(tr1=runif(100000), tr2=runif(100000))
topt <- 0.5
sigmaopt <- 0.1
add <- F
var.add <- NULL
env_filter1 <- ifelse(!is.null(filt), ifelse(!add,
function(x) unlist(sapply(1:nrow(x), function(i) filt(x[i,]))),
function(x, var.add) unlist(sapply(1:nrow(x), function(i) filt(x[i,], var.add)))),
function(x) unlist(sapply(1:nrow(x), function(i) 1)))
env_filter2 <- ifelse(!is.null(filt), ifelse(!add,
function(x) apply(x, 1, filt),
function(x, var.add) apply(x, function(i) filt(i, var.add))),
function(x) rep(1, nrow(x)))
##################################################################
# Single trait
filt <- function(x) exp(-(x-topt)^2/(2*sigmaopt)^2)
system.time(prob0 <- sapply(ind_pool_traits_tr1, filt))
#utilisateur système écoulé
#0.02 0 0.01
system.time(prob1 <- env_filter1(ind_pool_traits_tr1))
#utilisateur système écoulé
#1.92 0.02 1.96
system.time(prob2 <- env_filter2(ind_pool_traits_tr1))
#utilisateur système écoulé
#0.39 0.00 0.39
# Should be FALSE
any(prob1!=prob2)
any(prob0!=prob1)
##################################################################
# Two traits
filt <- function(x) exp(-(x[1]-topt)^2/(2*sigmaopt)^2)*exp(-(x[2]-topt)^2/(2*sigmaopt)^2)
system.time(prob0 <- apply(ind_pool_traits_tr2, 1, filt))
#utilisateur système écoulé
#0.56 0.00 0.56
system.time(prob1 <- env_filter1(ind_pool_traits_tr2))
#utilisateur système écoulé
#70.06 14.05 86.42
system.time(prob2 <- env_filter2(ind_pool_traits_tr2))
#utilisateur système écoulé
#0.60 0.02 0.61
# Should be FALSE
any(prob1!=prob2)
any(prob0!=prob1)```
I've been simulating 20 communities with a Gaussian environmental filter of 500 individuals from a pool containing 100000 individuals and it generally took ~40minutes. I've found out that the bottleneck was when
coalesc()
applies the environmental filter functionenv_filter()
to all the pool (BTW we could maybe tidy this a bit to be more readable) https://github.com/frmunoz/ecolottery/blob/1152a8937c20bc996bcf6fc87b436f6f7761797c/pkg/R/coalesc.R#L155-L163 As my environmental filter function is very simple it could be vectorized and I wonder to what extent we could use this to quicken the probability computation (see https://stackoverflow.com/questions/13978918/apply-is-slow-how-to-make-it-faster-or-what-are-my-alternatives for example betweenapply()
and vectorized version of function).It may be interesting to 1) put somewhere notes for the user on performances of
ecolottery
in these situations (species pool size, etc.), 2) Tell the user to provide vectorized function asenv_filter()
functions (and add avectorizable = TRUE/FALSE
argument in the function?)