frmunoz / ecolottery

R package ecolottery
GNU General Public License v3.0
15 stars 4 forks source link

Faster simulation by vectorising environmental filter #6

Closed Rekyt closed 5 years ago

Rekyt commented 5 years ago

I've been simulating 20 communities with a Gaussian environmental filter of 500 individuals from a pool containing 100000 individuals and it generally took ~40minutes. I've found out that the bottleneck was when coalesc() applies the environmental filter function env_filter() to all the pool (BTW we could maybe tidy this a bit to be more readable) https://github.com/frmunoz/ecolottery/blob/1152a8937c20bc996bcf6fc87b436f6f7761797c/pkg/R/coalesc.R#L155-L163 As my environmental filter function is very simple it could be vectorized and I wonder to what extent we could use this to quicken the probability computation (see https://stackoverflow.com/questions/13978918/apply-is-slow-how-to-make-it-faster-or-what-are-my-alternatives for example between apply() and vectorized version of function).

It may be interesting to 1) put somewhere notes for the user on performances of ecolottery in these situations (species pool size, etc.), 2) Tell the user to provide vectorized function as env_filter() functions (and add a vectorizable = TRUE/FALSE argument in the function?)

Rekyt commented 5 years ago

Another solution would be to let the user compute the immigration probability themselves and then provide them to coalesc()

frmunoz commented 5 years ago

I have bypassed the initial checks of coalesc when it is called from coalesc_abc, because they are useless in this case. It may allow (a little) quicker computation. The other suggested options will be also considered.

frmunoz commented 5 years ago

I have updated the code with a different way to calculate immigration probabilities with environnemental filtering.

The following code shows substantial gain of calculation time (about divided by 6 for a single trait case, and divided by more than 100 with two traits)


ind_pool_traits_tr1 <- data.frame(tr=runif(100000))
# Two traits
ind_pool_traits_tr2 <- data.frame(tr1=runif(100000), tr2=runif(100000))

topt <- 0.5
sigmaopt <- 0.1

add <- F
var.add <- NULL

env_filter1 <- ifelse(!is.null(filt), ifelse(!add, 
                                            function(x) unlist(sapply(1:nrow(x), function(i) filt(x[i,]))), 
                                            function(x, var.add) unlist(sapply(1:nrow(x), function(i) filt(x[i,], var.add)))), 
                     function(x) unlist(sapply(1:nrow(x), function(i) 1))) 

env_filter2 <- ifelse(!is.null(filt), ifelse(!add, 
                                            function(x) apply(x, 1, filt), 
                                            function(x, var.add) apply(x, function(i) filt(i, var.add))), 
                     function(x) rep(1, nrow(x))) 

##################################################################
# Single trait

filt <- function(x) exp(-(x-topt)^2/(2*sigmaopt)^2)

system.time(prob0 <- sapply(ind_pool_traits_tr1, filt))
#utilisateur     système      écoulé 
#0.02           0           0.01 

system.time(prob1 <- env_filter1(ind_pool_traits_tr1))
#utilisateur     système      écoulé 
#1.92        0.02        1.96

system.time(prob2 <- env_filter2(ind_pool_traits_tr1))
#utilisateur     système      écoulé 
#0.39         0.00      0.39

# Should be FALSE
any(prob1!=prob2)
any(prob0!=prob1)

##################################################################
# Two traits

filt <- function(x) exp(-(x[1]-topt)^2/(2*sigmaopt)^2)*exp(-(x[2]-topt)^2/(2*sigmaopt)^2)

system.time(prob0 <- apply(ind_pool_traits_tr2, 1, filt))
#utilisateur     système      écoulé 
#0.56           0.00           0.56 

system.time(prob1 <- env_filter1(ind_pool_traits_tr2))
#utilisateur     système      écoulé 
#70.06       14.05       86.42 

system.time(prob2 <- env_filter2(ind_pool_traits_tr2))
#utilisateur     système      écoulé 
#0.60        0.02        0.61 

# Should be FALSE
any(prob1!=prob2)
any(prob0!=prob1)```