bomeara / treevo


Implement Lexical Scoping for models (intrinsic and extrinsic) #21

Open dwbapst opened 6 years ago

dwbapst commented 6 years ago

Simulation efficiency would be improved if the intrinsic and extrinsic model functions, which currently take a set of parameters, a trait value, and the time from present (used only by models that require that information), were instead function factories: functions that take a set of parameters and return a new function that takes only the trait value to be modified.

Here's a short example of lexical scoping:

> # lexical scoping example
> funA <- function(a, b, c){
+     z <- sum(a, b, c)
+     newFun <- function(x){
+         x*z
+     }
+     return(newFun)
+ }
> 
> funB <- funA(a = 1, b = 2, c = 3)
> funB(1)
[1] 6
> funB(2)
[1] 12
> 
> funB <- funA(a = 2, b = 3, c = 4)
> funB(1)
[1] 9
> funB(2)
[1] 18

The reason for this concern is my current attempt to implement the FPK model of Boucher et al. The model requires some linear algebra to calculate the directional evolution of lineages, and that algebra depends only on the parameters determining the shape of the evolutionary potential surface. It would be much faster to do those calculations once, before calculating divergences of all taxa under the intrinsic model.
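To make that concrete, here is a hedged sketch of what such a factory might look like. This is not the actual FPK implementation; the function name, the polynomial form of the potential surface, and all parameter names (`makeFPKIntrinsic`, `a`, `b`, `c`, `sigma`) are hypothetical. The point is only that the expensive setup (discretizing the surface and computing its gradient) happens once per parameter set, while the returned function does only cheap indexing and arithmetic per call:

```r
# Sketch of a function-factory intrinsic model (names are hypothetical).
makeFPKIntrinsic <- function(a, b, c, sigma, timestep = 1){
    # Expensive setup, done ONCE per parameter set:
    # discretize a potential surface V(x) = a*x^4 + b*x^2 + c*x
    # on a grid and precompute its numerical gradient.
    grid <- seq(-5, 5, length.out = 1000)
    V <- a * grid^4 + b * grid^2 + c * grid
    dV <- diff(V) / diff(grid)
    # The returned function does only cheap work per call:
    function(x){
        i <- findInterval(x, grid, all.inside = TRUE)
        drift <- -dV[i] * timestep    # deterministic pull down the surface
        x + drift + rnorm(1, mean = 0, sd = sigma * sqrt(timestep))
    }
}

intrinsicStep <- makeFPKIntrinsic(a = 0.1, b = -0.5, c = 0, sigma = 0.05)
intrinsicStep(0.2)    # one simulated step; the grid is never rebuilt
```

Each call to `intrinsicStep` captures `grid` and `dV` via lexical scoping, so the linear-algebra-like setup cost is paid once rather than at every evaluation of the intrinsic model.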

I cannot be certain that lexical scoping would benefit the typical, simple models currently employed in TreEvo. I would expect so, having studied the efficiency of the simple BM model alone and found that much of the computation time is spent on the minor function calls within the model. Previously, I had considered similar optimizations, such as drawing many independent and identically distributed samples from a normal distribution for the BM model, and then simply pulling from that sample rather than calling rnorm an excessive number of times. That would have required a broad rewrite of the core function code; implementing lexical scoping would be an easier route.

It is actually possible to make the above idea work with lexical scoping as well: sampling from a distribution many times, then drawing values from that sample one at a time, while using a counter in the outer function's environment so that values don't get re-used. Here's an example.

> # lexical scoping example
> funA <- function(a, b){
+     z <- rnorm(10, a, b)
+     counter <- 0
+     newFun <- function(x){
+         counter <<- counter + 1
+         x*z[counter]
+     }
+     return(newFun)
+ }
> 
> funB <- funA(a = 1, b = 2)
> funB(1)
[1] 0.6472823
> funB(1) # different!
[1] 1.368759
> funB(2)
[1] -2.381404
> 
> funB <- funA(a = 2, b = 3)
> funB(1)
[1] 1.177277
> funB(1) # different
[1] 3.985
> funB(2)
[1] 4.974834

The short of it, though, is that if we want to apply FPK, I think this change will be necessary.

The downside is that requiring lexically scoped functions raises the barrier for users writing their own models: function factories are a slightly more advanced form of functional programming that many irregular R users may not be familiar with.
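One possible mitigation, sketched here purely as an idea (the names `asFactory` and `userBM` are hypothetical, not existing TreEvo functions): an adapter could wrap a plain user-written model function of the familiar form (parameters plus trait value) into factory form internally, so users never have to write a factory themselves:

```r
# Sketch: adapt a flat user model function to the factory interface.
# All names here are hypothetical illustrations.
asFactory <- function(modelFun){
    function(params){
        function(x) modelFun(params, x)    # params captured by lexical scoping
    }
}

# The user writes the familiar flat form:
userBM <- function(params, x) x + rnorm(1, mean = 0, sd = params$sigma)

# The package could convert it internally:
bmFactory <- asFactory(userBM)
bmStep <- bmFactory(list(sigma = 0.1))
bmStep(1.5)    # one BM step
```

This keeps the simple user-facing interface while still letting the internals bind parameters once per simulation, though it would not by itself capture the one-time-setup savings that a hand-written factory (as in the FPK case) provides.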