ericgoolsby / Rphylopars

Phylogenetic Comparative Tools for Missing Data and Within-Species Variation
28 stars 11 forks source link

Unexpected behaviour in simtraits function #46

Closed qingchenl closed 2 years ago

qingchenl commented 2 years ago

The problem

The simtraits function will always shift add a 1 vector to all node/tip states when running under BM model, and this behaviour doesn't seem to have been noted in the R documentation. This is most apparent when the entries of the covariance matrix and the branch lengths are relatively small; sample code using a modified version of an example from the wiki, below.

Reproducing the problem

library(Rphylopars)
library(phytools) # for simulating pure-birth phylogenies
set.seed(21) # Set the seed for reproducible results

trait_cov <- matrix(c(4,2,2.2,2,3,1.5,2.2,1.5,2.5),nrow = 3,ncol = 3) * 10^-6
trait_cov # Phylogenetic trait covariance

tree <- pbtree(n=20)
sim_data <- simtraits(ntraits = 3, nmissing = 10, tree = tree, v = trait_cov, anc = rep(0, 3), model = "BM")
p1 <- phylopars(sim_data$trait_data, tree = tree, model = "BM")
p1$anc_recon[21, ] #this is the root of the tree

Note how the recovered ancestral state is essentially a 1 vector, even though the anc argument provided to the function was a 0 vector. I believe the constant component to movement also makes the model not Brownian motion in the strict sense.

Cause of the behaviour

This issue arises from the geiger::sim.char function. It has a root argument, which is added to each entry in the output of the simulate function it defines internally. The relevant line is:

simulate <- function(v, root) (m %*% as.matrix(v)) +
            root

Since geiger::sim.char is called from simtraits without providing a root argument, it defaults to 1, hence adding 1 to the position of each node/tip.

Possible solutions

One possible solution is to specify a root of 0 when geiger::sim.char is called.

Xall <- sim.char(phy = tree, par = v, nsim = nsim, root = 0)

Alternatively, root can be added as an additional optional argument for the simtraits function with default value of 1. This has the benefit of retaining previous behaviour by default.

function (ntaxa = 15, ntraits = 4, nreps = 1, nmissing = 0, tree, 
    v, anc, intraspecific, model = "BM", parameters, nsim = 1, 
    root = 1, return.type = "data.frame") 

and when calling sim.char:

Xall <- sim.char(phy = tree, par = v, nsim = nsim, root = root)

A third option would be to make no change to the code, but document the constant component. It would be relatively easy to adjust for if users are aware of this behaviour.

ericgoolsby commented 2 years ago

Oh yikes. Thanks for bringing that to my attention. I didn't realize the default root state for sim.char would be non-zero. Pushing a patch to fix this now. Thanks again