Open pedrosenna opened 1 month ago
Dear Pedro,
Thanks for your appreciation (always appreciated!)
I run the following code (using sapply
and unlist
to avoid warnings) a couple of minutes and it run fine even with 100 tips (though it's a bit slow...):
i <- 0
repeat {
i <- i + 1
trees <- replicate(1000, rcoal(10), FALSE)
class(trees) <- "multiPhylo"
stopifnot(all(is.ultrametric(trees)))
stopifnot(all(sapply(trees, is.ultrametric)))
stopifnot(all(unlist(purrr::map(trees, is.ultrametric))))
}
Just in case: you can change the argument option
of is.ultrametric()
. From ?is.ultrametric
:
The test is based on the distances from each tip to the root and a criterion: if ‘option = 1’, the criterion is the scaled range ((max - min/max)), if ‘option = 2’, the variance is used (this was the method used until ape 3.5). The default criterion is invariant to linear changes of the branch lengths.
Best,
Emmanuel
Dear Emmanuel,
Thanks for the quick response! After running the same code you provided on my notebook, i wondered if that could be somehow related to how i am importing the tree files into R. I am importing BEAST2 trees using ape::read.nexus()
. I did a test converting the BEAST2 files into newick format using FigTree
and purrr:map()
worked as intended.
Here's a reproducible example using your code.
trees <- replicate(1000, rcoal(10), FALSE)
class(trees) <- "multiPhylo"
write.nexus(trees, file = "data/test/random_trees.nex")
trees2 <- read.nexus(file= "data/test/random_trees.nex")
identical(trees, trees2) # FALSE
all(is.ultrametric(trees2)) # TRUE
all(sapply(trees2, is.ultrametric)) # TRUE
all(unlist(purrr::map(trees2, is.ultrametric))) # FALSE
I wonder why trees
and trees2
fails to pass through identical()
testing. File sizes also differ a lot (2.3 MB vs 1.5 MB).
Dear Emmanuel,
here's another example of the same behavior after saving trees in nexus format and then reading back to R.
library(ape)
library(tidyverse)
set.seed(123)
trees <- replicate(1000, rcoal(10), FALSE)
write.nexus(trees, file = "random_trees.nex")
write.tree(trees, file = "random_trees.nwk")
trees_nex <- read.nexus("random_trees.nex")
trees_nwk <- read.tree("random_trees.nwk")
all(unlist(map(trees_nex, is.ultrametric))) # FALSE
all(unlist(map(trees_nwk, is.ultrametric))) # TRUE
Also, when comparing different str()
levels between trees, i found this:
identical(trees_nex, trees_nwk)
[1] FALSE
identical(trees_nex[1], trees_nwk[1])
[1] FALSE
identical(trees_nex[[1]], trees_nwk[[1]])
[1] TRUE
str(trees_nex[1])
Class "multiPhylo" List of 1 $ :List of 3 ..$ edge : int [1:18, 1:2] 11 12 12 11 13 13 14 15 15 16 ... ..$ edge.length: num [1:18] 2.827 0.119 0.119 2.726 0.22 ... ..$ Nnode : int 9 ..- attr(, "class")= chr "phylo" ..- attr(, "order")= chr "cladewise"
- attr(*, "TipLabel")= chr [1:10] "t10" "t7" "t1" "t8" ...
str(trees_nwk[1])
Class "multiPhylo" List of 1 $ :List of 4 ..$ edge : int [1:18, 1:2] 11 12 12 11 13 13 14 15 15 16 ... ..$ edge.length: num [1:18] 2.827 0.119 0.119 2.726 0.22 ... ..$ Nnode : int 9 ..$ tip.label : chr [1:10] "t10" "t7" "t1" "t8" ... ..- attr(, "class")= chr "phylo" ..- attr(, "order")= chr "cladewise"
Edit: Just found a workaround checking tracerer::parse_beast_trees.R
here
# Use c to strip the state names and convert it to a pure multiPhylo
trees_nex <- c(trees_nex)
all(unlist(map(trees_nex, is.ultrametric)))
[1] TRUE
Dear Pedro,
I think there are several issues behind these results.
1) The order of the elements in "phylo" objects is not mandatory, so you cannot always rely on identical()
for comparing trees: use all.equal()
instead (see ?all.equal.phylo
for details). Of course, if identical()
returns TRUE, so does all.equal()
.
2) When printing numerical (ie, real, not integers) values in a file, there is (often) the issue of how many digits to print which may "distort" the values, eg, printing 16 digits is OK but not 10 digits:
R> as.numeric(sprintf("%.16f", pi)) == pi
[1] TRUE
R> as.numeric(sprintf("%.10f", pi)) == pi
[1] FALSE
write.tree()
has the option digits = 10
(for the branch lengths) which can be modified. However, write.nexus()
has no such option and 10 digits are printed (unless you modify the code yourself).
If you just need to save trees to use them later in R/ape, my recommendation is to use saveRDS()
: this is faster, the file is compressed, and there is no loss in numerical precision for the branch lengths.
3) Because assessing ultrametricity is a statistical question, you may have encountered something like the followings:
R> tr <- rcoal(10)
R> is.ultrametric(tr, options = 1)
[1] TRUE
R> is.ultrametric(tr, options = 2)
[1] TRUE
R> ts <- read.tree(text = write.tree(tr, digits = 6))
R> is.ultrametric(ts, option = 1)
[1] FALSE
R> is.ultrametric(ts, option = 2)
[1] TRUE
Best,
Emmanuel
Dear Emmanuel,
first of all, thanks for this amazing package.
I just found an unexpected behavior when i was trying to run
splits::gmyc()
over a set of post-burnin files from BEAST2 usingpurrr::map()
, which is the tidyverse'slapply()
function.The problem seems to be in the
is.ultrametric()
evaluation made prior to the analysis. For example, running over a set of 1000 ultrametric trees generated using BEAST2:Evaluating a multiPhylo object using
is.ultrametric()
:[1] TRUE
Evaluating a multiPhylo object using
is.ultrametric()
insidelapply()
:[1] TRUE
Evaluating a multiPhylo object using
is.ultrametric()
insidepurrr::map()
:[1] FALSE
I really don't understand what's going on here. Any ideas?