emmanuelparadis / ape

Analysis of Phylogenetics and Evolution
https://emmanuelparadis.github.io/
GNU General Public License v2.0
53 stars 11 forks source link

`read.tree` seems to round branch lengths #80

Closed ajrominger closed 1 year ago

ajrominger commented 1 year ago

Thanks so much for ape! I don't know where I'd be without it!

Recently I've been working ape::phylo objects and msprime. I am running into an issue with branch lengths seeming to be rounded to 6 decimal places by ape::read.tree. When greater than 6 decimal places are needed, is there any work-around to the default behavior of ape::read.tree?

Here's an example:

y <- read.tree(text = '(((((t6:2.45112382300675335500272922218,t8:2.45112382300675335500272922218):9.79658819007364911612967262045,t3:12.2477120130804024711324018426):0.274339041983090226040076231584,t4:12.5220510550634926971724780742):18.933448771541357302794494899,t10:31.4554998266048499999669729732):121.547048353778492923993326258,(((t5:1.18211007899886055838578613475,t1:1.18211007899886055838578613475):2.39764699703194139601691858843,(t7:0.98689255753362914447279763408,t2:0.98689255753362914447279763408):2.5928645184971728099299070891):16.3437354032475887777309253579,t9:19.9234924792783907321336300811):133.07905570110494863911299035);')

# note the branch length for tip "t6" is 2.45112382300675335500272922218
# now look at that branch in the `edge.length` element:

y$edge.length[y$edge[, 2] == which(y$tip.label == 't6')]

## 2.451124

Thanks for any thoughts on this and thanks again for ape!

TGuillerme commented 1 year ago

This seems to be just how your machine decides to display digits. I think the branch lengths are actually properly input but displayed depending on your options. You can change it using options(digits). For example:

options(digits = 20)
y$edge.length[y$edge[, 2] == which(y$tip.label == 't6')]
## [1] 2.451123823006753355
emmanuelparadis commented 1 year ago

Thanks for the appreciation (always appreciated ;) ) read.tree does not truncate digits: they are all read in the limits defined by R. First, R prints a number of digits according to this option:

R> options("digits")
$digits
[1] 7

This can be overriden by for instance print(, digits = ). Try print(y$edge.length, digits = 22). Then, real numbers in R are 64-bit floating-point numbers, so you cannot get that many digits stored in a number (16 digits is the typical number):

R> identical(2.45112382300675335500272922218, 2.4511238230067533)
[1] TRUE
R> identical(2.45112382300675335500272922218, 2.451123823006753)
[1] FALSE

The digits after '33' are truncated because they cannot be stored in a 64-bit reals. Maybe msprime uses 128-bit reals (known as long double in C/C++)?

ajrominger commented 1 year ago

Thanks so much for the prompt reply and info! This all makes a lot of sense. We're digging deeper into the msprime side of things, but now the R side is fully transparent to us, which is a huge help!

Thanks again!