emmanuelparadis / ape

Analysis of Phylogenetics and Evolution
https://emmanuelparadis.github.io/
GNU General Public License v2.0
53 stars 11 forks source link

Weird behaviour of parameter `random` in multi2di() #23

Closed fcampelo closed 3 years ago

fcampelo commented 3 years ago

Hi,

I found this somewhat strange behaviour of multi2di(). Depending on the parameter random, the output may or may not be an ultrametric tree. I may be wrong (not really a specialist here), but my understanding was that random was essentially used to resolve multicotomies in random order vs. in the order they appeared in the tree, and should not change the nature of the tree. Is that correct?

Reproducible example below (using a character string for convenience. Also works if you're reading the tree from a file):

> my.tree <- ape::read.tree(text = "((((NC_011751:1.0,NC_017626:1.0):2.0,
(NZ_CP021202:1.0,NZ_CP025707:1.0):2.0):1.0,NZ_CP019903:4.0):75.0,
((((((((((CP019778:1.0,NZ_LT601384:1.0):1.0,NZ_CP027701:2.0):1.0,NZ_CP011134:3.0):1.0,NZ_CP025747:4.0):6.0,
((NZ_CP010171:1.0,NZ_CP020543:1.0):4.0,((NZ_CP010231:1.0,NC_017641:1.0):2.0,
(NZ_CP027105:1.0,NC_017633:1.0):2.0):2.0):5.0):2.0,(NZ_CP017220:1.0,NZ_CP020835:1.0):11.0):3.0,
((NC_009800:1.0,NZ_CP006636:1.0):1.0,NZ_CP019961:2.0):13.0):22.0,((((NC_011415:1.0,NZ_CP010242:1.0):15.0,
(((NZ_CP010230:7.0,(NZ_CP012380:6.0,(NC_009801:5.0,
(NC_011748:4.0,NC_018650:4.0,NC_018658:4.0,NC_018661:4.0,NZ_HF572917:4.0):1.0):1.0):1.0):1.0,NZ_CP010235:8.0):6.0,
((NZ_CP010236:3.0,((NC_013361:1.0,NZ_CP028112:1.0):1.0,NC_013364:2.0):1.0):2.0,
(NZ_CP016546:1.0,NC_013353:1.0):4.0):9.0):2.0):2.0,(NC_011741:1.0,NZ_CP010238:1.0):17.0):3.0,
((NC_020163:1.0,NZ_CP010315:1.0):1.0,NZ_CP024801:2.0):19.0):16.0):1.0,NZ_CP022154:38.0):8.0,
(((NC_002655:2.0,NC_002695:2.0,NC_017906:2.0):3.0,(NC_011353:2.0,NC_013008:2.0,NZ_CP008805:2.0):3.0):2.0,
(NC_013941:1.0,NC_017656:1.0):6.0):39.0):33.0,((NC_010498:2.0,(NC_011750:1.0,NC_017646:1.0):1.0):25.0,
(((((NC_011745:7.0,(NC_007946:6.0,
(NC_008563:2.0,NC_011742:2.0,NZ_CP026853:2.0):4.0,NC_017628:6.0,NC_017632:6.0,NZ_CP025401:6.0):1.0):3.0,
(NC_008253:2.0,(NC_011993:1.0,NC_017634:1.0):1.0):8.0):4.0,
(((NC_017631:1.0,NZ_CP007799:1.0):1.0,NC_004431:2.0):1.0,NZ_CP012379:3.0):11.0):1.0,NC_011601:15.0):9.0,
(NC_013654:8.0,((NZ_CP024717:6.0,NC_017644:6.0,NC_022648:6.0,
(NZ_CP010876:1.0,NZ_CP021179:1.0):5.0,NZ_CP013658:6.0,NZ_HG941718:6.0):1.0,NZ_CP021207:7.0):1.0):16.0):3.0):52.0);")

> ape::is.ultrametric(my.tree)
[1] TRUE
> ape::is.ultrametric(ape::multi2di(my.tree, random = TRUE))
[1] FALSE
> ape::is.ultrametric(ape::multi2di(my.tree, random = FALSE))
[1] TRUE

Is this something that was expected to happen?

Cheers, Felipe

emmanuelparadis commented 3 years ago

Hi Felipe,

Yes that's strange. The tree is kept ultrametric in only a small proportion of cases (random = TRUE is the default):

R> expr <- expression(is.ultrametric(multi2di(my.tree)))
R> res <- replicate(1000, eval(expr))
R> table(res)
res
FALSE  TRUE 
  962    38 

But if you change the option equiprob, the results are as expected:

R> expr <- expression(is.ultrametric(multi2di(my.tree, equiprob = FALSE)))
R> res <- replicate(1000, eval(expr))
R> table(res)
res
TRUE 
1000

You can also have always TRUE if you increase the value of the option tol in multi2di with the default equiprob = TRUE. Anyway, there should be no difference between both solutions. I'm going to have a closer look at it. Thanks for reporting this. Best, Emmanuel

fcampelo commented 3 years ago

Hi Emmanuel,

Thank you for the feedback! The first option (equiprob = FALSE works like a charm, but the second one (increasing tol) doesn't:

R> expr <- expression(is.ultrametric(multi2di(my.tree, tol = 1e-5)))
R> res <- replicate(1000, eval(expr))
R> table(res)
res
FALSE  TRUE 
  961    39 

R> expr <- expression(is.ultrametric(multi2di(my.tree, tol = 1e-3)))
R> res <- replicate(1000, eval(expr))
R> table(res)
res
FALSE  TRUE 
  978    22 

R> expr <- expression(is.ultrametric(multi2di(my.tree, tol = .1)))
R> res <- replicate(1000, eval(expr))
R> table(res)
res
FALSE  TRUE 
  969    31 

R> expr <- expression(is.ultrametric(multi2di(my.tree, tol = 1)))
R> res <- replicate(1000, eval(expr))
R> table(res)
res
FALSE  TRUE 
  967    33 
emmanuelparadis commented 3 years ago

Oops! My mistake: tol is an option of is.ultrametric, not multi2di. Coincidentally, di2multi has also this option...

KlausVigo commented 3 years ago

Hi @fcampelo & @emmanuelparadis, it is a problem with rtopology, as

> ape::is.ultrametric(ape::multi2di(my.tree, random = TRUE, equiprob=FALSE))
[1] TRUE 

which calls rtree works fine. I assume rtree used to be the default. If you replace rtopology with the following

fun <- function(N){
  x <- rtopology(N, rooted = TRUE)
  desc <- x$edge[,2]
  ind <- desc<=N
  x$edge[ind, 2] <- seq_len(N)
  x$tip.label <- x$tip.label[desc[ind]]
  x
}

inside .multi2di_ape it works. Maybe not really elegant. Cheers, Klaus

emmanuelparadis commented 3 years ago

Hi @KlausVigo,

Thanks! I've found something even less elegant:

FUN <- function(N) {
    tr <- rtopology(N, rooted = TRUE)
    tr <- read.tree(text = write.tree(tr))
    tr$edge
}

I'll test later which one's best. Cheers, Emmanuel

emmanuelparadis commented 3 years ago

I pushed a fix on the function ape::::.multi2di_ape. It has this bit of code:

    if (random) {
        if (equiprob) {
            FUN <- function(N) {
                x <- rtopology(N, rooted = TRUE)$edge
                desc <- x[, 2L]
                x[desc <= N, 2L] <- seq_len(N)
                x
            }
        } else {
            FUN <- function(N) rtree(N)$edge
        }
    }

Thus avoiding evaluating if for each node with multichotomies.