emmanuelparadis / ape

Analysis of Phylogenetics and Evolution
https://emmanuelparadis.github.io/
GNU General Public License v2.0
53 stars 11 forks source link

Ladderize results C stack usage overflow #54

Closed teng-gao closed 2 years ago

teng-gao commented 2 years ago

Hi Emmanuel,

We found that for certain tree structures, ladderize function results in C stack overflow. For example,

tree.rds.zip

> tree = readRDS('tree.rds')
> ape::ladderize(tree)
Error: C stack usage  7971204 is too close to the limit

This seems to result from the recursive foo function to reorder edges. Do you think there is an easy fix to this? Would really appreciate your help.

Thanks, Teng

evanbiederstedt commented 2 years ago

I could make a pull request to fix this behavior, ~if I could only get an example whereby the C stack overflow bug happens.~

EDIT: Sorry, I misread Teng's comment above. I'll check this now.

The function is here: https://github.com/emmanuelparadis/ape/blob/master/R/ladderize.R#L12-L27

We could maybe fix this pretty quickly with a trampoline, using this package: https://github.com/rdinnager/trampoline

Something like the following at https://github.com/emmanuelparadis/ape/blob/master/R/ladderize.R#L36 after neworder <- integer(nb.edge):

trampoline(foo(nb.tip + 1, nb.edge, 1), foo = function(node, END, where) {
    start <- which(phy$edge[, 1] == node)
    end <- c(start[-1] - 1, END)
    size <- end - start + 1
    desc <- phy$edge[start, 2]
    Nclade <- length(desc)
    n <- N[desc]
    o <- order(n, decreasing = right)
    newpos <- c(0, cumsum(size[o][-Nclade])) + where
    desc <- desc[o]
    end <- end[o]
    start <- start[o]
    neworder[newpos] <<- start
    for (i in 1:Nclade)
        if (desc[i] > nb.tip) foo(desc[i], end[i], newpos[i] + 1)
})

UPDATE: It might be useful to have another look at the logic here @KlausVigo @emmeanuelparadis

I've found the trampoline idea to be not as straightforward as I hoped. We should probably try to rewrite the above using iteration...

emmanuelparadis commented 2 years ago

Hi @teng-gao, I plotted your tree with the option show.tip.label = FALSE and it took me a few seconds to make sense of the plot! Your tree is (almost) totally unbalanced so that ladderize() has to call the recursive internal function all along it. Similarly (and unsurprisingly), balance(tree) gives the same error. I tried to fix different parameters inside or outside R but this didn't help. See here for a discussion on stacks in R with some useful info. This could be related to issue #53: maybe Lenore's tree is much more imbalanced than a random tree (write.tree calls some recursive functions as well). I don't see a simple solution right now, except rewriting the code avoiding recursive calls as suggested by @evanbiederstedt. Emmanuel

KlausVigo commented 2 years ago

Hi all, there is a fix in the pull request I just made.

PS: @teng-gao the tree you send contains ~2x too many edge.length's.

teng-gao commented 2 years ago

Just tested the iterative version by @KlausVigo and it works on my case!

evanbiederstedt commented 2 years ago

Thanks @KlausVigo

I commented on the PR. Works for me as well---I wouldn't have been able to so quickly solve this.

There's a warning but that's probably ok.

Thanks, Evan

emmanuelparadis commented 2 years ago

Excellent! I've merged Klaus's PR. I'm now working on removing the recursive call from write.tree().