Open cysouw opened 7 years ago
That is indeed a bit slow :-( Will look into it at some point, but no promises on when. Very open for pull requests though!
Apparently, recursive programming is not ideal inside R :-(, e.g. http://dirk.eddelbuettel.com/blog/2011/09/08/#rcpp_for_recursion
So, there should be some workaround possible ("memoization") but that would sort-of destroy the whole approach of the data.tree package. So, pass everything to C?
As we are talking a few hundred nodes, recursion is not the problem. Rather, the current DefaultPlotHeight
implementation is recursive, and on top it's called recursively by ToNewick. This, of course, is highly sub-optimal.
If you don't need the height in your Newick, the simplest solution is to avoid it, like so:
n <- ToNewick(bf, heightAttribute = NULL)
If you need the plot height, a workaround is to cache the plot height attribute on the nodes, like so:
library(ape)
library(data.tree)
data(bird.families)
bf <- as.Node(bird.families)
SetPlotHeight <- function(node, rootHeight = 100) {
#traverse from leaves towards root to calculate the height and store it in height2
#Note: cannot call it height, as that already exists on `Node`
Set(node$leaves, height2 = 1)
node$Do(function(x) x$height2 <- Aggregate(x, "height2", max) + 1, traversal = "post-order", filterFun = isNotLeaf)
#traverse from root to leaves to calculate the plotHeight
#(where plotHeight uses the semantics of dendrogram/phylo, where leaves should have height 0 and root height = rootHeight. This meaning is not the same as in the data.tree package)
node$plotHeight <- rootHeight
node$Do(function(x) x$plotHeight <- x$parent$plotHeight * (1 - 1 / x$height2), filterFun = isNotRoot)
}
SetPlotHeight(bf)
Then, instead of providing a heightAttribute function, provide the name of the cached attribute:
n <- ToNewick(bf, heightAttribute = "plotHeight")
I need to figure out whether there is a clean way to fix this for good in the DefaultPlotHeight
function, thought I doubt it. At least it should be documented in the ToNewick
, as.dendrogram.Node
, and as.phylo.Node
This is exactly what I was looking for!! This cut down the time to convert a tree with 5971 tips and 1631 internal nodes, from several(!!) minutes to just 2 secs!:
If you don't need the height in your Newick, the simplest solution is to avoid it
Perhaps this is the cause of slowdowns in other conversions like ToDataFrameTable
as well? cf. this gist. I'm wondering about the slowness in conversion (of seconds or more) even after pruning to a depth of 2-6. Whereas just almost any other operation like leafCount
for example is almost instant for the whole tree.
For context, I'm trying to convert a json file via data.tree
to tidytree
format to do some visualizations. Any other alternative would also be very helpful.
In the long term, I wonder how hard it would to implement some parts in rcpp? I'm hoping this can be my go to package for hierarchical data. There really is a lack of tools in R for stuff like json. I'd be willing to spend some time if needed to work on this. Since I really need a way to work with and visualize large json structures, any suggestions for alternatives is fine too..
Is there a way to speed up this conversion, specifically with
ToNewick
?