adeverse / ade4

Analysis of Ecological Data : Exploratory and Euclidean Methods in Environmental Sciences
http://adeverse.github.io/ade4/

newick2phylog hangs and leaks memory on moderately large trees #20

Closed IdoBar closed 5 years ago

IdoBar commented 5 years ago

Hi,

I have a moderately large phylogenetic tree obtained from QIIME. When I try to import it as a 'phylog' object with the newick2phylog() function, it runs for hours and keeps consuming memory without ever finishing (the last attempt ran for 15 hours and used 32 GB of RAM).

library(ade4)
library(ape)
# importing the tree as a 'phylo' object
tree.phylo <- read.tree("rep_set.tre") # this works
# read the tree as a character string
tree.nw <- scan("rep_set.tre", what="character")
# import the tree as a 'phylog' object
bact.phy <- newick2phylog(tree.nw) # does not finish and leaks memory

Is this known behavior, or is there something wrong with my file? The file can be downloaded from here.

Thanks!

sdray commented 5 years ago

Hi,

The function is really old and probably not optimized. It uses the 'phylog' format, which we aim to remove from ade4. Depending on the analysis you want to perform, I would suggest using adephylo instead of ade4 (we reimplemented most of the phylogenetic analyses of ade4 in adephylo) and the read.tree function (available in the ape package) to import the phylogeny. The main advantage of adephylo is that it uses the phylo and phylo4 classes to manage phylogenies, which facilitates exchanges with other R packages.

Cheers
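
For readers landing on this issue, the suggestion above might look roughly like the sketch below. It is only an illustration, not the maintainers' code: the small Newick string is a stand-in for the real file, and distTips() was picked arbitrarily as one adephylo function that accepts a 'phylo' object. Which adephylo function to call depends on the analysis.

```r
library(ape)

# Stand-in tree for illustration only; with the real data this would be
# read.tree("rep_set.tre"), as in the original post.
tree.phylo <- read.tree(text = "((A:1,B:1):1,(C:1,D:1):1);")

# The result is a 'phylo' object, so no conversion to 'phylog' is needed.
class(tree.phylo)
n.tips <- Ntip(tree.phylo)

# adephylo works on 'phylo' objects directly. distTips() is used here
# purely as an example; the call is skipped if adephylo is not installed.
if (requireNamespace("adephylo", quietly = TRUE)) {
  tip.dist <- adephylo::distTips(tree.phylo, method = "patristic")
}
```

This way the tree never passes through newick2phylog(), so the slow 'phylog' construction is avoided.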