KlausVigo / phangorn

Phylogenetic analysis in R
http://klausvigo.github.io/phangorn/

pml segfault: memory not mapped #149

Open lemmonquiche opened 1 year ago

lemmonquiche commented 1 year ago

Hello,

I am having the same issue as addressed in #144 while running on a 1 TB RAM server. I have a tree with 50,211 leaf nodes and an MSA length of 3,828 bases. Several of my other large trees and alignments are also failing with the "memory not mapped" error, and I foresee more in the future, so I would be very interested in seeing phangorn handle trees of this size and even larger.

I am specifically using your package instead of RAxML or iqtree because I need to keep my tree ultrametric. While BEAST could be an option, it does not seem particularly friendly to automated pipelines, and there is limited documentation online.

Below is how I am using pml() for my task. I can send you the sample data if you would like.

library(phangorn)
library(phytools)

#Load MSA and tree into R:
nt_seqs <- read.phyDat("filteredAlignment.fasta", format = "fasta", type = "DNA")
tree <- read.newick("filteredTree.nwk")

##pare alignment to only use the sequences included in the tree
nt_seqs_pared <- nt_seqs[which(names(nt_seqs)%in% tree$tip.label)] 

##coerce tree to be ultrametric
tree_ultra <- phangorn:::minEdge(tree, tau = 1e-5, enforce_ultrametric = TRUE)
fit_ultra <- pml(tree_ultra, data = nt_seqs_pared, k = 4, bf = baseFreq(nt_seqs_pared))
fitGTR_ultra <- optim.pml(fit_ultra, model = "GTR", optRooted = TRUE, optQ = TRUE, optGamma = TRUE, optBf = TRUE, rearrangement = "none", control = pml.control(trace = 1))

At the pml() call I get:

 *** caught segfault ***
address 0x14dcfabe0, cause 'memory not mapped'

Traceback:
 1: pml.fit(tree, data, bf, shape = shape, k = k, Q = Q, levels = attr(data,     "levels"), inv = inv, rate = rate, g = g, w = w, eig = eig,     INV = INV, ll.0 = ll.0, llMix = llMix, wMix = wMix, site = TRUE,     ASC = ASC)
 2: pml(tree_ultra, data = nt_seqs_pared, k = 4, bf = baseFreq(nt_seqs_pared))
An irrecoverable exception occurred. R is aborting now ...

Information about version:

R version 4.3.0 (2023-04-21)                                                                                                                                                                  
Platform: x86_64-pc-linux-gnu (64-bit)                                                                                                                                                        
Running under: Ubuntu 20.04.6 LTS                                                                                                                                                             

Matrix products: default                                                                                                                                                                      
BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3                                                                                                                                     
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3;  LAPACK version 3.9.0                                                                                                            

locale:                                                                                                                                                                                       
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C                                                                                                                                                  
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8                                                                                                                                        
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8                                                                                                                                       
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/Los_Angeles
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] seqmagick_0.1.5 phytools_1.5-1  maps_3.4.1      phangorn_2.11.1
[5] ape_5.7-1 

Thanks

KlausVigo commented 1 year ago

Hi @lemmonquiche, as mentioned in #144, the problem is that too much space is allocated. Unfortunately there is no quick fix. I am working on it, but it will take some time: all the underlying C code needs to be rewritten. I have started using RcppArmadillo, which should make later improvements easier, and I plan to better integrate partitioned and mixture models. Optimisation of ultrametric and tip-dated phylogenies is a bit simpler than for unrooted trees, so I might get a testing version out earlier. Kind regards, Klaus
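
To give a rough sense of the scale involved, here is a back-of-the-envelope estimate in base R. It assumes one double per (node, site, state, rate category) in a single dense array. That layout is an illustrative assumption, not a description of how phangorn actually allocates memory, and the alignment length is used as an upper bound on the number of unique site patterns.

```r
# Rough size of a dense conditional-likelihood array for the reported data.
# ASSUMPTION: one double per (node, site pattern, state, rate category);
# phangorn's real memory layout may differ.
n_tips   <- 50211  # leaves in the reported tree
n_sites  <- 3828   # alignment length (upper bound on unique site patterns)
n_states <- 4      # DNA
n_rates  <- 4      # k = 4 rate categories, as in the pml() call

n_nodes    <- 2 * n_tips - 1                          # nodes in a rooted binary tree
n_elements <- n_nodes * n_sites * n_states * n_rates  # held as a double, so no overflow here
size_gib   <- n_elements * 8 / 2^30                   # 8 bytes per double

n_elements > .Machine$integer.max  # TRUE: more elements than a 32-bit signed index can address
round(size_gib, 1)                 # about 45.8 (GiB)
```

Under these assumptions the array would fit comfortably in 1 TB of RAM, but its element count exceeds the 32-bit signed integer range. If any offset into such an array were computed with a 32-bit integer, it could wrap around and point at unmapped memory. Whether that is what happens inside pml.fit is not confirmed here.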

Phylloxera commented 1 year ago

I, too, am having an out of memory issue. Mine is on NJ, treeNJ <- NJ(dm)

#Error in nj(x) : cannot allocate memory block of size 134217728 Tb
#Calls: source -> withVisible -> eval -> eval -> NJ -> reorder -> nj

I'm willing to test and have access to some high performance computing, so I'll keep an eye on this.

KlausVigo commented 1 year ago

Hi @Phylloxera,
this problem should be fixed by commit emmanuelparadis/ape@20332d8 (see the discussion in emmanuelparadis/ape#97), so NJ should work after updating ape to the development version. NJ is just a wrapper around the ape function nj. pml and pml_bb will likely complain afterwards. Regards, Klaus
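
As a side note on the size in that error message, the arithmetic below is a small sketch in base R. It assumes that R reports "Tb" in units of 2^40 bytes and that the buffer holds 8-byte doubles. Both are assumptions made for illustration.

```r
# Convert the size quoted in the nj() error message back to bytes.
# ASSUMPTION: "Tb" in R's message means 2^40 bytes.
reported_tb <- 134217728        # from "cannot allocate memory block of size 134217728 Tb"
bytes       <- reported_tb * 2^40

log2(reported_tb)               # 27, so the reported figure is exactly 2^27 Tb
log2(bytes)                     # 67, i.e. 2^67 bytes
log2(bytes / 8)                 # 64, i.e. 2^64 eight-byte elements (assuming doubles)
```

A request for exactly 2^64 elements is far beyond any real distance matrix, which is consistent with a size calculation that wrapped around rather than with a genuine shortage of RAM. This reading is an inference from the numbers only and has not been checked against the ape source.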
