KlausVigo / phangorn

Phylogenetic analysis in R
http://klausvigo.github.io/phangorn/
202 stars 38 forks source link

One tree has a different number of tips - bootstrap.pml() function runs #87

Open alptaciroglu opened 4 years ago

alptaciroglu commented 4 years ago

I was wondering if anyone encountered this error while running the "Appendix 2: Standard scripts for amino acid analysis". At the last line of code, I sometimes get this error working with exact same sequences and sometimes not.

What I do is run the script a couple of times and find the one that turns out fine which is a waste of resources, not ideal for automatization of a larger script and also hard to reproduce, unfortunately. Thanks in advance

Here is the truncated code for ease of access.

library(phangorn)
file="myfile"
dat = read.phyDat(file, type = "AA")
dm = dist.ml(dat, model="JTT")
tree = NJ(dm)
# parallel will only work safely from command line
# and not at all windows
(mt <- modelTest(dat, model=c("JTT", "LG", "WAG"),
multicore=TRUE))
# run all available amino acid models
(mt <- modelTest(dat, model="all", multicore=TRUE))
fitStart = eval(get(mt$Model[which.min(mt$BIC)], env), env)
fitNJ = pml(tree, dat, model="JTT", k=4, inv=.2)
fit = optim.pml(fitNJ, rearrangement = "stochastic",
optInv=TRUE, optGamma=TRUE)
fit
bs = bootstrap.pml(fit, bs=100, optNni=TRUE, multicore=TRUE)  ### Where the error occurs

optimize topology: -56831.99 --> -56831.99 0 optimize edge weights: -56831.99 --> -56831.99 Error in FUN(X[[i]], ...) : one tree has a different number of tips Calls: ebiTreeCreate ... bootstrap.pml -> .compressTipLabel -> lapply -> FUN Execution halted ~

> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.utf-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.utf-8        LC_COLLATE=en_US.utf-8    
 [5] LC_MONETARY=en_US.utf-8    LC_MESSAGES=en_US.utf-8   
 [7] LC_PAPER=en_US.utf-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.utf-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] tools     stats4    parallel  stats     graphics  grDevices utils    
 [8] datasets  methods   base     

other attached packages:
 [1] data.table_1.12.2   ggtree_1.10.5       treeio_1.2.2       
 [4] ggplot2_3.2.0       DECIPHER_2.6.0      RSQLite_2.1.1      
 [7] rphast_1.6.9        phangorn_2.5.5      ape_5.3            
[10] msa_1.10.0          seqinr_3.4-5        stringr_1.4.0      
[13] Biostrings_2.46.0   XVector_0.18.0      IRanges_2.12.0     
[16] S4Vectors_0.16.0    BiocGenerics_0.24.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1       pillar_1.4.1     compiler_3.4.4   zlibbioc_1.24.0 
 [5] digest_0.6.19    bit_1.1-14       jsonlite_1.6     tibble_2.1.3    
 [9] memoise_1.1.0    nlme_3.1-131     gtable_0.3.0     lattice_0.20-35 
[13] pkgconfig_2.0.2  rlang_0.3.4      Matrix_1.2-12    fastmatch_1.1-0 
[17] igraph_1.2.4.1   DBI_1.0.0        rvcheck_0.1.3    withr_2.1.2     
[21] dplyr_0.8.1      tidyselect_0.2.5 ade4_1.7-13      bit64_0.9-7     
[25] grid_3.4.4       glue_1.3.1       R6_2.4.0         tidyr_0.8.3     
[29] purrr_0.3.2      blob_1.1.1       magrittr_1.5     scales_1.0.0    
[33] MASS_7.3-49      assertthat_0.2.1 colorspace_1.4-1 quadprog_1.5-7  
[37] stringi_1.4.3    lazyeval_0.2.2   munsell_0.5.0    crayon_1.3.4    
KlausVigo commented 4 years ago

Hi @alptaciroglu , is it possible to send me the data set (klaus.schliep@gmail.com), so I can try to reproduce the error. Klaus

alptaciroglu commented 4 years ago

Hello @KlausVigo,

Thanks for your reply. I sent the data set. It is in phyDat format and saved as RData. Cheers!