KlausVigo / phangorn

Phylogenetic analysis in R
http://klausvigo.github.io/phangorn/
203 stars 38 forks source link

`optim.pml` bug (expl fails) #32

Closed thibautjombart closed 6 years ago

thibautjombart commented 7 years ago

I think this is a bug in the current stable version of phangorn, with R 3.4.1. Reproducible code below:

library(ape)
library(phangorn)
library(adegenet)
dna <- ug("https://raw.githubusercontent.com/reconhub/phylo-practical/master/data/usflu.fasta")
dna2 <- as.phyDat(dna)
tre.ini <- nj(dist.dna(dna,model = "TN93"))
fit.ini <- pml(tre.ini, dna2, k = 4)
fit <- optim.pml(fit.ini, optNni = TRUE, optBf = TRUE,
                 optQ = TRUE, optGamma = TRUE)

The error I get from this last line reads:

Error in quadprog::solve.QP.compact(as.matrix(Dmat), as.vector(dvec),  : 
  object '.QP_aind' not found

I tried disabling various parts of the optimisation but no luck. This is coming from the intro to phylogenetics practical I have been running for a few years, and did not generate errors before.

thibautjombart commented 7 years ago

Hi again, sorry for being the bringer of bad news but it looks indeed like a bug - the example of optim.pml is failing on my platform:

> example(optim.pml)

optm.p>   example(NJ)

NJ> data(Laurasiatherian)

NJ> dm <- dist.ml(Laurasiatherian)

NJ> tree <- NJ(dm)

NJ> plot(tree)
Hit <Return> to see next plot: 

optm.p> # Jukes-Cantor (starting tree from NJ)  
optm.p>   fitJC <- pml(tree, Laurasiatherian)  

optm.p> # optimize edge length parameter     
optm.p>   fitJC <- optim.pml(fitJC)
Error in quadprog::solve.QP.compact(as.matrix(Dmat), as.vector(dvec),  : 
  object '.QP_aind' not found
> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.10

Matrix products: default
BLAS: /usr/local/lib/R/lib/libRblas.so
LAPACK: /usr/local/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] phangorn_2.2.0  ape_4.1         testthat_1.0.2  rmarkdown_1.6  
[5] knitr_1.17      devtools_1.13.3

loaded via a namespace (and not attached):
 [1] igraph_1.1.2    Rcpp_0.12.13    magrittr_1.5    lattice_0.20-35
 [5] R6_2.2.2        quadprog_1.5-5  fastmatch_1.1-0 stringr_1.2.0  
 [9] tools_3.4.1     parallel_3.4.1  grid_3.4.1      nlme_3.1-131   
[13] withr_2.0.0     htmltools_0.3.6 rprojroot_1.2   digest_0.6.12  
[17] crayon_1.3.2    Matrix_1.2-11   memoise_1.1.0   evaluate_0.10.1
[21] stringi_1.1.5   compiler_3.4.1  backports_1.0.5 pkgconfig_2.0.1
> 
thibautjombart commented 7 years ago

Poke @KlausVigo ;) I have a 2 weeks course coming on genetic data analysis. Shall I use a previous version? Update on the bug: it still happens on R 3.4.2:

> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /usr/local/lib/R/lib/libRblas.so
LAPACK: /usr/local/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] phangorn_2.2.0  ape_4.1         testthat_1.0.2  rmarkdown_1.6  
[5] knitr_1.17      devtools_1.13.3

loaded via a namespace (and not attached):
 [1] igraph_1.1.2    Rcpp_0.12.13    magrittr_1.5    lattice_0.20-35
 [5] R6_2.2.2        quadprog_1.5-5  fastmatch_1.1-0 stringr_1.2.0  
 [9] tools_3.4.2     parallel_3.4.2  grid_3.4.2      nlme_3.1-131   
[13] withr_2.0.0     htmltools_0.3.6 rprojroot_1.2   digest_0.6.12  
[17] crayon_1.3.4    Matrix_1.2-11   memoise_1.1.0   evaluate_0.10.1
[21] stringi_1.1.5   compiler_3.4.2  backports_1.1.1 pkgconfig_2.0.1
KlausVigo commented 7 years ago

@thibautjombart sorry for the late reply. The dev version of phangorn depends on the dev version of ape (http://ape-package.ird.fr/ape_installation.html). Emmanuel wants to update the next days and I will update phangorn as soon as ape is on CRAN.

thibautjombart commented 7 years ago

Awesome, thanks! =D Does it mean it is fixed in the devel?

zkamvar commented 7 years ago

For some reason this works on my machine, though. This is weird.

library("phangorn")
#> Loading required package: ape
example(optim.pml)
#> 
#> optm.p>   example(NJ)
#> 
#> NJ> data(Laurasiatherian)
#> 
#> NJ> dm <- dist.ml(Laurasiatherian)
#> 
#> NJ> tree <- NJ(dm)
#> 
#> NJ> plot(tree)

#> 
#> optm.p> # Jukes-Cantor (starting tree from NJ)  
#> optm.p>   fitJC <- pml(tree, Laurasiatherian)  
#> 
#> optm.p> # optimize edge length parameter     
#> optm.p>   fitJC <- optim.pml(fitJC)
#> optimize edge weights:  -54808.83 --> -54230.41 
#> optimize edge weights:  -54230.41 --> -54230.41 
#> optimize edge weights:  -54230.41 --> -54230.41 
#> optimize edge weights:  -54230.41 --> -54230.41 
#> 
#> optm.p>   fitJC 
#> 
#>  loglikelihood: -54230.41 
#> 
#> unconstrained loglikelihood: -17300.92 
#> 
#> Rate matrix:
#>   a c g t
#> a 0 1 1 1
#> c 1 0 1 1
#> g 1 1 0 1
#> t 1 1 1 0
#> 
#> Base frequencies:  
#> 0.25 0.25 0.25 0.25 
#> 
#> optm.p> ## Not run: 
#> optm.p> ##D     
#> optm.p> ##D # search for a better tree using NNI rearrangements     
#> optm.p> ##D   fitJC <- optim.pml(fitJC, optNni=TRUE)
#> optm.p> ##D   fitJC   
#> optm.p> ##D   plot(fitJC$tree)
#> optm.p> ##D 
#> optm.p> ##D # JC + Gamma + I - model
#> optm.p> ##D   fitJC_GI <- update(fitJC, k=4, inv=.2)
#> optm.p> ##D # optimize shape parameter + proportion of invariant sites     
#> optm.p> ##D   fitJC_GI <- optim.pml(fitJC_GI, optGamma=TRUE, optInv=TRUE)
#> optm.p> ##D # GTR + Gamma + I - model
#> optm.p> ##D   fitGTR <- optim.pml(fitJC_GI, rearrangement = "stochastic", 
#> optm.p> ##D       optGamma=TRUE, optInv=TRUE, model="GTR") 
#> optm.p> ## End(Not run)
#> optm.p> 
#> optm.p> 
#> optm.p> # 2-state data (RY-coded)  
#> optm.p>   dat <- acgt2ry(Laurasiatherian) 
#> 
#> optm.p>   fit2ST <- pml(tree, dat) 
#> 
#> optm.p>   fit2ST <- optim.pml(fit2ST,optNni=TRUE) 
#> optimize edge weights:  -19996.09 --> -17092.17 
#> optimize edge weights:  -17092.17 --> -17092.17 
#> optimize topology:  -17092.17 --> -17058.43 
#> optimize topology:  -17058.43 --> -17030.4 
#> optimize topology:  -17030.4 --> -17025.8 
#> 9 
#> optimize edge weights:  -17025.8 --> -17025.8 
#> optimize topology:  -17025.8 --> -17024.81 
#> optimize topology:  -17024.81 --> -17024.81 
#> 1 
#> optimize edge weights:  -17024.81 --> -17024.81 
#> optimize topology:  -17024.81 --> -17024.81 
#> 0 
#> optimize edge weights:  -17024.81 --> -17024.81 
#> 
#> optm.p>   fit2ST
#> 
#>  loglikelihood: -17024.81 
#> 
#> unconstrained loglikelihood: -8702.769 
#> 
#> Rate matrix:
#>   r y
#> r 0 1
#> y 1 0
#> 
#> Base frequencies:  
#> 0.5 0.5 
#> 
#> optm.p> # show some of the methods available for class pml
#> optm.p>   methods(class="pml")  
#> [1] AICc   anova  BIC    logLik plot   print  simSeq update vcov  
#> see '?methods' for accessing help and source code
sessionInfo()
#> R version 3.4.2 (2017-09-28)
#> Platform: x86_64-apple-darwin15.6.0 (64-bit)
#> Running under: macOS Sierra 10.12.6
#> 
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] phangorn_2.2.0 ape_4.1       
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_0.12.13    quadprog_1.5-5  lattice_0.20-35 digest_0.6.12  
#>  [5] rprojroot_1.2   grid_3.4.2      nlme_3.1-131    backports_1.1.1
#>  [9] formatR_1.5     magrittr_1.5    evaluate_0.10.1 stringi_1.1.5  
#> [13] Matrix_1.2-11   fastmatch_1.1-0 rmarkdown_1.6   tools_3.4.2    
#> [17] stringr_1.2.0   igraph_1.1.2    parallel_3.4.2  yaml_2.1.14    
#> [21] compiler_3.4.2  pkgconfig_2.0.1 htmltools_0.3.6 knitr_1.17
thibautjombart commented 7 years ago

Weird. But interesting. My current setup is:

> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.10

Matrix products: default
BLAS: /usr/local/lib/R/lib/libRblas.so
LAPACK: /usr/local/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] phangorn_2.2.0 ape_4.1       

loaded via a namespace (and not attached):
 [1] compiler_3.4.2  Matrix_1.2-11   magrittr_1.5    parallel_3.4.2 
 [5] tools_3.4.2     igraph_1.1.2    fastmatch_1.1-0 **Rcpp_0.12.13.1 **
 [9] nlme_3.1-131    grid_3.4.2      pkgconfig_2.0.1 lattice_0.20-35
[13] quadprog_1.5-5  tcltk_3.4.2    

@zkamvar the only difference I notice is the Rcpp version; can you try after updating your packages? BTW this looks like quadprog not registering a Fortran routine properly.. :-/

zkamvar commented 7 years ago

The current version of Rcpp on CRAN is 12.13, which was released less than two weeks ago. After updating with the GitHub version of Rcpp, there is no change.

> devtools::session_info()
Session info ------------------------------------------------------------------
 setting  value
 version  R version 3.4.2 (2017-09-28)
 system   x86_64, darwin15.6.0
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 tz       America/Chicago
 date     2017-10-10

Packages ----------------------------------------------------------------------
 package   * version   date       source
 ape       * 4.1       2017-02-14 CRAN (R 3.4.0)
 base      * 3.4.2     2017-10-04 local
 compiler    3.4.2     2017-10-04 local
 curl        3.0       2017-10-06 CRAN (R 3.4.2)
 datasets  * 3.4.2     2017-10-04 local
 devtools    1.13.3    2017-08-02 CRAN (R 3.4.1)
 digest      0.6.12    2017-01-27 CRAN (R 3.4.0)
 fastmatch   1.1-0     2017-01-28 CRAN (R 3.4.0)
 git2r       0.19.0    2017-07-19 CRAN (R 3.4.1)
 graphics  * 3.4.2     2017-10-04 local
 grDevices * 3.4.2     2017-10-04 local
 grid        3.4.2     2017-10-04 local
 httr        1.3.1     2017-08-20 cran (@1.3.1)
 igraph      1.1.2     2017-07-21 cran (@1.1.2)
 knitr       1.17      2017-08-10 cran (@1.17)
 lattice     0.20-35   2017-03-25 CRAN (R 3.4.0)
 magrittr    1.5       2014-11-22 CRAN (R 3.4.0)
 Matrix      1.2-11    2017-08-16 CRAN (R 3.4.1)
 memoise     1.1.0     2017-04-21 CRAN (R 3.4.0)
 methods   * 3.4.2     2017-10-04 local
 nlme        3.1-131   2017-02-06 CRAN (R 3.4.0)
 parallel    3.4.2     2017-10-04 local
 phangorn  * 2.2.0     2017-04-03 CRAN (R 3.4.0)
 pkgconfig   2.0.1     2017-03-21 CRAN (R 3.4.0)
 quadprog    1.5-5     2013-04-17 CRAN (R 3.4.0)
 R6          2.2.2     2017-06-17 cran (@2.2.2)
 Rcpp        0.12.13.1 2017-10-10 Github (RcppCore/Rcpp@136d50f)
 stats     * 3.4.2     2017-10-04 local
 tools       3.4.2     2017-10-04 local
 utils     * 3.4.2     2017-10-04 local
 withr       2.0.0     2017-07-28 CRAN (R 3.4.1)

WHAT DOES THIS EVEN?! (屮゚Д゚)屮

zkamvar commented 7 years ago

To add insult to injury, the .QP_aind subroutine is so effing simple:

https://github.com/cran/quadprog/blob/f0c9a18769fb4e1b684c5b7e1d3af80459b56767/src/aind.f#L26-L38

But, it does look like they've been registering their routines correctly :/ https://github.com/cran/quadprog/blob/master/src/init.c

thibautjombart commented 7 years ago

News! The problem is triggered by the Rcpp version apparently. All works when doing the following:

zkamvar commented 7 years ago

@thibautjombart I can't confirm this error rebuilding phangorn with Rcpp_0.12.13.1. Everything appears to work.

thibautjombart commented 7 years ago

Wow. Another Lovecraftian error with unusual settings then. Somehow I seem to be good at finding these.. ;)

KlausVigo commented 7 years ago

Hi guys, let see what happened when ape 5.0 (it seems to sit in the waiting queue on CRAN) and the next version (github) of phangorn is up on CRAN.

zkamvar commented 7 years ago

Every ape update brings an unspeakable wave of cold dread over me.

Sent from my iPhone

On Oct 11, 2017, at 08:27, Klaus Schliep notifications@github.com wrote:

Hi guys, let see what happened when ape 5.0 (it seems to sit in the waiting queue on CRAN) and the next version (github) of phangorn is up on CRAN.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

thibautjombart commented 7 years ago

Just you wait for the next update of 'heu déjeuner', as some pronunce it..

KlausVigo commented 6 years ago

Hi @thibautjombart, can you please check if this is still a problem. If yes I need to figure out a way how to reproduce the error.

thibautjombart commented 6 years ago

After reverting to an non-devel Rcpp things were fine. I'm gonna put that in as a Lovecraftian case and close. I'll poke again if it resurfaces.

zkamvar commented 6 years ago

I'll poke again if it resurfaces.

You mean "if it awakens from its slumber"