KlausVigo / phangorn

Phylogenetic analysis in R
http://klausvigo.github.io/phangorn/
202 stars 38 forks source link

optim.pml hangs infinitely #147

Closed hberger closed 1 year ago

hberger commented 1 year ago

I came across this case where optim.pml will hang infinitely with a small data set (attached as RDS).

library(phangorn)
d = readRDS("fitGTR_test_data.rds")
set.seed(1234)
#debug(phangorn:::pml.nni) # 5th iteration hangs in optim.quartet at i=49
fitGTR <- optim.pml(d, model="GTR", optInv=TRUE, optGamma=TRUE,
                    rearrangement = "stochastic", control = pml.control(trace = 20, epsilon=1e-3, maxit = 1))

Further debugging of the same call with gdb shows that the code hangs in sampleMatrix from ml.c (most likely called from phangorn:::pml4.init -> PML4 -> lll3 -> sampleMatrix), where tmp is not checked for negative values and loops forever.

Thread 1 "R" received signal SIGINT, Interrupt.
0x00007fffeddbb63b in scaleMatrix (X=X@entry=0x555562f35e70, nr=nr@entry=0x7ffffffe54c0, nc=nc@entry=0x7ffffffe54c4, result=0x555563124c80) at ml.c:91
91  ml.c: No such file or directory.
(gdb) info locals
i = 7
j = 0
tmp = -inf

(gdb) bt
#0  0x00007fffeddbb63b in scaleMatrix (X=X@entry=0x555562f35e70, nr=nr@entry=0x7ffffffe54c0, nc=nc@entry=0x7ffffffe54c4, result=0x555563124c80) at ml.c:91
#1  0x00007fffeddbc262 in lll3 (dlist=dlist@entry=0x555560bfc1c0, eva=eva@entry=0x555563356888, eve=eve@entry=0x55555e8adeb8, evei=evei@entry=0x55555e8ae228, el=0x55555f526528, g=g@entry=0.00067115147490255515, nr=nr@entry=0x7ffffffe54c0, nc=0x7ffffffe54c4, node=0x55555e8c8e08, 
    edge=0x55555e8c8ea8, nTips=148, contrast=0x5555584da7c0, nco=<optimized out>, n=5, scaleTmp=0x555562d53480, bf=0x55555e8c8ef8, TMP=0x5555650094b0, ans=0x555562ec6e70, SC=0x555563116e80) at ml.c:238
#2  0x00007fffeddbcd5c in PML4 (dlist=0x555560bfc1c0, EL=0x55555f5264f8, W=<optimized out>, G=<optimized out>, NR=<optimized out>, NC=<optimized out>, K=0x55555f87b7a0, eig=0x5555633568f8, bf=0x55555e8c8ec8, node=0x55555e8c8dd8, edge=0x55555e8c8e78, NTips=0x5555603d0cc8, 

Unfortunately the documentation in the code is quite sparse, so I would appreciate any suggestions on how to avoid or fix this error.

Thanks a lot in advance!

PS: Adding some function + parameter descriptions particularly in C code would really help.

> sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS

Matrix products: default
BLAS:   /home/hilmar/shared/R/4.2_no_X/lib/R/lib/libRblas.so
LAPACK: /home/hilmar/shared/R/4.2_no_X/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] phangorn_2.11.1 ape_5.7-1

fitGTR_test_data.rds.gz

KlausVigo commented 1 year ago

Hi Hilmar, to avoid the error try

fitGTR <- optim.pml(d, model="GTR", optInv=TRUE, optGamma=TRUE,
          rearrangement = "stochastic", control = pml.control(tau=1e-5, maxit = 10))

This increases the minimal edge length and seems to avoid the problem. Seems a lot of internal and external edges are close to zero. I will try to fix the error on the C side. Thanks for narrowing the error down.
Kind regards, Klaus

hberger commented 1 year ago

Hi Klaus, thanks a lot for the quick help. I also found that optGamma = FALSE works, but I will go with your suggestion. Best regards Hilmar

KlausVigo commented 1 year ago

Hi Hilmar, I just pushed a bug fix, seems to check if tmp is greater 0 is enough. Thanks again
Best regards, Klaus

hberger commented 1 year ago

Thanks a lot! H.