PoonLab / clmp

Genetic clustering with Markov-modulated Poisson processes
https://shiny.filogeneti.ca/clmp/
GNU General Public License v3.0
1 stars 2 forks source link

Stochastic stalling of clmp call on 100-tip tree #32

Closed gtng92 closed 5 years ago

gtng92 commented 5 years ago

clmp function sometimes outputs results with a 100-tip tree, and other times seems to enter an infinite loop.

ArtPoon commented 5 years ago

Thanks for catching this. Can you try running with the trace on and capture some of the output, and post here if it seems informative? Alternatively can you please use set.seed() to create a reproducible example?

ArtPoon commented 5 years ago

Setting seed in R has no effect because optimization is being handled by C program c-cmaes. This program does support seed assignment (cmaes_interface.h):

double * cmaes_init(cmaes_t *, int dimension , double *xstart, 
        double *stddev, long seed, int lambda, 
        const char *input_parameter_filename);

This function is called in mmpp.c:_fit_mmpp:

funvals = cmaes_init(&evo, dimension, theta, init_sd, 0, CMAES_POP_SIZE, cmaes_settings);

where seed is always set to 0. For some reason this appears to have no impact on the reproducibility of the analysis.

ArtPoon commented 5 years ago

This is the tree that is causing problems (generated by @gtng92 using treeswithintrees):

(((((blood_1__blood_34:18.83845008,(blood_1__blood_3:16.77557266,blood_1__blood_28:16.77557266):2.062877419):2.065882613,blood_1__blood_41:20.90433269):16.1267757,(((blood_1__blood_24:24.32571606,blood_1__blood_38:24.32571606):2.713756941,blood_1__blood_5:27.039473):3.65831976,((blood_1__blood_31:8.592510537,(blood_1__blood_7:5.945205775,blood_1__blood_18:5.945205775):2.647304762):5.203924628,(blood_1__blood_42:6.190195062,blood_1__blood_4:6.190195062):7.606240103):16.9013576):6.33331562):1.203029137,((((((blood_1__blood_32:22.87282506,blood_1__blood_33:22.87282506):4.1253134,(blood_1__blood_17:23.19319088,(blood_1__blood_2:17.12989341,(blood_1__blood_37:10.0628461,blood_1__blood_15:10.0628461):7.067047315):6.063297464):3.804947586):3.290564786,(blood_1__blood_16:29.32696793,blood_1__blood_8:29.32696793):0.9617353229):3.473799742,((((blood_1__blood_21:11.52613931,blood_1__blood_27:11.52613931):9.10433086,((blood_1__blood_40:2.724603362,blood_1__blood_20:2.724603362):14.47221682,blood_1__blood_19:17.19682019):3.433649983):4.059972922,blood_1__blood_50:24.69044309):8.84925743,(((blood_1__blood_12:15.09384583,blood_1__blood_48:15.09384583):6.118580148,blood_1__blood_22:21.21242598):4.451852109,(blood_1__blood_11:19.56636401,blood_1__blood_23:19.56636401):6.097914075):7.875422434):0.2228024717):2.682667664,((blood_1__blood_25:9.826279845,blood_1__blood_47:9.826279845):23.28585219,blood_1__blood_1:33.11213203):3.333038624):1.35917385,((blood_1__blood_26:0.03614730854,blood_1__blood_45:0.03614730854):10.16497357,blood_1__blood_36:10.20112088):27.60322363):0.4297930148):1.478232453,(((blood_1__blood_6:21.7936645,((blood_1__blood_29:6.013198504,blood_1__blood_10:6.013198504):12.46792104,blood_1__blood_44:18.48111954):3.312544952):5.366312021,(blood_1__blood_39:18.8238253,blood_1__blood_9:18.8238253):8.336151216):9.691882246,((((tissue_1__cell_44:17.48141339,((tissue_1__cell_2:0.1232940931,tissue_1__cell_49:0.1232940931):3.40208924,tissue_1__cell_45:3.525383333):13.95603006):10.67026333,tissue_1__cell_17:28.15167673):1.180203647,blood_1__blood_46:29.33188037):3.890200275,(((tissue_1__cell_34:18.8333573,tissue_1__cell_50:18.8333573):13.38905108,((blood_1__blood_49:24.59800882,(blood_1__blood_30:20.38331084,(blood_1__blood_14:11.46513996,blood_1__blood_13:11.46513996):8.918170881):4.214697979):7.322712851,((tissue_1__cell_6:20.41634945,tissue_1__cell_19:20.41634945):4.543829058,(((tissue_1__cell_35:2.314355991,tissue_1__cell_28:2.314355991):10.19533144,(tissue_1__cell_22:0.1746452139,tissue_1__cell_3:0.1746452139):12.33504222):2.939169903,tissue_1__cell_30:15.44885733):9.511321174):6.960543164):0.3016867104):0.0001496675735,(((((((tissue_1__cell_12:12.62195229,tissue_1__cell_13:12.62195229):0.9012345544,tissue_1__cell_40:13.52318684):0.8015252879,tissue_1__cell_15:14.32471213):3.448834812,tissue_1__cell_20:17.77354694):6.443168361,tissue_1__cell_43:24.2167153):2.392510454,((tissue_1__cell_5:23.23150088,tissue_1__cell_31:23.23150088):2.6523763,((tissue_1__cell_9:8.997321643,tissue_1__cell_21:8.997321643):9.973987498,(tissue_1__cell_8:5.290729122,(tissue_1__cell_37:1.889889724,tissue_1__cell_4:1.889889724):3.400839398):13.68058002):6.912568037):0.7253485769):5.44861109,((((tissue_1__cell_47:14.85513165,tissue_1__cell_18:14.85513165):11.49728233,tissue_1__cell_24:26.35241397):2.873853592,(((tissue_1__cell_27:4.577187688,tissue_1__cell_42:4.577187688):11.82931935,tissue_1__cell_7:16.40650703):10.82190972,(((tissue_1__cell_16:6.501895566,tissue_1__cell_48:6.501895566):17.8655709,(tissue_1__cell_36:19.90875451,(((tissue_1__cell_11:9.601265417,tissue_1__cell_14:9.601265417):2.466193029,tissue_1__cell_39:12.06745845):0.9386599322,tissue_1__cell_38:13.00611838):6.902636129):4.458711962):1.043939924,tissue_1__cell_41:25.41140639):1.817010363):1.99785081):0.04557582674,(((tissue_1__cell_32:8.058138255,tissue_1__cell_23:8.058138255):14.66824507,((tissue_1__cell_33:20.71301183,(blood_1__blood_43:13.25344371,tissue_1__cell_1:13.25344371):7.45956812):0.7387653076,(tissue_1__cell_46:7.929326984,tissue_1__cell_10:7.929326984):13.52245015):1.274606187):5.918160905,(blood_1__blood_35:23.06321673,(tissue_1__cell_26:15.64093462,(tissue_1__cell_25:4.201936264,tissue_1__cell_29:4.201936264):11.43899836):7.422282105):5.581327501):0.627299163):2.785993453):0.1647212049):0.9995225971):3.629778114):2.860511212);
ArtPoon commented 5 years ago

Here is an excerpt from a trace for a successful run:

0.165075    0.225386    0.000000    0.121704    -119.228129
0.165075    0.263930    0.000000    0.067786    -119.228129
0.165075    0.272924    0.000000    0.079696    -119.228129
0.165075    0.243268    0.000000    0.099039    -119.228129
0.165075    0.263892    0.000000    0.075207    -119.228129
ConditionNumber: maximal condition number 9.01e+12 reached. maxEW=5.72e-01,minEW=3.65e-14,maxdiagC=5.04e-01,mindiagC=2.47e-14
log likelihood for 2 state model is -119.228129
rates: 0.165075 0.228413 
Q: [    *   0.000000 ]
   [ 0.119349   *    ]
ArtPoon commented 5 years ago

Here is a screenshot of a failed run: Screenshot from 2019-05-30 13-23-05

One of the branching rates is shooting up to large values, and I think the transition rate from rate class 2 to 1 is also going high -- effectively it wants to become a 1-rate class model.

ArtPoon commented 5 years ago

See if setting narrower bounds helps:

> res <- clmp(ft, nrates=2, bounds=c(0.01, 100, 0.01, 100))
ConditionNumber: maximal condition number 9.01e+12 reached. maxEW=3.54e-20,minEW=3.50e-33,maxdiagC=3.25e-20,mindiagC=7.57e-23
log likelihood for 2 state model is -118.729812
rates: 0.010000 1.202317 
Q: [    *   14.956215 ]
   [ 99.999943   *    ]
ArtPoon commented 5 years ago

Still takes a long time on second run. Options:

ArtPoon commented 5 years ago

@gtng92 , accessing these settings is a bit obscure. Here is the current process:

  1. always pass NULL pointer to cmaes_settings string (const char *) argument of fit_mmpp https://github.com/PoonLab/clmp/blob/1598e9a27270c6bc52d31c2bfc09a4ccb0bcfe29/src/clmpR.c#L143-L144

  2. fit_mmpp calls internal function _fit_mmpp with same cmaes_settings string: https://github.com/PoonLab/clmp/blob/1598e9a27270c6bc52d31c2bfc09a4ccb0bcfe29/src/mmpp.c#L66-L67

  3. _fit_mmpp passes cmaes_settings to mmpp.c:cmaes_init() https://github.com/PoonLab/clmp/blob/1598e9a27270c6bc52d31c2bfc09a4ccb0bcfe29/src/mmpp.c#L395 &evo is a reference to a C struct of type cmaes_t, which looks like a general purpose structure for this method - based on the comments, its purpose appears to be to make the source code more "object oriented".

  4. cmaes_init passes NULL in cmaes_settings as input_parameter_filename to cmaes_init_para() https://github.com/PoonLab/clmp/blob/1598e9a27270c6bc52d31c2bfc09a4ccb0bcfe29/src/cmaes.c#L403-L415

  5. cmaes_init_para calls cmaes_readpara_init. https://github.com/PoonLab/clmp/blob/1598e9a27270c6bc52d31c2bfc09a4ccb0bcfe29/src/cmaes.c#L264-L276

  6. First argument for cmaes_readpara_init is struct of type cmaes_readpara_t that stores parameter settings read from a file. Last argument filename is synonymous to input_parameter_filename and receives the NULL. https://github.com/PoonLab/clmp/blob/1598e9a27270c6bc52d31c2bfc09a4ccb0bcfe29/src/cmaes.c#L2491-L2498

ArtPoon commented 5 years ago
  1. The next lines in cmaes_readpara_init are allocating space in memory for the components of the cmaes_readpara_t struct. For example: https://github.com/PoonLab/clmp/blob/1598e9a27270c6bc52d31c2bfc09a4ccb0bcfe29/src/cmaes.c#L2511 assigns a formatted string to the rgsformat array of string pointers, and a pointer to the t->N object in the rgpadr array of pointers

  2. After initializing some entries in t to arbitrary default values, the NULL filename triggers a call to cmaes_readpara_ReadFromFile https://github.com/PoonLab/clmp/blob/1598e9a27270c6bc52d31c2bfc09a4ccb0bcfe29/src/cmaes.c#L2580-L2582

  3. A NULL value for argument filename causes the program to read settings from the hard-coded file cmaes_initials.par: https://github.com/PoonLab/clmp/blob/1598e9a27270c6bc52d31c2bfc09a4ccb0bcfe29/src/cmaes.c#L2657-L2663

Where is cmaes_initials.par and are we getting this far?

ArtPoon commented 5 years ago

Used a printf statement to confirm that we are getting to the code location referenced above. This code: https://github.com/PoonLab/clmp/blob/1598e9a27270c6bc52d31c2bfc09a4ccb0bcfe29/src/cmaes.c#L2666-L2669 explains why that error message is appearing in the log file errcmaes.err. So we get bounced from cmaes_readpara_ReadFromFile back to cmaes_readpara_init. What happens next?

ArtPoon commented 5 years ago

PS @gtng92 if we use an R function to generate a cmaes_initials.par file in the right location of the filesystem, then we can feed custom optimization settings to CMA-ES.

ArtPoon commented 5 years ago

Looks like the rest of the settings are populated in the remainder of cmaes_readpara_init. Note that:

ArtPoon commented 5 years ago

I copied the cmaes_initials.par file from the original repo to the directory where I would call R. Next, I ran the following:

> require(clmp)
Loading required package: clmp
Loading required package: ape
Loading required package: ggtree
Registered S3 methods overwritten by 'ggplot2':
  method         from 
  [.quosures     rlang
  c.quosures     rlang
  print.quosures rlang
Registered S3 method overwritten by 'treeio':
  method     from
  root.phylo ape 
ggtree v1.16.1  For help: https://yulab-smu.github.io/treedata-book/

If you use ggtree in published research, please cite the most appropriate paper(s):

- Guangchuang Yu, Tommy Tsan-Yuk Lam, Huachen Zhu, Yi Guan. Two methods for mapping and visualizing associated data on phylogeny using ggtree. Molecular Biology and Evolution 2018, accepted. doi: 10.1093/molbev/msy194- Guangchuang Yu, David Smith, Huachen Zhu, Yi Guan, Tommy Tsan-Yuk Lam. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution 2017, 8(1):28-36, doi:10.1111/2041-210X.12628

Attaching package: ‘ggtree’

The following object is masked from ‘package:ape’:

    rotate

> t1 <- read.tree('examples/test.nwk'
+ )
> res <- clmp(t1, nrates=1)
ConditionNumber: maximal condition number 9.01e+12 reached. maxEW=6.42e+00,minEW=7.12e-13,maxdiagC=1.96e+00,mindiagC=6.55e-13
Warning in fit_mmpp: parameter estimates did not converge
rates: 531.823490 
Q: [    *    ]

Next I looked in actparcmaes.par to see if the settings from this file were read:

# Read from cmaes_initials.par at Tue Jun  4 16:24:40 2019

 N 22
 typicalX 22
      0       0       0       0       0
      0       0       0       0       0
      0       0       0       0       0
      0       0       0       0       0
      0       0 
 initialX 22
    0.5     0.5     0.5     0.5     0.5
    0.5     0.5     0.5     0.5     0.5
    0.5     0.5     0.5     0.5     0.5
    0.5     0.5     0.5     0.5     0.5
    0.5     0.5 
 initialStandardDeviations 22
    0.3     0.3     0.3     0.3     0.3
    0.3     0.3     0.3     0.3     0.3
    0.3     0.3     0.3     0.3     0.3
    0.3     0.3     0.3     0.3     0.3
    0.3     0.3 
 seed 1349904434
 stopMaxFunEvals 1e+299
 stopMaxIter 1e+299
 stopFitness
 stopTolFun 1e-12
 stopTolFunHist 1e-13
 stopTolX 1e-11
 stopTolUpXFactor 1000
 lambda 100
 mu 50
 weights   log
 cs 0.559574
 damps 1.69508
 ccumcov 0.153846
 mucov 27.2221
 ccov 0.0854774
 diagonalCovarianceMatrix 0
 updatecov 0.0531773
 maxTimeFractionForEigendecompostion 1
 resume  

Here is the previous entry in actparcmaes.par:

# Read from  at Tue Jun  4 16:07:08 2019

 N 1
 initialX 1
   3.93 
 initialStandardDeviations 1
      1 
 seed 1349801890
 stopMaxFunEvals 14400
 stopMaxIter 144
 stopFitness
 stopTolFun 1e-12
 stopTolFunHist 1e-13
 stopTolX 0
 stopTolUpXFactor 1000
 lambda 100
 mu 50
 weights   log
 cs 0.935943
 damps 7.13444
 ccumcov 0.8
 mucov 27.2221
 ccov 0.975871
 diagonalCovarianceMatrix 0
 updatecov 0.102473
 maxTimeFractionForEigendecompostion 0.2
 resume      
ArtPoon commented 5 years ago

@gtng92 before we get stuck into writing an R routine to generate a settings file, we need to manually tweak this file to see if we can improve the optimization on your 100-tip tree. I would start by writing the default (no file) settings to cmaes_initials.par and tinker from there.

ArtPoon commented 5 years ago

This revised par file is complications:

ArtPoon commented 5 years ago

Whenever the search terminates normally, it is because this criterion is obtained:

ConditionNumber: maximal condition number 9.01e+12 reached. maxEW=2.31e-01,minEW=1.23e-14,maxdiagC=2.91e-01,mindiagC=5.45e-15

(note this is a bad outcome, but the optimization problem is also bad because this tree does not support two rate classes). I speculate that if we make this criterion more stringent then we will reduce the amount of stuck runs.

ArtPoon commented 5 years ago

Based on some further experiments, this seems to be more of a processing issue than one of convergence. Specifically, there are much longer delays in the CMA-ES optimization routine as the transition rate estimates increase.

ArtPoon commented 5 years ago

Ok I've changed the code so we can set the tolerance values from R, as well as the random seed.

ArtPoon commented 5 years ago

We can now set the random seed in the cmaes.c code from R. This is a reproducibly long run (given above Newick tree string is saved to working/issue32.nwk):

> require(clmp)
> t1 <- read.tree('working/issue32.nwk')
> res <- clmp(t1, trace=10, tol=1e-3, tolhist=1e-4, seed=3)
ArtPoon commented 5 years ago

Setting default tol=1e-3 and tolhist=1e-3. This reduces the precision in MLE of example tree structSIR in exchange for more reasonable stopping behaviour for this edge case. (It also gets much faster!)