abacus-gene / paml

PAML is a program package for model fitting and phylogenetic tree reconstruction using DNA and protein sequence data. Please report only **technical issues** on this repository (e.g., compiling, programs abort or do not run at all, etc.). Problems with input data and general questions should be posted at https://groups.google.com/g/pamlsoftware?pli
GNU General Public License v3.0
103 stars 19 forks source link

Enabling multiple resuming in MCMCTree #41

Open gaballench opened 6 months ago

gaballench commented 6 months ago

Hello Ziheng,

I noted that MCMCTree allows checkpointing but resuming only works once. This happens because the first time (with checkpoint=1) it is saving the MCMC state to the binary mcmctree.ckpt (in SaveMCMCstate), while the second time (with checkpoint=2) it is just reading from mcmctree.ckpt (in ReadMCMCstate) but during sampling the state is not being saved.

I modified MCMC so that once the option 2 in com.checkpoint triggers the appropriate control flow section, it is set again to 1 so that internally MCMC believes we are in the first time, and therefore saves to mcmctree.ckpt.

This mechanism allows e.g. to run a first time with checkpoint=1, and then copy that directory to a new one where checkpoint is set to 2 and run again, thus starting from the state in the checkpoint. Then we can just copy again and run without need to modify anything as checkpoint is already 2.

I am providing a minimal example based on DatingSoftBounds, using approximate likelihood for fast evaluation, and with a script that will run sequentially four analyses which can be concatenated into a grand sample. This is what we need in order to illustrate it:

  1. Compile the commit associated to this pull request
  2. Go to the test_checkpointing directory
  3. Run test_checkpoint.sh
  4. Run cat_samples.sh
  5. Run trace.R interactively

Below I'm copying the relevant part of the output of the script test_checkpoint.sh that I run on my computer (asterisks around lnL0, lnPr, and lnL added manually as a visual aid):

ENTERING THE FIRST ROUND, WHERE WE WILL SAVE THE CHECKPOINT
MCMCTREE in paml version 4.10.6, November 2022

Reading options from mcmctree.ctl..
burnin=0: no automatic step adjustment?

Reading main tree.
...
Reading sequence data..  3 loci
...
Fossil calibration information used.
...
getting initial values to start MCMC.
...
*lnL0 =  -4357.43*

Starting MCMC (np = 12) . . .
paras: 6 times, 3 mu, 3 sigma2 (& rates, kappa, alpha)
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
lnpR = -125.642005 lnL = -3868.964499
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
lnpR = -93.215379 lnL = -3418.465043
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
lnpR = -104.556988 lnL = -2941.897040
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
lnpR = -97.385396 lnL = -2545.685895
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
lnpR = -76.592941 lnL = -2188.420323
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
lnpR = -68.798584 lnL = -1901.384937
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
lnpR = -54.544344 lnL = -1574.924744
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
lnpR = -56.611191 lnL = -1335.055230
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
lnpR = -52.690458 lnL = -1105.441482
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
*lnpR = -51.841389 lnL = -883.199298*

Time used:  0:00
...
Time used:  0:00
ENTERING THE SECOND ROUND, WHERE WE WILL RESUME FROM THE PREVIOUS CHECKPOINT AND KEEP SAVING
MCMCTREE in paml version 4.10.6, November 2022

Reading options from mcmctree.ctl..
burnin=0: no automatic step adjustment?

Reading main tree.
...
Reading sequence data..  3 loci
...
Fossil calibration information used.
...
getting initial values to start MCMC.
...
Reading MCMC state from mcmctree.ckpt...
*lnpR = -51.841389 lnL = -883.199298*
...
Option checkpoint = 2, re-setting to = 1 in order to save the MCMC state for allowing to
resume more than once
...
*lnL0 =   -883.20*

Starting MCMC (np = 12) . . .
paras: 6 times, 3 mu, 3 sigma2 (& rates, kappa, alpha)
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
lnpR = -51.681475 lnL = -713.060750
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
lnpR = -43.083818 lnL = -592.190875
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
lnpR = -37.851375 lnL = -511.581432
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
lnpR = -44.957876 lnL = -423.779445
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
lnpR = -41.394019 lnL = -379.751200
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
lnpR = -42.275751 lnL = -318.159881
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
lnpR = -42.306231 lnL = -288.023876
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
lnpR = -42.683358 lnL = -260.638525
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
lnpR = -39.022285 lnL = -234.226684
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
*lnpR = -41.887699 lnL = -217.925703*

Time used:  0:00
...
Time used:  0:00
ENTERING THE THIRD ROUND, WHERE WE WILL RESUME FROM THE PREVIOUS CHECKPOINT AND KEEP SAVING
MCMCTREE in paml version 4.10.6, November 2022

Reading options from mcmctree.ctl..
burnin=0: no automatic step adjustment?

Reading main tree.
...
Reading sequence data..  3 loci
...
Fossil calibration information used.
...
getting initial values to start MCMC.
...
Reading MCMC state from mcmctree.ckpt...
*lnpR = -41.887699 lnL = -217.925703*

Initial parameters, np = 12.
Sptree & Genetrees read from mcmctree.ckpt.

Option checkpoint = 2, re-setting to = 1 in order to save the MCMC state for allowing to
resume more than once
...
*lnL0 =   -217.93*

Starting MCMC (np = 12) . . .
paras: 6 times, 3 mu, 3 sigma2 (& rates, kappa, alpha)
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
lnpR = -40.031790 lnL = -216.370309
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
lnpR = -36.352054 lnL = -213.187259
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
lnpR = -32.877671 lnL = -194.751782
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
lnpR = -31.885951 lnL = -184.907901
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
lnpR = -34.395480 lnL = -166.199617
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
lnpR = -35.050957 lnL = -148.376788
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
lnpR = -32.609648 lnL = -148.217268
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
lnpR = -30.624549 lnL = -141.916639
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
lnpR = -30.968402 lnL = -131.494447
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
*lnpR = -30.593520 lnL = -130.426863*

Time used:  0:00
...
Time used:  0:00
ENTERING THE FOURTH ROUND, WHERE WE WILL RESUME FROM THE PREVIOUS CHECKPOINT AND KEEP SAVING
MCMCTREE in paml version 4.10.6, November 2022

Reading options from mcmctree.ctl..
burnin=0: no automatic step adjustment?

Reading main tree.
...
Reading sequence data..  3 loci
...
Fossil calibration information used.
...
getting initial values to start MCMC.
...
Reading MCMC state from mcmctree.ckpt...
*lnpR = -30.593520 lnL = -130.426863*

Initial parameters, np = 12.
Sptree & Genetrees read from mcmctree.ckpt.

Option checkpoint = 2, re-setting to = 1 in order to save the MCMC state for allowing to
resume more than once
...
*lnL0 =   -130.43*

Starting MCMC (np = 12) . . .
paras: 6 times, 3 mu, 3 sigma2 (& rates, kappa, alpha)
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
lnpR = -27.668256 lnL = -123.942047
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
lnpR = -23.168179 lnL = -124.236092
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
lnpR = -22.916499 lnL = -123.712129
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
lnpR = -24.445421 lnL = -113.562389
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
lnpR = -23.485015 lnL = -106.842309
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
lnpR = -24.480644 lnL = -106.045684
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
lnpR = -27.741669 lnL = -110.998322
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
lnpR = -22.040830 lnL = -110.407837
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
lnpR = -22.663286 lnL = -97.673694
...
Saving MCMC state to file mcmctree.ckpt...

Using option checkpoint = 1
*lnpR = -21.435755 lnL = -98.899861*

Time used:  0:00
...
Time used:  0:00

Cheers,

Gustavo