chris-n-self / ptmpi

Python class to implement parallel-tempering fully in parallel using MPI
MIT License
6 stars 2 forks source link

Adapting ptmpi for phylogenetics #1

Open PhyloStar opened 5 years ago

PhyloStar commented 5 years ago

Hi, I am Taraka a post-doc working with linguistic phylogenetics. I implemented my MH sampler for sampling trees and other model parameters.

My code which runs mcmc process is here: https://github.com/PhyloStar/CyBayes/blob/master/mat_mcmc_gamma.py

I write a state to file every t iterations. I am wondering how your program handles this. Does your class access the mcmc state in the child process and then perform swaps? I do not have much experience with mpi with python. I did mpi programming ages ago and don't remember much of the library. I need to run 4 chains (1 cold, 3 hot).

-Taraka

chris-n-self commented 5 years ago

Hi Taraka,

Great to hear your interest!

Does your class access the mcmc state in the child process and then perform swaps?

No, I wrote 'ptmpi' so that the mpi code does not directly interact with the mcmc state. Each process has an instance of the ptmpi class. Those objects communicate with each other to tell their mcmc chain what temperature it should have at that point in the program. Apart from adding the definitions and pt_step method calls into your code you shouldn't have to change anything.

It's harder to visualise, but it's a parallel-tempering implementation where the temperature indexes are permuted over static copies of the mcmc chains rather than the other way around. Each process looks like an independent mcmc chain whose temperature changes at 'swap steps' as the program progresses. I did it this way around because I wanted to avoid sending the whole state of the mcmc chains between the parallel processes.

If you have any other questions, or you've tried to use the class and it doesn't seem to work, let me know!

Best, Chris

PhyloStar commented 5 years ago

Hi Chris, Thanks a lot for the reply. Yes. Performing temperature swaps is cheap. Posterior probability is required to perform a swap right? How does your code access that in my code? Do you have any example? Sorry, I did not read your code in detail. Thanks again. Best, Taraka

chris-n-self commented 5 years ago

No problem! I realised after your message that I changed a few things in the code and didn't update the readme, so I will update that and add a simple fully fleshed out example of using the code.

At the moment, the decision probability is the one given on here: https://en.wikipedia.org/wiki/Parallel_tempering (the first expression, not the one after the final equals). So at the swap step each mcmc chain passes their energy and their current and proposed temperatures to the ptmpi object. The ptmpi object then handles the communication between the two relevant processes to decide if a swap should occur or not. Is this general enough for you, or in your case do the swap probabilities need to something more complex?

PhyloStar commented 5 years ago

Okay. Having an example is very useful.

The swap ratio is given by the following formula for Bayesian phylogenetics. https://en.wikipedia.org/wiki/Bayesian_inference_in_phylogeny#Metropolis-coupled_MCMC I think this is similar to the formula in parallel tempering wiki article.

chris-n-self commented 5 years ago

Hey, sorry for the delay! We've been finishing up some projects and writing them up and it's been hectic. I have updated the readme and added a complete example with an annotated break down of the example script to show how it is working.

The parallel tempering probability you are interested in looks very similar to the regular swap probability. You could hack it slightly to use those probabilities by passing these arguments to pt_step: 1, -log(\pi_i(\theta_i)), -log(\pi_i(\theta_j)). The first argument is E_i, the second is \beta_i = 1/T_i, and the third is \beta_j = 1/T_j.