avaughn271 / AdmixtureBayes

GNU General Public License v3.0

Options to speed up MCMC analysis? #3

Open mdondrup opened 1 day ago

mdondrup commented 1 day ago

I am running AdmixtureBayes on a set of ~400 yeast strains with 20k SNPs, of which 6k are independent. So far I have benchmarked only up to 1000 iterations (-n 1000) with 16 chains, which took 9 hours to complete. The machine has 144 threads.

If I wanted to increase to 10k or even 1M iterations, and the runtime scales linearly, it would take a very long time. Also, to check convergence, if I understand correctly, I would need to run the analysis 3 times with identical parameters. Another group has even used up to 1.5M iterations. What would be the best strategy to speed up the analysis? Reduce the number of samples or SNPs, speed up IO, or use GPUs? CPU time was about 120X the wall-clock time, so I think the existing cores were already well utilized. Any suggestions would be very welcome.
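To put rough numbers on the concern above, here is a minimal sketch that extrapolates from the single benchmark (1000 iterations in 9 hours). It assumes strictly linear scaling in the iteration count and that the 3 convergence replicates run one after another on the same machine; neither assumption has been verified for AdmixtureBayes, and `extrapolate_hours` is a hypothetical helper, not part of the tool.

```python
def extrapolate_hours(measured_hours, measured_iters, target_iters, replicates=1):
    """Estimate wall-clock hours, ASSUMING runtime is linear in iterations
    and that replicates run sequentially (both are unverified assumptions)."""
    return measured_hours * (target_iters / measured_iters) * replicates


if __name__ == "__main__":
    # Benchmark from the post: -n 1000 with 16 chains took 9 hours.
    for n in (10_000, 1_000_000, 1_500_000):
        hours = extrapolate_hours(9, 1000, n, replicates=3)
        print(f"{n:>9} iterations x 3 runs: {hours:,.0f} h (~{hours / 24:,.0f} days)")
```

Under these assumptions, even 10k iterations with 3 replicates lands at several hundred hours, which is why reducing the per-iteration cost matters more than adding hardware.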

avaughn271 commented 23 hours ago

Hi Michael,

Good to hear from you. You mentioned you are using ~400 yeast strains. How many populations are these strains divided into? Is each strain a separate population in the input file? I wouldn't expect AdmixtureBayes to work well for more than, say, 20 different populations. If there is a way to combine populations in a reasonable way (or simply remove certain populations) to reduce the number of populations, it will probably greatly improve the speed of the analysis. The number of SNPs is very reasonable, so that's likely not the problem. Hope this helps.

-Andrew
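
The suggestion to combine populations can be sketched as follows. This is only an illustration, and it assumes a TreeMix-style allele count table (a header line of space-separated population names, then one line per SNP with a `count1,count2` pair per population); check the AdmixtureBayes README for the exact input format before relying on it. The function `merge_populations`, the strain names, and the grouping are all hypothetical, and whether a given grouping is biologically reasonable is a separate question.

```python
def merge_populations(lines, group_of):
    """Pool allele counts of populations that map to the same group.

    lines    : list of strings; header of population names, then one row
               per SNP with "a,b" allele counts (ASSUMED TreeMix-style format).
    group_of : dict mapping an original population name to a group name;
               populations not listed keep their own name.
    """
    header = lines[0].split()
    groups = []
    for pop in header:
        group = group_of.get(pop, pop)
        if group not in groups:  # keep first-seen order of groups
            groups.append(group)

    merged = [" ".join(groups)]
    for line in lines[1:]:
        totals = {group: [0, 0] for group in groups}
        for pop, field in zip(header, line.split()):
            first, second = field.split(",")
            group = group_of.get(pop, pop)
            totals[group][0] += int(first)
            totals[group][1] += int(second)
        merged.append(" ".join(f"{totals[g][0]},{totals[g][1]}" for g in groups))
    return merged


if __name__ == "__main__":
    # Toy table: three strains, two SNPs; s1 and s2 are pooled into clade "A".
    table = ["s1 s2 s3", "1,1 2,0 0,2", "0,2 1,1 2,0"]
    for row in merge_populations(table, {"s1": "A", "s2": "A"}):
        print(row)
```

Applied to the ~400 strains, a mapping from strain to clade or sampling region would shrink the table to a number of populations closer to the ~20 mentioned above.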