bredelings / BAli-Phy

Bayesian co-estimation of phylogenies and multiple alignments via MCMC
http://www.bali-phy.org/
GNU General Public License v2.0

bp-analyze (logging.debug) #11

Closed (roussine closed 3 years ago)

roussine commented 4 years ago

Hello, is this known? BAli-Phy runs fine on a test with [--iter=150] and finishes, but post-analysis with bp-analyze [../bp-analyze dir-1 dir-2] produces this:

  File "~/../bali-phy-3.5.0/bin/bp-analyze", line 405
    logging.debug(f"LogFileRun: {mcmc_output}")
                                             ^

The system is BioLinux 8 (Ubuntu 14.04 LTS). The BAli-Phy version is the latest (3.5, BUILD: Apr 20 2020 14:13:27, ARCH: linux x86_64). The same result appears when bp-analyze is given no arguments. Reasonable fiddling doesn't help. Any comment is much appreciated.

Leo

bredelings commented 4 years ago

I'm not seeing that same problem here. What version of Python do you have? Python 3.6 is the first version that supports f-strings, so that could be the problem (note the f"...." in the failing line).

Note that Ubuntu 14.04 is 6 years old, which is a long time in the Linux world. It is kind of like using DNA sequencing technology from 6 years ago. You should probably install a newer version, such as 20.04.
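A quick way to confirm the diagnosis above is to ask the interpreter for its version. The following is a minimal sketch, not part of BAli-Phy: the helper name `supports_fstrings` is made up for illustration, and the script deliberately avoids f-strings so that it also runs on Python 2.7. Because the syntax error is raised when the file is parsed, a check like this has to live in a separate file from any code that uses f-strings.

```python
import sys

# f-strings (PEP 498) were added in Python 3.6.
MIN_VERSION = (3, 6)


def supports_fstrings(version_info=None):
    """Return True if the given (or the running) interpreter is 3.6 or newer.

    Hypothetical helper for illustration; not part of bp-analyze.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MIN_VERSION


if __name__ == "__main__":
    # Old-style % formatting keeps this script parseable on Python 2.7.
    current = "%d.%d.%d" % tuple(sys.version_info[:3])
    if supports_fstrings():
        print("Python %s can parse f-strings" % current)
    else:
        print("Python %s is too old for f-strings; need 3.6 or newer" % current)
```

Run it with the same interpreter that would run bp-analyze to see which case applies.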

roussine commented 4 years ago

Thank you for commenting. Yes, the system is quite old, but upgrading Ubuntu breaks some things that BioLinux needs. For the moment it is easier to switch Python versions on that VM. The system Python in this Ubuntu is 2.7.x, so I will try a newer version via pyenv. Will update the post.

roussine commented 4 years ago

Update: the bp-analyze wrapper runs fine with the new python3 (3.6.10). Quick note: BAli-Phy refuses to process data with ~100 taxa and 2500 chars/line (standard 28S rRNA dimensions, FASTA), and does so without complaining with a message. Is it assumed that the complexity is too high? If so, is there a way to convince the program otherwise, apart from chopping the data into pieces? Thanks.

bredelings commented 4 years ago

What is the message?

roussine commented 4 years ago

No, it doesn't produce any message; it just drops back to the terminal after executing. No instance is left running in the background.

bredelings commented 4 years ago

This data set looks too big to me. Perhaps the system ran out of memory?

Primarily, the sequences are probably too long to run right now: memory use and run time both scale with the square of the sequence length. I am working on alignment constraints that might make data sets like this feasible in the future. You might be able to make this work by dividing the gene into multiple partitions.
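To see why partitioning might help, here is a rough back-of-the-envelope sketch. It assumes only what is stated above, that cost grows with the square of sequence length, plus one further assumption that is not confirmed in this thread: that each partition is charged for the square of its own length. The function names are illustrative, and the numbers are relative proxies, not measurements of BAli-Phy.

```python
def relative_cost(lengths):
    """Sum of squared partition lengths.

    A proxy for memory/time, ASSUMING cost is proportional to L**2
    independently for each partition (constants are ignored).
    """
    return sum(length ** 2 for length in lengths)


def split_evenly(total_length, n_parts):
    """Split total_length into n_parts near-equal integer lengths."""
    base, extra = divmod(total_length, n_parts)
    return [base + 1 if i < extra else base for i in range(n_parts)]


if __name__ == "__main__":
    total = 2500  # characters per sequence, as in the 28S data set above
    whole = relative_cost([total])
    for n_parts in (1, 2, 5):
        parts = split_evenly(total, n_parts)
        ratio = relative_cost(parts) / float(whole)
        print("%d partition(s): relative cost %.2f" % (n_parts, ratio))
```

Under these assumptions, k equal partitions cost about 1/k of the unpartitioned run, since k * (L/k)^2 = L^2 / k. Whether that is enough for a given data set is a separate question.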

roussine commented 4 years ago

Clear, thank you. Physical memory is not limiting in this case. Last point: are there plans to parallelize individual chains, as is implemented (somehow) in some Bayesian approaches (e.g. PhyloBayes MPI)? It would be great to have a good alignment-tree sampler applicable to larger data. Thank you.

bredelings commented 4 years ago

Hmm... if physical memory is not limiting, then the program should not crash, especially without a message. If you can send me the data set, I can try to reproduce this and fix it. The data set is probably still too large to use, but it should not crash.

For parallelization, the alignment step is not very parallelizable, but it might be possible to follow PhyloBayes and do SPR moves while considering different attachment points on different nodes. So I am not sure yet.