bredelings / BAli-Phy

Bayesian co-estimation of phylogenies and multiple alignments via MCMC
http://www.bali-phy.org/
GNU General Public License v2.0

segmentation fault #18

Closed by ptitle 4 months ago

ptitle commented 5 months ago

For one of my input files, with release 4.0-beta7, I get:

Allocation failed in sample_tri_multi!  Proceeding.
Allocation failed in sample_tri_multi!  Proceeding.
Allocation failed in sample_tri_multi!  Proceeding.
[maybe 30 times]
[1]    25144 segmentation fault  bali-phy APOB_extracted_oneseq_onedirection_All.fasta -S gtr -n  -i 2000

I then tried building from source with the current master branch to see if this was something that had gotten resolved, and that gave me:

bali-phy: Error! evaluating reg # 8295 (unchangeable): case <175495> of {_ -> <175496>}

evaluating reg # 175495 (unchangeable): case <175496> of {_ -> <2404>}

evaluating reg # 175496 (unchangeable): MCMC:runMCMC 2000 0 0

evaluating reg # 471014 (unchangeable): case <86522> of {_ -> <86561>}

evaluating reg # 86522 (unchangeable): case <86561> of {_ -> <2404>}

evaluating reg # 86561 (unchangeable): MCMC:walk_tree_sample_alignments <6084> <46054> 0 0

std::bad_array_new_length

I can provide the input fasta file if that would be helpful. Any thoughts?

Thanks!

bredelings commented 5 months ago

Yeah, can you provide the input file?

ptitle commented 5 months ago

You can download it here.

It's a somewhat slow analysis, probably due to the couple of longer sequences relative to the rest.

bredelings commented 5 months ago

Initial thoughts:

APOB.fsa.fasta.gz

I have attached an alignment of your sequences made with FSA. It is a relatively high-quality aligner that is good for data cleaning because it doesn't align non-homologous sequences. If you take a look at the FSA alignment in AliView, I think you will see what the issues are.

More coming.

Thoughts?

ptitle commented 5 months ago

Thanks! I'll have some time next week to look into this -- I'll report back.

bredelings commented 5 months ago

OK, I found and fixed the segfault. The original data set now runs under the GitHub master version and seems to be giving a reasonable alignment. Obviously it's going to run much faster if you remove the long flanking sequences.

ptitle commented 5 months ago

Thanks! I'll give it a try next week and confirm. Definitely seems to make sense to trim out those excessively long sequences.

ptitle commented 4 months ago

After doing some pre-filtering to remove long flanking sequences, and with the current main branch, bali-phy now works without throwing any errors. Thanks!
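
For anyone who hits the same problem: the thread does not say how the pre-filtering was done, so the following is only a rough sketch of one way to spot unusually long sequences in a FASTA file before running bali-phy. The function names and the `max_ratio=2.0` cutoff are hypothetical choices for illustration and are not part of bali-phy. Flagged sequences would still need to be inspected and trimmed by hand (for example against the FSA alignment mentioned above).

```python
from statistics import median


def read_fasta(lines):
    """Parse FASTA text lines into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def flag_long_sequences(records, max_ratio=2.0):
    """Return headers of sequences longer than max_ratio * median length.

    max_ratio is an arbitrary illustrative threshold (an assumption),
    not a value recommended anywhere in this thread.
    """
    lengths = [len(seq) for _, seq in records]
    if not lengths:
        return []
    cutoff = max_ratio * median(lengths)
    return [name for name, seq in records if len(seq) > cutoff]


# Tiny made-up example: two sequences of length 8 and one of length 32.
example = """>a
ACGTACGT
>b
ACGTACGA
>c
ACGTACGTACGTACGTACGTACGTACGTACGT
""".splitlines()

records = read_fasta(example)
print(flag_long_sequences(records))  # prints ['c']
```

To use it on a real file, pass an open file handle to `read_fasta` instead of the example lines. Flagging relative to the median length keeps the check independent of the gene being analysed, but the right cutoff depends on the data set.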