amkozlov / raxml-ng

RAxML Next Generation: faster, easier-to-use and more flexible
GNU Affero General Public License v3.0

terminate called after throwing an instance of 'std::runtime_error' #71

Open jemunro opened 5 years ago

jemunro commented 5 years ago

Issue description: raxml-ng fails on certain inputs for an unknown reason. The time until failure seems to depend on the random seed used. Jobs were run on an HPC cluster with 8 CPUs and 32 GB RAM requested.

stdout:


RAxML-NG v. 0.9.0 released on 20.05.2019 by The Exelixis Lab.
Developed by: Alexey M. Kozlov and Alexandros Stamatakis.
Contributors: Diego Darriba, Tomas Flouri, Benoit Morel, Sarah Lutteropp, Ben Bettisworth.
Latest version: https://github.com/amkozlov/raxml-ng
Questions/problems/suggestions? Please visit: https://groups.google.com/forum/#!forum/raxml

RAxML-NG was called at 28-Jun-2019 07:11:01 as follows:

raxml-ng --tree rand{1} --prefix out-6 --threads 8 --seed 17012 --msa out.raxml.rba

Analysis options:
  run mode: ML tree search
  start tree(s): random (1)
  random seed: 17012
  tip-inner: OFF
  pattern compression: ON
  per-rate scalers: OFF
  site repeats: ON
  fast spr radius: AUTO
  spr subtree cutoff: 1.000000
  branch lengths: proportional (ML estimate, algorithm: NR-FAST)
  SIMD kernels: AVX2
  parallelization: PTHREADS (8 threads), thread pinning: OFF

[00:00:00] Loading binary alignment from file: out.raxml.rba
[00:00:03] Alignment comprises 11794 taxa, 3 partitions and 13372 patterns

Partition 0: essential_gene
Model: GTR+FO+G4m+B+ASC_STAM{141113/268966/270890/142211}
Alignment sites / patterns: 15523 / 13041
Gaps: 0.01 %
Invariant sites: 0.00 %

Partition 1: rRNA_gene
Model: GTR+FO+G4m+B+ASC_STAM{1069/1068/1495/858}
Alignment sites / patterns: 300 / 299
Gaps: 0.14 %
Invariant sites: 0.00 %

Partition 2: tRNA_gene
Model: GTR+FO+G4m+B+ASC_STAM{628/1017/1066/642}
Alignment sites / patterns: 32 / 32
Gaps: 0.01 %
Invariant sites: 0.00 %

NOTE: Per-rate scalers were automatically enabled to prevent numerical issues on taxa-rich alignments.
NOTE: You can use --force switch to skip this check and fall back to per-site scalers.

[00:00:03] Generating 1 random starting tree(s) with 11794 taxa
[00:00:03] Data distribution: max. partitions/sites/weight per thread: 2 / 1672 / 26752

Starting ML tree search with 1 distinct starting trees

[00:00:07 -52498962.024023] Initial branch length optimization
[00:02:08 -7689180.308917] Model parameter optimization (eps = 10.000000)

stderr:

terminate called recursively
terminate called recursively
terminate called after throwing an instance of 'std::runtime_error'
terminate called recursively

amkozlov commented 5 years ago

This looks very similar to an issue recently reported on the RAxML Google group:

https://groups.google.com/forum/#!topic/raxml/gluJCb9PGvA

Could you please try to reproduce it with the latest dev version and/or with --force model_lh_impr, as suggested in the above thread?

jemunro commented 5 years ago

Thanks for the suggestion.

I've tried the following:

  1. latest dev
  2. latest dev + --force model_lh_impr
  3. latest dev + --force model_lh_impr --precision 12 --blmin 0.000000001

However, the issue persists in all of these cases.
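
For anyone trying to reproduce this, the attempts above differ only in the extra flags appended to the original command. Below is a minimal Python sketch that builds those command lines and records the exit code per seed. The flags are copied from this thread; the helper names `build_command` and `run_with_seeds` and the output prefixes are hypothetical, and the sketch assumes a `raxml-ng` binary built from the dev version is on `PATH`.

```python
import subprocess

# Base command copied from the original report (seed and prefix are added per run).
BASE = ["raxml-ng", "--tree", "rand{1}", "--threads", "8", "--msa", "out.raxml.rba"]

# Extra flags for the three attempts listed above (all taken from this thread).
VARIANTS = {
    "dev": [],
    "dev_force": ["--force", "model_lh_impr"],
    "dev_force_precision": [
        "--force", "model_lh_impr",
        "--precision", "12",
        "--blmin", "0.000000001",
    ],
}

def build_command(variant, seed, prefix):
    """Return the full argument list for one variant/seed combination."""
    return BASE + ["--seed", str(seed), "--prefix", prefix] + VARIANTS[variant]

def run_with_seeds(variant, seeds):
    """Run one variant for several seeds and return {seed: exit code}.

    On POSIX, subprocess reports a negative exit code when the process
    was killed by a signal, which is what an abort would look like.
    """
    results = {}
    for seed in seeds:
        cmd = build_command(variant, seed, f"out-{variant}-{seed}")
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results[seed] = proc.returncode
    return results
```

Calling `run_with_seeds("dev_force", [17012])` would rerun the seed from the log above with the suggested flag; since the time to failure varies with the seed, trying a handful of seeds per variant gives a clearer picture than a single run.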

jemunro commented 5 years ago

Just an update: the issue seems to be limited to the +ASC_STAM ascertainment bias correction. The same input with either +ASC_LEWIS or no ascertainment bias correction runs without any problems.
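
The comparison above amounts to changing only the `+ASC_*` suffix of each partition's model string. A small Python sketch of that substitution, using a model string from the log above; the helper name `swap_asc` is hypothetical and the regular expression only covers the suffix forms that appear in this thread:

```python
import re

# Matches an ascertainment suffix such as +ASC_STAM{1069/1068/1495/858} or +ASC_LEWIS.
ASC_SUFFIX = re.compile(r"\+ASC_[A-Z]+(\{[^}]*\})?")

def swap_asc(model, replacement=""):
    """Replace the +ASC_* part of a model string; an empty replacement removes it."""
    return ASC_SUFFIX.sub(replacement, model)

model = "GTR+FO+G4m+B+ASC_STAM{1069/1068/1495/858}"
print(swap_asc(model, "+ASC_LEWIS"))  # GTR+FO+G4m+B+ASC_LEWIS
print(swap_asc(model))                # GTR+FO+G4m+B
```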

amkozlov commented 5 years ago

Thanks for checking!

Can I get your alignment to reproduce the error?

jemunro commented 5 years ago

Sure, here is a trimmed down version of the original that gives the same error: https://drive.google.com/open?id=1_QWid_PSvbkkDucibHVVOHbmDdcjBNhd

amkozlov commented 5 years ago

thanks, I'll have a look!

Just a side note: if you do have the full MSA, it is preferable to use it instead of the +ASC_STAM model.

jemunro commented 5 years ago

I wasn't aware of that. Can you briefly outline what the advantage of using the full MSA is over using ascertainment bias correction? Would a similar result be achieved by including variant sites as trinucleotides instead? Thank you.

amkozlov commented 5 years ago

The main shortcoming of ascertainment bias correction is that it ignores the missing data/indels in the original MSA. Please see the detailed discussion here: https://academic.oup.com/sysbio/article/64/6/1032/1669226

Thus, I think that using trinucleotides won't help much.
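
A rough illustration of that point: each `+ASC_STAM{a/b/c/d}` string in the log above carries only four numbers per partition, the counts of invariant sites per nucleotide, so there is nowhere to record which taxa had gaps at those sites. The toy Python sketch below assumes the correction adds, for each nucleotide, the count times the log-likelihood of a single fully observed constant column. This is a simplified reading of the approach and not RAxML-NG's actual code; the function name and any numbers passed to it are made up.

```python
import math

def stam_corrected_loglh(variable_site_loglh, invariant_counts, invariant_col_lh):
    """Toy per-state invariant-site correction (assumed form, see note above).

    variable_site_loglh: per-site log-likelihoods of the variable sites
    invariant_counts:    number of invariant sites per state (the a/b/c/d values)
    invariant_col_lh:    likelihood of one gap-free constant column per state
    """
    logl = sum(variable_site_loglh)
    for count, lh in zip(invariant_counts, invariant_col_lh):
        # Every invariant site of a given state is treated as the same gap-free
        # column, so missing data in the original columns cannot be represented.
        logl += count * math.log(lh)
    return logl
```

With the full MSA, each invariant column would instead contribute its own likelihood, including whatever gaps or missing data it contains.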