amkozlov / raxml-ng

RAxML Next Generation: faster, easier-to-use and more flexible
GNU Affero General Public License v3.0

treeinfo_compute_loglh: Assertion `total_loglh < 0.' failed #93

davipatti closed this issue 4 years ago

davipatti commented 4 years ago

Hello, I've been running into this error:

david@puck~/d/c/P/b/a/r/debug> raxml-ng --search1 --msa ali.fasta --model FLU+G --seed 42 --log VERBOSE

RAxML-NG v. 0.9.0git released on 26.11.2019 by The Exelixis Lab.
Developed by: Alexey M. Kozlov and Alexandros Stamatakis.
Contributors: Diego Darriba, Tomas Flouri, Benoit Morel, Sarah Lutteropp, Ben Bettisworth.
Latest version: https://github.com/amkozlov/raxml-ng
Questions/problems/suggestions? Please visit: https://groups.google.com/forum/#!forum/raxml

RAxML-NG was called at 02-Jun-2020 09:30:23 as follows:

raxml-ng --search1 --msa ali.fasta --model FLU+G --seed 42 --log VERBOSE

Analysis options:
  run mode: ML tree search
  start tree(s): random (1)
  random seed: 42
  tip-inner: OFF
  pattern compression: ON
  per-rate scalers: OFF
  site repeats: ON
  fast spr radius: AUTO
  spr subtree cutoff: 1.000000
  branch lengths: proportional (ML estimate, algorithm: NR-FAST)
  SIMD kernels: AVX2
  parallelization: PTHREADS (6 threads), thread pinning: OFF

[00:00:00] Reading alignment from file: ali.fasta
[00:00:00] Loaded alignment with 4000 taxa and 570 sites
[00:00:00] Extracting partitions... 
[00:00:00] Checking the alignment...
[00:00:00] Compressing alignment patterns... 

Alignment comprises 1 partitions and 569 patterns

Partition 0: noname
Model: FLU+G4m
Alignment sites / patterns: 570 / 569
Gaps: 19.26 %
Invariant sites: 7.19 %

NOTE: Binary MSA file created: ali.fasta.raxml.rba

NOTE: Per-rate scalers were automatically enabled to prevent numerical issues on taxa-rich alignments.
NOTE: You can use --force switch to skip this check and fall back to per-site scalers.

[00:00:00] Generating 1 random starting tree(s) with 4000 taxa

Initial model parameters:
   Partition: noname
   Rate heterogeneity: GAMMA (4 cats, mean),  alpha: 1.000000 (ML),  weights&rates: (0.250000,0.136954) (0.250000,0.476752) (0.250000,1.000000) (0.250000,2.386294) 
   Base frequencies (model): 0.047072 0.050910 0.074214 0.047860 0.025022 0.033304 0.054587 0.076373 0.019964 0.067134 0.071498 0.056785 0.018151 0.030496 0.050656 0.088409 0.074339 0.018524 0.031474 0.063229 
   Substitution rates (model): 0.138659 0.053367 0.584852 0.026447 0.353754 1.484235 1.132313 0.214758 0.149927 0.023117 0.474334 0.058745 0.080491 0.659311 3.011345 5.418298 0.196000 0.018289 3.532005 0.161001 0.006772 0.167207 3.292717 0.124898 1.190624 1.879570 0.246117 0.296046 15.300097 0.890162 0.016100 0.154027 0.950138 0.183077 1.369429 0.099855 0.103964 7.737393 0.000013 0.530643 0.061652 0.322525 1.387096 0.218572 0.000836 2.646848 0.005252 0.000836 0.036400 3.881311 2.140332 0.000536 0.373102 0.010258 0.014100 0.145469 5.370511 1.934833 0.887571 0.014086 0.005731 0.290043 0.041763 0.000001 0.188539 0.338372 0.135481 0.000015 0.525399 0.297124 0.002547 0.000000 0.116941 0.021800 0.001112 0.005614 0.000004 0.111457 0.104054 0.000000 0.336263 0.011975 0.094107 0.601692 0.054905 1.195629 0.108051 5.330313 0.028840 1.020367 2.559587 0.190259 0.032681 0.712770 0.487822 0.602341 0.044000 0.072206 0.406698 1.593099 0.256492 0.014200 0.016500 3.881489 0.313974 0.001004 0.319559 0.307140 0.280125 0.155245 0.104093 0.285048 0.058775 0.000016 0.006516 0.264149 0.001500 0.001237 0.038632 1.585647 0.018808 0.196486 0.074815 0.337230 0.243190 0.321612 0.347303 0.001274 0.119029 0.924467 0.580704 0.368714 0.022400 6.448954 0.098631 3.512072 0.227708 9.017954 1.463357 0.080543 0.290381 2.904052 0.032132 0.273934 14.394052 0.129224 6.746936 2.986800 0.634309 0.570767 0.044926 0.431278 0.340058 0.890599 1.331292 0.320000 0.195751 0.283808 1.526964 0.000050 0.012416 0.073128 0.279911 0.056900 0.007027 2.031511 0.070460 0.874272 4.904842 0.007132 0.996686 0.000135 0.814753 5.393924 0.592588 2.087385 0.542251 0.000431 0.000182 0.058972 2.206860 0.099836 0.392552 0.088256 0.207066 0.124898 0.654109 0.427755 0.256900 0.167582 

[00:00:00] Data distribution: max. partitions/sites/weight per thread: 1 / 95 / 7600

thread#  part#  start  length  weight
0        0      475    94      7520
1        0      380    95      7600
2        0      285    95      7600
3        0      190    95      7600
4        0      95     95      7600
5        0      0      95      7600

Starting ML tree search with 1 distinct starting trees

[00:00:00 -352943.122177] Initial branch length optimization
[00:00:16 -278338.347051] Model parameter optimization (eps = 10.000000)
raxml-ng: ~/Downloads/raxml-ng/libs/pll-modules/src/tree/treeinfo.c:1073: treeinfo_compute_loglh: Assertion `total_loglh < 0.' failed.
raxml-ng: ~/Downloads/raxml-ng/libs/pll-modules/src/tree/treeinfo.c:1073: treeinfo_compute_loglh: Assertion `total_loglh < 0.' failed.
fish: “raxml-ng --search1 --msa ali.fa…” terminated by signal SIGABRT (Abort)
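The `GAMMA (4 cats, mean)` line in the log lists four equal-weight rate categories. Each category's rate is the mean of one quartile of a gamma distribution with mean 1. For the starting value alpha = 1 that gamma is simply an Exponential(1) distribution, so the category means have a closed form. The sketch below is our own illustration, not RAxML-NG code. The helper name is hypothetical and it only covers alpha = 1. It reproduces the logged rates 0.136954, 0.476752, 1.000000 and 2.386294.

```python
import math

def exp_quartile_mean_rates(k=4):
    """Mean rate of each of k equal-probability categories of Exponential(1),
    i.e. a gamma with shape alpha = 1 and mean 1 (hypothetical helper, alpha = 1 only)."""
    # Category boundaries are the i/k quantiles -ln(1 - i/k); the last boundary is +inf.
    bounds = [-math.log(1.0 - i / k) for i in range(k)] + [math.inf]

    def partial_mean(a, b):
        # Integral of x * exp(-x) over [a, b] equals (a + 1)e^{-a} - (b + 1)e^{-b}.
        tail = lambda x: 0.0 if math.isinf(x) else (x + 1.0) * math.exp(-x)
        return tail(a) - tail(b)

    # Each category has probability 1/k, so its conditional mean is k * partial mean.
    return [k * partial_mean(bounds[i], bounds[i + 1]) for i in range(k)]

rates = exp_quartile_mean_rates()
print(" ".join(f"{r:.6f}" for r in rates))  # prints: 0.136954 0.476752 1.000000 2.386294
```

For other alpha values the boundaries come from gamma quantiles (an inverse gamma CDF), which this closed-form sketch does not handle.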

I have investigated a bit:

Any help would be great, thanks!

amkozlov commented 4 years ago

Hi @davipatti,

that's an interesting one :)

Apparently, rounding some very small substitution rates in the built-in FLU model to zero leads to numerical problems on this particular dataset.
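One general way an assertion of the form `total_loglh < 0.` can fail is when the log-likelihood becomes NaN. Under IEEE 754 floating point, which both C and Python follow, every ordered comparison with NaN is false, and a single NaN per-site value turns the whole sum into NaN. A non-negative total would also trip the check. The sketch below only illustrates these semantics. `total_loglh_ok` is a hypothetical helper, not the pll-modules code, and the thread does not say which of the two cases occurred here.

```python
import math

def total_loglh_ok(site_loglhs):
    """Hypothetical check in the spirit of `assert(total_loglh < 0.)`:
    sum per-site log-likelihoods and require a strictly negative total."""
    total = sum(site_loglhs)
    return total < 0.0, total

healthy = [-512.3, -488.9, -501.7]
poisoned = [-512.3, float("nan"), -501.7]  # one site went numerically bad

for name, sites in [("healthy", healthy), ("poisoned", poisoned)]:
    ok, total = total_loglh_ok(sites)
    print(f"{name}: total={total}, assertion holds: {ok}")

nan = float("nan")
# NaN is neither < 0 nor >= 0, and is not even equal to itself.
print(nan < 0.0, nan >= 0.0, nan == nan)  # prints: False False False
```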

Could you please try re-running with the original FLU model (attached, and also available from ftp://ftp.sanger.ac.uk/pub/1000genomes/lsq/FLU), using the following command:

raxml-ng --search1 --msa ali.fasta --model PROTGTR{FLU.txt}+G --seed 42 

This seems to fix the problem for me. If you can confirm, I will update the rates in the built-in FLU model.

FLU.txt
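To see which built-in rates are affected, one can scan the `Substitution rates (model):` line that RAxML-NG prints with `--log VERBOSE`. A 20-state protein model has 20 × 19 / 2 = 190 exchangeabilities, and in the log above two of them print as `0.000000`. The sketch below parses such a line. `scan_rates` is a hypothetical helper. Because the log rounds to six decimals, it cannot tell a true zero from a tiny positive rate. The log also does not state the order of the 190 rates, so the sketch reports positions rather than amino-acid pairs.

```python
def scan_rates(log_line, n_states=20):
    """Parse a RAxML-NG 'Substitution rates (model): ...' log line.

    Hypothetical helper: returns the rates, the 0-based positions of values
    printed as zero, and the smallest strictly positive value.
    """
    # Everything after the first ':' is a whitespace-separated list of numbers.
    _, _, values = log_line.partition(":")
    rates = [float(tok) for tok in values.split()]

    # A reversible n-state model has n * (n - 1) / 2 exchangeabilities (190 for 20 amino acids).
    expected = n_states * (n_states - 1) // 2
    if len(rates) != expected:
        raise ValueError(f"expected {expected} rates, got {len(rates)}")

    zero_positions = [i for i, r in enumerate(rates) if r == 0.0]
    smallest_positive = min((r for r in rates if r > 0.0), default=None)
    return rates, zero_positions, smallest_positive


# Usage on a short 4-state (DNA-sized) example line with 6 rates.
line = "   Substitution rates (model): 1.000000 2.500000 0.000000 1.000000 0.000013 1.000000"
rates, zeros, smallest = scan_rates(line, n_states=4)
print(zeros, smallest)  # prints: [2] 1.3e-05
```

Pasting the full protein line from the log with the default `n_states=20` applies the same check to all 190 FLU rates.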

davipatti commented 4 years ago

I've tried --model PROTGTR{FLU.txt}+G and that runs fine. Thanks!

amkozlov commented 4 years ago

Thanks for the confirmation! This has been fixed in commit 12f68c5a1f9337f2255dc4e1a13be0c5a98adb40.