amkozlov / raxml-ng

RAxML Next Generation: faster, easier-to-use and more flexible
GNU Affero General Public License v3.0

`ERROR: vector::_M_default_append` when running on CATG file #160

Closed dlaehnemann closed 11 months ago

dlaehnemann commented 1 year ago

The error comes up when running the following command:

raxml-ng --all   --msa results/raxml_ng_input/some_group.catg   --model GTGTR+FO   --prefix results/raxml_ng/some_group   --prob-msa   --threads 16   --tree pars{1}   --log DEBUG 2>logs/raxml_ng/some_group.raxml.error.log

It seems to read in the input file correctly (I also ran `raxml-ng --parse` on it beforehand, and that went fine), generates a starting tree, and then throws the error `ERROR: vector::_M_default_append`. Here are the last lines of DEBUG output (starting from the end of input parsing):

CATG: site 49062 consesus seq: NNNNNNNNNNNNNNNNNNNNNNNKNTKNNNNN
CATG: site 49063 consesus seq: NNNNNNNNNNNNNNNNNNNNNNNNNTNNKNNN
[00:00:03] Loaded alignment with 32 taxa and 49064 sites
[00:00:03] Extracting partitions... 
[00:00:03] Checking the alignment...

Alignment comprises 1 partitions and 49064 sites

Partition 0: noname
Model: GT10GTR+FO
Alignment sites: 49064
Gaps: 49.88 %
Invariant sites: 0.00 %

Recommended threads (response/balanced/throughput): 25 / 10 / 9

Parallelization scheme autoconfig: 2 worker(s) x 10 thread(s)

[00:00:03] Generating a RANDOM starting tree, seed: 1150666164
[00:00:03] Generating 1 parsimony starting tree(s) with 32 taxa
Estimated memory per parsimony thread: 7 MB
Parallel parsimony with 20 threads
[00:00:04] [worker #0] Generated a PARSIMONY starting tree, seed: 1622191121, score: 196910
Parallel reduction/worker buffer size: 1 KB  / 0 KB

ERROR: vector::_M_default_append

The only thing that stands out to me is the mismatch between `--threads 16` in the command and the log line `Parallel parsimony with 20 threads`. Other than that, I have no clue how to debug this. It doesn't seem to be my input, as the parsing goes just fine. Any ideas?

amkozlov commented 1 year ago

hm, this inconsistency wrt the number of threads looks weird, could you please post the full log file?

amkozlov commented 1 year ago

oh wait, there is a syntax error on the command line: you must specify `on` or `off` for the `--prob-msa` switch:

  --prob-msa     on | off                    use probabilistic alignment (works with CATG and VCF)

dlaehnemann commented 1 year ago

I fixed this, but the error remains the same. Here is a more complete log (but I skip all the reading in of taxa and sites):

RAxML-NG v. 1.2.0 released on 09.05.2023 by The Exelixis Lab.
Developed by: Alexey M. Kozlov and Alexandros Stamatakis.
Contributors: Diego Darriba, Tomas Flouri, Benoit Morel, Sarah Lutteropp, Ben Bettisworth, Julia Haag, Anastasis Togkousidis.
Latest version: https://github.com/amkozlov/raxml-ng
Questions/problems/suggestions? Please visit: https://groups.google.com/forum/#!forum/raxml

System: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, 20 cores, 125 GB RAM

RAxML-NG was called at 22-Jun-2023 13:20:31 as follows:

raxml-ng --all --msa results/raxml_ng_input/some_sample.ml_gt_and_likelihoods.catg --model GTGTR+FO --prefix results/raxml_ng/some_sample --prob-msa on --threads 16 --tree pars{1} --log DEBUG

Analysis options:
  run mode: ML tree search + bootstrapping (Felsenstein Bootstrap)
  start tree(s): parsimony (1)
  bootstrap replicates: parsimony (max: 1000) + bootstopping (autoMRE, cutoff: 0.030000)
  random seed: 1687432831
  tip-inner: OFF
  pattern compression: OFF
  per-rate scalers: OFF
  site repeats: OFF
  logLH epsilon: general: 10.000000, brlen-triplet: 1000.000000
  fast spr radius: AUTO
  spr subtree cutoff: 1.000000
  fast CLV updates: ON
  branch lengths: proportional (ML estimate, algorithm: NR-FAST)
  SIMD kernels: AVX
  parallelization: coarse-grained (auto), PTHREADS (16 threads), thread pinning: OFF

RBA partial loading: OFF
|noname|   |GT10GTR+FO|   ||
[00:00:00] Reading alignment from file: results/raxml_ng_input/control.ml_gt_and_likelihoods.catg
Failed to load as IPHYLIP: Unable to parse PHYLIP file: results/raxml_ng_input/control.ml_gt_and_likelihoods.catg
 (LIBPLL-233): Sequence 2 (MMMAMMNMNAAAAAAANNNNNNNAANNNNNNN) data out of alignment
Failed to load as PHYLIP: Unable to parse PHYLIP file: results/raxml_ng_input/control.ml_gt_and_likelihoods.catg
 (LIBPLL-232): Sequence 1 (CATo8) longer than expected
Failed to load as FASTA: Error parsing FASTA file: results/raxml_ng_input/control.ml_gt_and_likelihoods.catg
 (LIBPLL-203): Illegal header line in query fasta file
Failed to load as FASTA (long labels): Error parsing FASTA file: results/raxml_ng_input/control.ml_gt_and_likelihoods.catg
 (LIBPLL-203): Illegal header line in query fasta file
CATG: taxa: 32, sites: 49064
CATG: taxon 0: CATo8
[...]
CATG: taxon 31: CAB1
CATG: site 0 consesus seq: MMMAMMNMNAAAAAAANNNNNNNAANNNNNNN
CATG: number of states: 01-Jan-1970 01:00:10
CATG: site 1 consesus seq: MMMMMCNNNMMNMMCMNNNNNNNNMNNMNNMN
CATG: site 2 consesus seq: MMMAMMMAAMNNMMMANNNNNNNNNNNNNAAN
[...]
CATG: site 49063 consesus seq: NNNNNNNNNNNNNNNNNNNNNNNNNTNNKNNN
[00:00:03] Loaded alignment with 32 taxa and 49064 sites
[00:00:03] Extracting partitions... 
[00:00:03] Checking the alignment...

Alignment comprises 1 partitions and 49064 sites

Partition 0: noname
Model: GT10GTR+FO
Alignment sites: 49064
Gaps: 49.88 %
Invariant sites: 0.00 %

Recommended threads (response/balanced/throughput): 25 / 10 / 9

Parallelization scheme autoconfig: 1 worker(s) x 16 thread(s)

[00:00:03] Generating a RANDOM starting tree, seed: 502175453
[00:00:03] Generating 1 parsimony starting tree(s) with 32 taxa
Estimated memory per parsimony thread: 7 MB
Parallel parsimony with 16 threads
[00:00:03] [worker #0] Generated a PARSIMONY starting tree, seed: 1904568126, score: 197219
Parallel reduction/worker buffer size: 1 KB  / 0 KB

ERROR: vector::_M_default_append

amkozlov commented 1 year ago

thanks but I can't reproduce it, could you please send me your input file?

dlaehnemann commented 1 year ago

I'll try to produce a minimal triggering example, as I cannot share the full data publicly (without controlled access). This will hopefully also help narrow down the bug (or data issue) search further.

dlaehnemann commented 1 year ago

It seems like I can't get this to trigger with any reduced version of the data. All of these will run until past the previous error point:

So it doesn't seem to be a particular line that is triggering this, but rather something like the number and size of records.

In addition, I found out that changing the command from `--tree pars{1}` to `--tree pars{2}` causes raxml-ng to fail even earlier, with:

CATG: site 49063 consesus seq: NNNNNNNNNNNNNNNNNNNNNNNNNTNNKNNN
[00:00:49] Loaded alignment with 32 taxa and 49064 sites
[00:00:49] Extracting partitions... 
[00:00:49] Checking the alignment...

Alignment comprises 1 partitions and 49064 sites

Partition 0: noname
Model: GT10GTR+FO
Alignment sites: 49064
Gaps: 49.88 %
Invariant sites: 0.00 %

Recommended threads (response/balanced/throughput): 25 / 10 / 9

Parallelization scheme autoconfig: 1 worker(s) x 1 thread(s)

[00:00:49] Generating a RANDOM starting tree, seed: 661317336
raxml-ng: /opt/conda/conda-bld/raxml-ng_1686044823122/work/src/main.cpp:1258: void load_start_trees(RaxmlInstance&): Assertion `i == instance.opts.num_searches' failed.

This at least has a minimal backtrace that points to some raxml-ng code, so maybe this helps you understand what's going on? Interestingly, it states `Generating a RANDOM starting tree` here (and in the original error), even though only `--tree pars{1|2}` was requested.

And finally, here's at least one example record so you roughly know what my data looks like (I altered some taxon entries, but the general format should be clear):

RRRRRRRRRRRRARAANNNNRANARNNRRNAN    7.516919868066907e-05,0.0,0.0006961930193938315,0.0,0.0,0.9992288922614705,0.0,0.0,0.0,0.0  0.07238359749317169,0.0,2.6162499125348404e-05,0.0,0.0,0.9275895592581946,0.0,0.0,0.0,0.0   5.657959991367534e-05,0.0,0.0005624190089292824,0.0,0.0,0.9993802037460426,0.0,0.0,0.0,0.0  1.1723000170604791e-05,0.0,0.000673658971209079,0.0,0.0,0.9993150746452208,0.0,0.0,0.0,0.0  0.0005221319734118879,0.0,0.00044177399831824005,0.0,0.0,0.9990369205688694,0.0,0.0,0.0,0.0 0.10272099822759628,0.0,5.522049832507037e-06,0.0,0.0,0.897273358918028,0.0,0.0,0.0,0.0 0.2248540073633194,0.0,1.1224799891351722e-05,0.0,0.0,0.7751347868470475,0.0,0.0,0.0,0.0    1.0774800102808513e-05,0.0,0.0007209079922176898,0.0,0.0,0.9992684416459561,0.0,0.0,0.0,0.0 6.421249736376922e-07,0.0,0.0007396049913950264,0.0,0.0,0.9992593830213183,0.0,0.0,0.0,0.0  0.013520999811589718,0.0,0.00010749499779194593,0.0,0.0,0.9863714732455264,0.0,0.0,0.0,0.0  1.5063299940720754e-07,0.0,0.0007437890162691474,0.0,0.0,0.9992558822072304,0.0,0.0,0.0,0.0 0.09034000337123871,0.0,5.719130058423616e-06,0.0,0.0,0.9096553092240356,0.0,0.0,0.0,0.0    0.5612149834632874,0.0,5.508579761226429e-07,0.0,0.0,0.4387846173485741,0.0,0.0,0.0,0.0 0.004178300034254789,0.0,0.0002092140057357028,0.0,0.0,0.9956126061897521,0.0,0.0,0.0,0.0   0.5382919907569885,0.0,5.978749868518207e-07,0.0,0.0,0.46170735149644315,0.0,0.0,0.0,0.0   0.5382919907569885,0.0,5.978749868518207e-07,0.0,0.0,0.46170735149644315,0.0,0.0,0.0,0.0 0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1 0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1 0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1 0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1 1.132950001192512e-05,0.0,0.0006479779840447009,0.0,0.0,0.9993400698736252,0.0,0.0,0.0,0.0  0.9833009839057922,0.0,9.828800273670169e-11,0.0,0.0,0.016699400605276082,0.0,0.0,0.0,0.0   0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1 0.78541499376297,0.0,1.6060899952208274e-06,0.0,0.0,0.2145843373145908,0.0,0.0,0.0,0.0  
3.318259871321061e-07,0.0,0.0007419249741360545,0.0,0.0,0.9992573255024535,0.0,0.0,0.0,0.0  0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1 0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1 0.0001326689962297678,0.0,0.0006849199999123812,0.0,0.0,0.9991819075194144,0.0,0.0,0.0,0.0  2.0807999590033432e-07,0.0,0.0007430710247717798,0.0,0.0,0.9992566032654031,0.0,0.0,0.0,0.0   0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1 0.9050719738006592,0.0,1.0040400155730822e-07,0.0,0.0,0.09492797502025496,0.0,0.0,0.0,0.0 0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1

Other than that, I am out of debugging ideas for now. Could I compile raxml-ng with some option to get a better backtrace or even more debugging information?

dlaehnemann commented 1 year ago

For now, I'll probably side-step the issue by changing my input filtering. But I'm keeping the erroring input file around, in case you have further debugging ideas.

amkozlov commented 1 year ago

Thanks for your debugging efforts & detailed report!

So it seems that the original error only occurs under very specific and rare circumstances, which is generally good news :) Still, the easiest way to debug would be if you could send me your (anonymized) input file; you can, e.g., obscure the taxon names.

The second error with `--tree pars{2}` is a "known bug": raxml-ng just loaded the checkpoint file from the old run with `--tree pars{1}`, and then noticed that the number of starting trees does not match. So the problem can be fixed by simply adding `--redo`, although the error message should definitely be improved.
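
A minimal sketch of such a rerun, assuming the file names from the original command. `--redo` is the raxml-ng switch mentioned above; the `RAXML_NG` variable and the `run_search` helper are illustrative names for this example only, not part of raxml-ng.

```shell
#!/bin/sh
# Path to the binary; overridable, e.g. RAXML_NG=echo for a dry run.
RAXML_NG="${RAXML_NG:-raxml-ng}"

# Run the analysis with N parsimony starting trees (N = first argument).
# --redo makes raxml-ng start from scratch instead of resuming an old run.
run_search() {
    "$RAXML_NG" --all \
        --msa results/raxml_ng_input/some_group.catg \
        --model GTGTR+FO \
        --prefix results/raxml_ng/some_group \
        --prob-msa on \
        --threads 16 \
        --tree "pars{$1}" \
        --redo
}

# Example call (commented out so the snippet has no side effects):
# run_search 2
```

Setting `RAXML_NG=echo` before calling `run_search 2` prints the assembled command line instead of starting a run.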

dlaehnemann commented 11 months ago

Sorry for the late follow-up, and thanks for pointing me to the checkpointing as a potential problem. Even with different filtering, the original error persisted.

But it appears that a leftover checkpoint file from some previous run was causing the original error as well. Removing the checkpoint file allowed the command to run through.
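
A small pre-flight check along these lines can be sketched as follows. It assumes that the checkpoint is stored next to the other outputs as `<prefix>.raxml.ckp`; the `stale_checkpoint` helper is a hypothetical name used only for this example.

```shell
#!/bin/sh
# Sketch: detect a leftover checkpoint before starting a new run.
# Assumption: the checkpoint lives at "<prefix>.raxml.ckp".

# Print the checkpoint path and return 0 if one exists for the given prefix;
# print nothing and return 1 otherwise.
stale_checkpoint() {
    ckp="$1.raxml.ckp"
    if [ -e "$ckp" ]; then
        printf '%s\n' "$ckp"
        return 0
    fi
    return 1
}

prefix="results/raxml_ng/some_group"
if ckp_file=$(stale_checkpoint "$prefix"); then
    echo "Found checkpoint from an earlier run: $ckp_file" >&2
    echo "Remove it or pass --redo to start from scratch." >&2
fi
```

The check only reports the file and leaves the decision to delete it or to resume the earlier run to the user.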

Maybe picking up from a previous checkpoint should not be the default behaviour? Or raxml-ng should at least warn about detecting one? Or check for consistency of the parameters from the original run and the current one (or does it do that already)? But this is just me thinking out loud, here...
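
The parameter consistency idea can be sketched on the wrapper side, outside of raxml-ng. This is not an existing raxml-ng feature: the `args_match_previous` helper and the `<prefix>.args` record file are hypothetical names chosen for this example.

```shell
#!/bin/sh
# Sketch: remember the arguments of the last run for a prefix and
# report whether the current arguments are the same.
# Assumption: "<prefix>.args" is a free file name that raxml-ng does not use.

# Usage: args_match_previous PREFIX ARG...
# Returns 1 if a record exists and differs from the current arguments.
# Otherwise stores the current arguments and returns 0.
args_match_previous() {
    prefix="$1"
    shift
    record="$prefix.args"
    current="$*"
    if [ -f "$record" ] && [ "$(cat "$record")" != "$current" ]; then
        return 1
    fi
    printf '%s\n' "$current" > "$record"
    return 0
}

# Example (commented out so the snippet writes no files by itself):
# args_match_previous results/raxml_ng/some_group --tree "pars{2}" \
#     || echo "Arguments differ from the previous run" >&2
```

Joining the arguments with `$*` is a simplification: arguments containing spaces would be recorded ambiguously, which is acceptable for a warning but not for anything stricter.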