amkozlov / raxml-ng

RAxML Next Generation: faster, easier-to-use and more flexible
GNU Affero General Public License v3.0

Segmentation fault when parsing alignment at 'Compressing alignment patterns...' #122

Closed jgolob closed 1 year ago

jgolob commented 2 years ago

Greetings! Thank you for the well-supported re-implementation of RAxML.

When attempting to parse an alignment, I consistently run into a segmentation fault with v1.0.2. The crash reproduces with the same alignment on multiple systems and setups. Running with --check completes successfully.

The log (with --log debug):

RAxML-NG v. 1.0.2 released on 22.02.2021 by The Exelixis Lab.
Developed by: Alexey M. Kozlov and Alexandros Stamatakis.
Contributors: Diego Darriba, Tomas Flouri, Benoit Morel, Sarah Lutteropp, Ben Bettisworth.
Latest version: https://github.com/amkozlov/raxml-ng
Questions/problems/suggestions? Please visit: https://groups.google.com/forum/#!forum/raxml

System: Intel(R) Core(TM) i7-8086K CPU @ 4.00GHz, 6 cores, 31 GB RAM

RAxML-NG was called at 22-Jul-2021 18:15:07 as follows:

raxml-ng --parse --msa ra.fasta --seed 12345 --model GTR+G --log debug

Analysis options:
  run mode: Alignment parsing and compression
  start tree(s): 
  random seed: 12345
  tip-inner: OFF
  pattern compression: ON
  per-rate scalers: OFF
  site repeats: ON
  branch lengths: proportional (ML estimate, algorithm: NR-FAST)
  SIMD kernels: AVX2
  parallelization: coarse-grained (auto), PTHREADS (auto)

RBA partial loading: OFF
|noname|   |GTR+FO+G4m|   ||
[00:00:00] Reading alignment from file: ra.fasta
Failed to load as IPHYLIP: Unable to parse PHYLIP file: ra.fasta
 (LIBPLL-231): Invalid number of sequences in header
Failed to load as PHYLIP: Unable to parse PHYLIP file: ra.fasta
 (LIBPLL-231): Invalid number of sequences in header
[00:00:01] Loaded alignment with 46991 taxa and 12777 sites
[00:00:01] Extracting partitions... 
[00:00:01] Checking the alignment...

WARNING: Fully undetermined columns found: 60

NOTE: Reduced alignment (with duplicates and gap-only sites/taxa removed) 
NOTE: was saved to: /working/ra.fasta.raxml.reduced.phy
[00:00:47] Compressing alignment patterns... 
Segmentation fault

The alignment is gigantic (~500 MB), which makes it impractical to share directly.

amkozlov commented 2 years ago

Hello, thank you for reporting! I'm afraid it will be difficult to diagnose this problem without the MSA file. Could you please share it via GDrive or a similar service?

On a different note: it will most likely make little sense to run raxml-ng on a dataset with these dimensions (#taxa > #sites), because the signal is insufficient. We have discussed this issue at length on the RAxML Google group; see, e.g., these threads:
https://groups.google.com/g/raxml/c/13MxNQvve-Y/m/4rLAdj3MAAAJ
https://groups.google.com/g/raxml/c/upsTachi-Nc/m/etOL1bW0BAAJ
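
For readers who want to check the #taxa > #sites condition before starting a run, here is a minimal Python sketch. It is not part of raxml-ng: the function names `alignment_dimensions` and `more_taxa_than_sites` are made up for illustration, and it assumes a plain FASTA alignment in which every sequence has the same length. For the alignment in this report (46991 taxa, 12777 sites) the check would come out true.

```python
import io


def alignment_dimensions(handle):
    """Return (n_taxa, n_sites) for a FASTA alignment read from a text handle.

    Hypothetical helper, not a raxml-ng API. Sequences may be wrapped over
    several lines; all sequences are assumed to have the same length.
    """
    n_taxa = 0
    lengths = set()
    current_len = None  # length of the sequence currently being read
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous sequence, if there was one.
            if current_len is not None:
                lengths.add(current_len)
            n_taxa += 1
            current_len = 0
        elif current_len is not None:
            current_len += len(line)
    if current_len is not None:
        lengths.add(current_len)
    if len(lengths) > 1:
        raise ValueError("sequences have differing lengths: not an alignment")
    n_sites = lengths.pop() if lengths else 0
    return n_taxa, n_sites


def more_taxa_than_sites(handle):
    """True if the alignment has more taxa (rows) than sites (columns)."""
    n_taxa, n_sites = alignment_dimensions(handle)
    return n_taxa > n_sites


# Toy example: three taxa, two sites each.
toy = io.StringIO(">a\nAC\n>b\nAG\n>c\nAT\n")
print(more_taxa_than_sites(toy))
```

On a real file, the handle would come from something like `open("ra.fasta")`. The sketch reads the file line by line, so it should cope with alignments of several hundred megabytes without loading them fully into memory.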

amkozlov commented 1 year ago

If the issue still exists with the latest raxml-ng version, please reopen and provide input files.