amkozlov / raxml-ng

RAxML Next Generation: faster, easier-to-use and more flexible
GNU Affero General Public License v3.0

Raxml-ng behavior regarding alignment sites #185

Closed aghozlane closed 1 week ago

aghozlane commented 1 week ago

Dear developers,

I use raxml-ng in a workflow and I am having issues with how it behaves with respect to CPUs. I am working with sets of incomplete metagenomic species, and I want raxml-ng to run smoothly on any dataset. I tried counting the aligned sites and removing all positions with >50% gaps. However, raxml-ng performs additional filtering of the aligned sites:

WARNING: Fully undetermined sequences found: 3
WARNING: Sequences ERS4538872 and ERS4539095 are exactly identical!
WARNING: Sequences ERS4538872 and ERS4539245 are exactly identical!
WARNING: Sequences ERS4538872 and ERS4538529 are exactly identical!
WARNING: Duplicate sequences found: 3
NOTE: Reduced alignment (with duplicates and gap-only sites/taxa removed)
NOTE: was saved to: /projects/benchmark_meteor/analysis/benchmark_strain/meteor_optimization/human/tree/m80_c0.5_p20/msp_0225.raxml.reduced.phy
Alignment comprises 1 partitions and 4 patterns
Partition 0: noname
Model: GTR+FO+G4m
Alignment sites / patterns: 14 / 4
Gaps: 42.86 %
Invariant sites: 100.00 %
NOTE: Binary MSA file created: /projects/benchmark_meteor/analysis/benchmark_strain/meteor_optimization/human/tree/m80_c0.5_p20/msp_0225.raxml.rba
Parallelization scheme autoconfig: 1 worker(s) x 8 thread(s)
[00:00:00] Generating 1 parsimony starting tree(s) with 7 taxa
Parallel parsimony with 8 threads
Parallel reduction/worker buffer size: 1 KB / 0 KB
ERROR: There are fewer alignment sites (4) than processes (8)!

Can I disable this additional filtering? Or should I avoid removing gap-rich sites before running raxml-ng? Is it possible for raxml-ng to accept any number of processes and just use what it needs?

amkozlov commented 1 week ago
  1. You can disable MSA checks with the --force option, please see here for details: https://github.com/amkozlov/raxml-ng/wiki/Disabling-safety-checks

  2. You can tell raxml-ng to use as many threads as needed with --threads auto or --threads auto{8}, please see here: https://github.com/amkozlov/raxml-ng/wiki/Parallelization#adaptive-parallelization

  3. If you end up with just 4 patterns in the MSA, it is almost certainly not what you want. Probably you should either disable/adjust MSA filtering, or just skip this locus since it does not have enough signal for reliable tree inference.
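
For a workflow that has to handle arbitrary loci, the three points above could be combined into a small pre-flight check. The following is only a minimal sketch, not part of raxml-ng: `count_patterns`, `plan_run`, the `min_patterns` threshold of 50, the gap character set, and the `GTR+G` model string are all assumptions chosen for illustration. The pattern count is a rough approximation of what raxml-ng reports and may not match it exactly. Only the `--threads auto{N}` syntax is taken from the answer above.

```python
from typing import Dict, List, Optional


def count_patterns(seqs: Dict[str, str], gap_chars: str = "-?Nn") -> int:
    """Count distinct alignment columns, skipping gap-only columns.

    Rough approximation of the "patterns" figure in the raxml-ng log;
    the set of characters treated as undetermined is an assumption.
    """
    rows = list(seqs.values())
    if not rows:
        return 0
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all sequences must have the same aligned length")

    patterns = set()
    for column in zip(*rows):
        if all(char in gap_chars for char in column):
            continue  # gap-only site, carries no signal
        patterns.add(tuple(char.upper() for char in column))
    return len(patterns)


def plan_run(
    msa_path: str,
    seqs: Dict[str, str],
    max_threads: int = 8,
    min_patterns: int = 50,  # hypothetical cutoff, tune for your data
) -> Optional[List[str]]:
    """Return a raxml-ng command line, or None if the locus should be skipped."""
    if count_patterns(seqs) < min_patterns:
        return None  # too little signal for reliable tree inference
    return [
        "raxml-ng",
        "--msa", msa_path,
        "--model", "GTR+G",  # assumed model, replace with your own
        "--threads", f"auto{{{max_threads}}}",  # adaptive, capped at max_threads
    ]


if __name__ == "__main__":
    toy = {"s1": "ACGT--", "s2": "ACGT--", "s3": "ACGA-A"}
    print(count_patterns(toy))
    print(plan_run("toy.phy", toy, min_patterns=3))
```

Returning `None` for low-signal loci keeps the skip decision in the workflow rather than relying on raxml-ng to fail, and the adaptive thread setting avoids requesting more threads than the alignment can use.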