amkozlov / raxml-ng

RAxML Next Generation: faster, easier-to-use and more flexible
GNU Affero General Public License v3.0

Per-rate scaling notification #133

Closed boopsboops closed 2 years ago

boopsboops commented 2 years ago

I'm trying to speed up raxml-ng by disabling the automatic per-rate scaling (it worked fine in old RAxML, which I think did not implement this feature?).

But with --rate-scalers off --force I get the following note:

NOTE: Per-rate scalers were automatically enabled to prevent numerical issues on taxa-rich alignments. NOTE: You can use --force switch to skip this check and fall back to per-site scalers.

I'm assuming it was disabled, but I can't be sure given this message?

Also, is it essential to use --force in combination with --rate-scalers off? I would rather avoid it if possible.

Full log below.

Cheers!

RAxML-NG v. 1.1-master released on 29.11.2021 by The Exelixis Lab.
Developed by: Alexey M. Kozlov and Alexandros Stamatakis.
Contributors: Diego Darriba, Tomas Flouri, Benoit Morel, Sarah Lutteropp, Ben Bettisworth.
Latest version: https://github.com/amkozlov/raxml-ng
Questions/problems/suggestions? Please visit: https://groups.google.com/forum/#!forum/raxml

System: Intel(R) Core(TM) i7-3820 CPU @ 3.60GHz, 4 cores, 15 GB RAM

RAxML-NG was called at 15-Jan-2022 16:37:02 as follows:

raxml-ng --search --msa ali.rba --tree pars{1} --rate-scalers off --force --seed 42 --redo --threads auto

Analysis options:
  run mode: ML tree search
  start tree(s): parsimony (1)
  random seed: 42
  tip-inner: OFF
  pattern compression: ON
  per-rate scalers: OFF
  site repeats: ON
  fast spr radius: AUTO
  spr subtree cutoff: 1.000000
  branch lengths: proportional (ML estimate, algorithm: NR-FAST)
  SIMD kernels: AVX
  parallelization: coarse-grained (auto), PTHREADS (auto)

WARNING: Running in REDO mode: existing checkpoints are ignored, and all result files will be overwritten!

WARNING: Running in FORCE mode: all safety checks are disabled!

[00:00:00] Loading binary alignment from file: ali.rba
[00:00:00] Alignment comprises 5473 taxa, 1 partitions and 335 patterns

Partition 0: noname
Model: TN93+FC+G4m
Alignment sites / patterns: 338 / 335
Gaps: 10.40 %
Invariant sites: 6.21 %

Parallelization scheme autoconfig: 1 worker(s) x 4 thread(s)

NOTE: Per-rate scalers were automatically enabled to prevent numerical issues on taxa-rich alignments.
NOTE: You can use --force switch to skip this check and fall back to per-site scalers.

Parallel reduction/worker buffer size: 1 KB  / 0 KB

[00:00:00] Generating 1 parsimony starting tree(s) with 5473 taxa
[00:00:09] Data distribution: max. partitions/sites/weight per thread: 1 / 84 / 1344
[00:00:09] Data distribution: max. searches per worker: 1

Starting ML tree search with 1 distinct starting trees
amkozlov commented 2 years ago

I don't think the rate scalers are the main problem here. Rather, you have a very low sites-to-taxa ratio (<0.1), which means the signal in the MSA is insufficient to resolve most branches. Hence, the extensive and time-consuming ML search as implemented in raxml-ng (or old RAxML) is pretty pointless on this dataset.

We have discussed this topic multiple times on our RAxML google group, please search for keywords like "poor signal":

https://groups.google.com/g/raxml/search?q=poor%20signal

In short, your options are:

1) subsampling / clustering
2) use FastTree or parsimony
3) increase the convergence epsilon, e.g. --lh-epsilon 10
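For option 3, the change amounts to one extra flag on the original command. A sketch, assuming the same ali.rba input as in the log above; the larger epsilon is the value suggested here, not a recommendation for all datasets:

```shell
# Loosen the log-likelihood convergence epsilon so the ML search
# stops earlier on this weak-signal alignment (default is 0.1).
# Assumes the same ali.rba binary alignment as in the log above.
raxml-ng --search --msa ali.rba --tree pars{1} \
         --lh-epsilon 10 --seed 42 --threads auto
```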

boopsboops commented 2 years ago

Many apologies for confusing matters by mentioning why I was disabling the rate scalers, but thanks for the tips anyway!

The main purpose of the report was simply to highlight that I had turned the scalers off, but the software reported that they had been enabled regardless. This was a bit misleading, so I was concerned it might be a bug.

amkozlov commented 2 years ago

ok I see, thanks for reporting!

I guess you are right and it's currently not possible to disable rate scalers for alignments with >2000 taxa. I will take care of this.

amkozlov commented 2 years ago

ok, this should be fixed now: you can disable the automatic rate scalers with --force model_rate_scalers
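Applying that to the original command from the log, the invocation would presumably look like the sketch below, which skips only the rate-scaler safety check instead of disabling all checks with a bare --force:

```shell
# Sketch: turn per-rate scalers off and override only the taxa-count
# safety check, per the fix described above, rather than using a
# bare --force (which disables every safety check).
raxml-ng --search --msa ali.rba --tree pars{1} \
         --rate-scalers off --force model_rate_scalers \
         --seed 42 --threads auto
```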