amkozlov / raxml-ng

RAxML Next Generation: faster, easier-to-use and more flexible
GNU Affero General Public License v3.0
383 stars, 64 forks

Assertion `cur_loglh - new_loglh < -new_loglh * 1e-14' failed. #32

Closed smsaladi closed 6 years ago

smsaladi commented 7 years ago

I'm trying to run raxml-ng on comet for a dataset on which the standard RAxML seems to behave oddly.

With both my trimmed and untrimmed alignments, the job hits the following error at some stage of the calculation (the full output from one of the two jobs is attached):

raxml-ng-mpi: /home/saladi1/installers/raxml-ng/src/Optimizer.cpp:33: double Optimizer::optimize_model(TreeInfo&, double): Assertion `cur_loglh - new_loglh < -new_loglh * 1e-14' failed.

slurm_raxml-ng.10546635.comet-10-49.out.txt
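For context, the failing check only restates a sanity condition: after an optimization round, the new log-likelihood must not be worse than the previous one by more than a tiny tolerance relative to its magnitude. The C++ sketch below mirrors the condition printed in the assertion message; the function name `loglh_not_worse` and its default argument are hypothetical, not raxml-ng code. With lnL around -1000, the allowed drop is about 1e-11 log units, less than 100 times the spacing of doubles near 1000 (about 1.1e-13), so floating-point round-off alone can plausibly trip it.

```cpp
#include <cassert>

// Hypothetical helper (not part of raxml-ng) that mirrors the asserted
// condition: the new log-likelihood may be worse than the current one only
// by less than rel_tol times its own magnitude.
// Log-likelihoods are negative, so -new_loglh is a positive scale factor.
bool loglh_not_worse(double cur_loglh, double new_loglh, double rel_tol = 1e-14)
{
  // cur_loglh - new_loglh > 0 means the optimizer made the likelihood worse.
  return cur_loglh - new_loglh < -new_loglh * rel_tol;
}
```

For example, `loglh_not_worse(-1000.0, -1000.0 - 1e-9)` is false: a drop of one billionth of a log unit already violates the check.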

Do you have any suggestions of how I might get around this issue? Apologies if this is a trivial error/mistake on my part.

amkozlov commented 7 years ago

Hi @smsaladi, this looks like a numerical issue; I'm trying to reproduce and debug it now.

One thing I can definitely suggest is trying a different rate heterogeneity model: you are using the FreeRate model with 50 rate categories (LG+R50), which is a lot, especially for such a small alignment. Could you try LG+R5 or LG+R10 instead?
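To put a number on "a lot": under the usual FreeRate parameterization, +Rk has k rates and k weights, with the weights summing to 1 and the weighted mean rate fixed at 1, so it adds 2(k - 1) free parameters. This counting convention is an assumption of the sketch below (it is not taken from raxml-ng's source), and the function name is hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: free parameters added by a FreeRate (+Rk) model.
// Assumes k rates + k weights, minus (k rates) one constraint fixing the
// weighted mean rate to 1 and (k weights) one constraint that they sum to 1.
int freerate_free_params(int k)
{
  assert(k >= 1);  // at least one category
  return 2 * (k - 1);
}
```

By that count, LG+R50 adds 98 free parameters, versus 8 for LG+R5 and 18 for LG+R10, all on top of the branch lengths.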

smsaladi commented 7 years ago

Thank you! That makes sense. I will retry the calculation with fewer rate categories. To help my understanding: in practice, how does one determine the right number of rate categories? Is there a rule of thumb based on the size of the alignment? Feel free to point me to the literature or a textbook, since this may be a simple point.

amkozlov commented 7 years ago

The problem here is that by increasing the number of FreeRate categories, you're increasing the number of free model parameters and hence the risk of overfitting, esp. on small datasets (not to mention the growing computational overhead). So ideally, you should apply formal model testing to choose both substitution matrix and rate heterogeneity model (e.g. with ModelFinder, see http://dx.doi.org/10.1038/nmeth.4285).
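The overfitting trade-off can be made concrete with standard information criteria, which penalize each extra parameter. The sketch below is generic textbook AIC/AICc/BIC arithmetic, not ModelFinder's or raxml-ng's code; taking the sample size n to be the number of alignment sites is a common convention and an assumption here.

```cpp
#include <cassert>
#include <cmath>

// Textbook information criteria (lower is better).
// lnl: maximized log-likelihood; k: number of free parameters;
// n: sample size (assumed here to be the number of alignment sites).
double aic(double lnl, int k) { return 2.0 * k - 2.0 * lnl; }

// Small-sample corrected AIC; the correction grows sharply as k approaches n.
double aicc(double lnl, int k, int n)
{
  assert(n > k + 1);  // the correction term is undefined otherwise
  return aic(lnl, k) + (2.0 * k * (k + 1)) / (n - k - 1);
}

double bic(double lnl, int k, int n)
{
  return k * std::log(static_cast<double>(n)) - 2.0 * lnl;
}
```

To compare, say, LG+R5 and LG+R10 on the same alignment, plug in each run's final log-likelihood and parameter count: the extra categories are only worthwhile if the gain in lnL outweighs the added penalty. The AICc term illustrates why this matters most on small alignments, where k is not small relative to n.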

As a quick-and-dirty alternative, I'd just look at the optimized rates printed by raxml: e.g., if with +R6 you already see that some rates are very close and/or the corresponding weights are very low, that's a clear indication that it wouldn't make sense to increase the number of categories any further.
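The quick check described above could be automated roughly as follows, assuming you have already copied the optimized rates and weights from the raxml-ng output into a vector. The struct, the function name, and both thresholds (5% relative rate difference, 1% weight) are hypothetical choices for illustration, not values recommended by raxml-ng.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One optimized FreeRate category (hypothetical container).
struct RateCategory {
  double rate;
  double weight;
};

// Returns true if some category has a negligible weight, or if two
// categories have nearly identical rates: a hint that adding more
// categories is unlikely to help. Thresholds are arbitrary defaults.
bool more_categories_look_unhelpful(const std::vector<RateCategory>& cats,
                                    double rel_rate_tol = 0.05,
                                    double min_weight = 0.01)
{
  for (std::size_t i = 0; i < cats.size(); ++i) {
    assert(cats[i].rate >= 0.0);  // rates are non-negative
    if (cats[i].weight < min_weight)
      return true;  // category barely contributes
    for (std::size_t j = i + 1; j < cats.size(); ++j) {
      const double a = cats[i].rate;
      const double b = cats[j].rate;
      if (std::fabs(a - b) <= rel_rate_tol * std::fmax(a, b))
        return true;  // two categories with practically the same rate
    }
  }
  return false;
}
```

A true result for +R6 is a hint, not a formal test, that +R7 and beyond are unlikely to be supported by the data; formal model selection remains the better option.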

smsaladi commented 7 years ago

Thank you! That's very helpful!

amkozlov commented 6 years ago

Closing, since this bug can no longer be reproduced with the latest version (0.6.0.dev).

esteinig commented 5 years ago

Apologies for reopening the issue, but the same error occasionally occurs with the GTR+G+ASC_LEWIS model. It is inconsistent: it works for most of a large number of manually generated bootstrap alignments, but occasionally fails with the following output:

raxml-ng: /home/alex/projects/hits/raxml-ng/src/TreeInfo.cpp:270: double TreeInfo::optimize_params(int, double): Assertion `cur_loglh - new_loglh < -new_loglh * RAXML_DOUBLE_TOLERANCE' failed.
(core dumped) raxml-ng --msa boot_66.fa --model GTR+G+ASC_LEWIS --threads 2

amkozlov commented 5 years ago

@esteinig thanks, could you please send me boot_66.fa?

esteinig commented 5 years ago

@amkozlov thank you for taking the time to look into this. Is there an email address I could send the file to? My apologies for this; the data is not published yet.

amkozlov commented 5 years ago

@esteinig I got the alignment file but cannot reproduce the error so far. Could you please send me the full raxml-ng log file?

maernster commented 4 years ago

Hi,

Sorry for reopening this issue, but I got a similar error message using RAxML-NG/0.9.0-mpi. I am running raxml-ng on 103 FASTA alignments using the following command:

mpirun -n 1 raxml-ng-mpi --msa-format FASTA --search --msa alignment.fasta --model GTGTR --prefix out_name -threads 1

The tree inference seems to work in most cases. However, for 7 files I get the same error message, and I don't understand what is going wrong:

raxml-ng-mpi: /sw/bioinfo/RAxML-NG/0.9.0-mpi/src/src/TreeInfo.cpp:275: double TreeInfo::optimize_params(int, double): Assertion `cur_loglh - new_loglh < -new_loglh * 1e-14' failed.
[r40:05056] Process received signal
[r40:05056] Signal: Aborted (6)
[r40:05056] Signal code: (-6)
[r40:05056] [ 0] /lib64/libpthread.so.0(+0xf630)[0x2abf6f92a630]
[r40:05056] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2abf6fb6d387]
[r40:05056] [ 2] /lib64/libc.so.6(abort+0x148)[0x2abf6fb6ea78]
[r40:05056] [ 3] /lib64/libc.so.6(+0x2f1a6)[0x2abf6fb661a6]
[r40:05056] [ 4] /lib64/libc.so.6(+0x2f252)[0x2abf6fb66252]
[r40:05056] [ 5] raxml-ng-mpi(_ZN8TreeInfo15optimize_paramsEid+0x88f)[0x467ddf]
[r40:05056] [ 6] raxml-ng-mpi(_ZN9Optimizer17optimize_topologyER8TreeInfoR17CheckpointManager+0xaaf)[0x45126f]
[r40:05056] [ 7] raxml-ng-mpi(_Z11thread_mainR13RaxmlInstanceR17CheckpointManager+0xb73)[0x480f23]
[r40:05056] [ 8] raxml-ng-mpi(_Z11master_mainR13RaxmlInstanceR17CheckpointManager+0x408)[0x482078]
[r40:05056] [ 9] raxml-ng-mpi(_Z13internal_mainiPPcPv+0x162c)[0x483c8c]
[r40:05056] [10] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2abf6fb59555]
[r40:05056] [11] raxml-ng-mpi[0x42f16e]
[r40:05056] End of error message
mpirun noticed that process rank 0 with PID 5056 on node r40 exited on signal 6 (Aborted).

I have attached the alignments that failed.

fasta_files.zip

amkozlov commented 4 years ago

Hi @maernster,

I can reproduce the problem with v0.9.0, but not with the latest github version (see attached log file).

So I assume the problem has been fixed already.

out_name.raxml.log