ddarriba / jmodeltest2

Automatically exported from code.google.com/p/jmodeltest2
GNU General Public License v3.0
74 stars 47 forks source link

JC and F81 likelihoods are calculated incorrectly #25

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

1. Start JModelTest and load the primate-mtDNA.nex data file from the 
"example-data" folder.
2. Compute likelihood scores using BioNJ-JC tree and "3" substitution schemes.
3. Do BIC calculations.

What is the expected output? What do you see instead?

The -lnL of the JC model computed by PAUP* is 6424.202.  The value reported by 
JModelTest is "6442.6510".  Similarly, PAUP* obtains 6284.996 for the F81 
model, whereas JModelTest reports "6305.0338".  JC+I, JC+G, etc. have similar 
discrepancies.  However, for models other than JC and F81 the two programs 
report identical likelihood scores.

What version of the product are you using? On what operating system?

JModelTest 2.1.4, Mac OSX 10.8.5

Please provide any additional information below.

I noticed the problem because there were slight discrepancies in the BIC (or 
AIC) weights reported by JModelTest and a new feature being added to PAUP*.  
Although the JC and F81 models typically fit poorly compared to other models, 
for some data sets the error in calculating BIC (or AIC) scores was large 
enough to produce a slight, but noticeable, difference in the resulting weights.

It is interesting that when PhyML is run standalone, it obtains the same lnL 
values as PAUP* for the JC and F81 models (see attached file).  So the problem 
is likely either in the command-line arguments passed to PhyML by JModelTest, 
or in the parsing of the output.

Original issue reported on code.google.com by dave.swo...@gmail.com on 8 Mar 2014 at 3:46

Attachments:

GoogleCodeExporter commented 9 years ago
After closer inspection, it seems like models containing gamma rates are OK 
(JC+G, JC+I+G, F81+G, F81+I+G).  However, JC+I and F81+I  are incorrect.

Original comment by dave.swo...@gmail.com on 9 Mar 2014 at 11:36

GoogleCodeExporter commented 9 years ago
Hi Dave,

I was not able to reproduce the error. I tested the primates file with both 
linux x84-64 and OS X 10.8.5 and I'm getting the same results in both systems:

* BIC MODEL SELECTION : Selection uncertainty

Model             -lnL    K         BIC      delta      weight cumWeight
------------------------------------------------------------------------ 
GTR          5934.1469   29  12065.4988     0.0000      1.0000    1.0000 
HKY          5981.7202   25  12133.4447    67.9459   1.76e-015    1.0000 
SYM          5989.3816   26  12155.5676    90.0688   2.77e-020    1.0000 
K80          6142.4291   22  12434.4619   368.9631   7.60e-081    1.0000 
F81          6284.9956   24  12733.1953   667.6965   1.03e-145    1.0000 
JC           6424.2025   21  12991.2085   925.7097   9.65e-202    1.0000
------------------------------------------------------------------------

What command line are you using? Could you please attach your complete phyml 
log? It should be stored in the "log" directory.

Original comment by DiegoDL84 on 14 Mar 2014 at 4:33

GoogleCodeExporter commented 9 years ago
Hmm� here's what I get for the same analysis (i.e., turning off G and I).  I 
was running from the GUI.

* BIC MODEL SELECTION : Selection uncertainty

Model             -lnL    K         BIC      delta      weight cumWeight
------------------------------------------------------------------------ 
GTR          5934.1469   29  12065.4988     0.0000      1.0000    1.0000 
HKY          5981.7203   25  12133.4448    67.9460   1.76e-015    1.0000 
SYM          5989.3816   26  12155.5676    90.0688   2.77e-020    1.0000 
K80          6142.4291   22  12434.4619   368.9632   7.60e-081    1.0000 
F81          6305.0338   24  12773.2718   707.7730   2.04e-154    1.0000 
JC           6442.6510   21  13028.1055   962.6067   9.39e-210    1.0000
------------------------------------------------------------------------

As you can see, the results match (within roundoff error) for everything except 
F81 and JC.

The contents of the associated log file don't seem very useful (just the phyml 
command lines), but I have attached it.

The file "log-window.txt" has more PhyMl output that I copied and pasted from 
the GUI log pane (unfortunately it looks pretty scrambled for reasons I don't 
understand but you might).

Dave

Original comment by dave.swo...@gmail.com on 14 Mar 2014 at 4:54

GoogleCodeExporter commented 9 years ago
I attached the files in my email reply but they didn't end up in the issue 
tracker.  I am adding them here.

Original comment by dave.swo...@gmail.com on 14 Mar 2014 at 5:03

Attachments:

GoogleCodeExporter commented 9 years ago
I was wrong... I checked the OS X version with ML tree instead of BIONJ. It 
works well in Linux, but OS X PhyML binary is a 3 year-old version (version 
20110526) and there might be some bug there.

I will update that. However, if you have a more recent version of PhyML you can 
set jModelTest for using it in conf/jmodeltest.conf file.

Best,
Diego D.

Original comment by DiegoDL84 on 14 Mar 2014 at 5:32

GoogleCodeExporter commented 9 years ago
OK--I was mainly just reporting the bug for the benefit of you and others.  I 
have what I need in PAUP* now.

But for the sake of completeness, I did use the JModelTest-embedded version of 
PhyML to run the standalone analysis I reported in the first message, and it 
correctly calculated the likelihood of the JC model.  So there was something 
more going on than a PhyML bug alone.

Dave

Original comment by dave.swo...@gmail.com on 14 Mar 2014 at 5:43

GoogleCodeExporter commented 9 years ago
How do I implement your suggestion to use my system-wide version of phyml (3.1)?

It is installed as /usr/local/bin/phyml.  In conf/jmodeltest.conf I set 
global-phyml-exe = true and exe-dir = /usr/local/bin.  When I attempt to 
compute likelihood scores JModelTest, I get: "Cannot run the Phyml command line 
for some reason: Cannot run program "phyml": error=2, No such file or 
directory".

I tried several other permutations, including temporarily renaming my 
executable to "PhyML_3.0_macOS_i386" but I haven't been able to get anything to 
work.

Original comment by dave.swo...@gmail.com on 15 Mar 2014 at 5:01

GoogleCodeExporter commented 9 years ago
In frustration, I just copied my phyml binary to the exe/phyml directory in the 
JModelTest directory and renamed it to "PhyML_3.0_macOS_i386" (even though it's 
really 3.1).  I get exactly the same (erroneous) results with this more recent 
phyml binary.

Original comment by dave.swo...@gmail.com on 15 Mar 2014 at 5:26

GoogleCodeExporter commented 9 years ago
OK--I see why your results differed from mine, but this shows that there is 
indeed a bug that needs to be fixed.  As I indicated in my first post, I used 
"BioNJ-JC" for the base tree.  You apparently used "BioNJ" instead.  I don't 
think you used the "ML tree" as you indicated in post #5 because the parameter 
counts don't match in this case (JModelTest then adds 1 parameter for the 
estimation of the topology, which is not reflected in the table you posted).

If I use "BioNJ" rather than "Fixed BioNJ-JC", I get the results you posted.  
But the BioNJ tree using JC distances should have the same likelihood under the 
JC model regardless of which of those two options was selected for the base 
tree, and the discrepancy in the results demonstrates that something has indeed 
gone wrong.

In the future, if someone goes to the trouble of supplying the steps to 
reproduce the problem in a bug report, it would be nice if you would follow 
those steps before responding that you couldn't reproduce it.  I've wasted a 
lot of time tracking this down, but if you had just followed my instructions 
you would have immediately seen the problem.

Original comment by dave.swo...@gmail.com on 15 Mar 2014 at 6:06

GoogleCodeExporter commented 9 years ago
I thank a lot all your help, but jModelTest is a tool we are offering for free, 
even the source code, and it is just one of the many tasks I have to do, and 
that's the reason why I usually read the mails a bit fast and I can miss some 
details. However, I rectified not so long after my first response, confirming 
that indeed there was a bug in PhyML.

The ML results I was referring to in my second comment are not the ones in the 
first table. Those correspond to a BIONJ run in Linux.

Regarding using BIONJ or BIONJ-JC, as you said it should make no difference for 
the JC model. However, I wrote an answer to this thread that for some reason it 
is not visible here, that should go between #8 and #9. I found there is a 
problem in that old version that brings different results when using "-m 000000 
-f e" and "-m JC69", while it should not.

Anyway, I will compile and update the PhyML binaries within the next release.

Original comment by DiegoDL84 on 28 Mar 2014 at 9:58

GoogleCodeExporter commented 9 years ago
Hi again Dave,

I told Diego to be more careful when reading issue reports..   all these 
differences in likelihoods usually have to do with different versions of phyml 
being used, although in this case it seems that there might be a bug in phyml, 
our parsing seems ok so far. but we will look into this

thanks for all yuor help.

take care,

D

D

Original comment by dposad...@gmail.com on 3 Apr 2014 at 1:45

GoogleCodeExporter commented 9 years ago
Yes, it appears that there was indeed a bug in PhyML that has now been fixed.  
I thought I had gotten the version of PhyML packaged with JModelTest to output 
the correct likelihoods for the JC/F81 models, but I may have been 
unintentionally running the newer, fixed, version.

The reason I got cranky in my previous post was that Diego initially claimed 
that he was unable to reproduce the problem, but it was only because he failed 
to follow the instructions I provided.  As a developer myself, I am always 
grateful for bug reports that tell me exactly what to do to demonstrate a 
problem, so I try to do the same for others.  But it's a waste of time to do 
this if my instructions are not followed on the other end.

Dave

Original comment by dave.swo...@gmail.com on 3 Apr 2014 at 2:27

GoogleCodeExporter commented 9 years ago
You are totally right Dave, sorry about this.  We take note. Best, D

Original comment by dposad...@gmail.com on 3 Apr 2014 at 9:11

GoogleCodeExporter commented 9 years ago

Original comment by DiegoDL84 on 2 Sep 2014 at 8:30