abacus-gene / paml

PAML is a program package for model fitting and phylogenetic tree reconstruction using DNA and protein sequence data. Please report only **technical issues** on this repository (e.g., compiling, programs abort or do not run at all, etc.). Problems with input data and general questions should be posted at https://groups.google.com/g/pamlsoftware?pli
GNU General Public License v3.0
103 stars 19 forks source link

baseml does not parse dates when using a clock model #43

Closed jtmccr1 closed 1 month ago

jtmccr1 commented 5 months ago

I apologize if this not the sort of issue to post here, but I think it is related to a bug in the code and not my use of the program. (Please close this without comment if you disagree!)

I was not able to get baseml to parse tip dates when using a clock model. The output always returned tip dates of 0 for all tips.

I think this is is because GetTipDate in treesub.c iterates over the species in stree and as far as I can tell this is 0 in baseml (and maybe codeml as well). In my own experiments I was able to get baseml to parse the dates if I instead iterate over com.ns and use nodes instead of stree.nodes. I don't think this is a very robust solution and probably breaks other parts of the code, but I hope it helps point to a straightforward solution.

ziheng-yang commented 5 months ago

i will need some example files to look into this. can you crete a set of files to duplicate the problem? try to make the files small.

also in the examples/TipDate.HIV2/ folder, there are some examples, including the control file baseml.ctl and the data file HIV2ge.txt.
perhaps you can take a look. best, ziheng

jtmccr1 commented 5 months ago

Of course! Sorry that's always helpful.

Running

git checkout tags/4.10.7 
make
cd examples/TipDate.HIV2/
 ../../src/baseml ./baseml.ctl

Seems to return an ultrametric tree in mlb as well as and the following output about nodes and times

Substitution rate is per time unit

    0.021368 +-      nan
Nodes and Times

Node   1 Time    0.00 Node   2 Time    0.00 Node   3 Time    0.00 Node   4 Time    0.00 Node   5 Time    0.00 Node   6 Time    0.00 Node   7 Time    0.00 Node   8 Time    0.00 Node   9 Time    0.00 Node  10 Time    0.00 Node  11 Time    0.00 Node  12 Time    0.00 Node  13 Time    0.00 Node  14 Time    0.00 Node  15 Time    0.00 Node  16 Time    0.00 Node  17 Time    0.00 Node  18 Time    0.00 Node  19 Time    0.00 Node  20 Time    0.00 Node  21 Time    0.00 Node  22 Time    0.00 Node  23 Time    0.00 Node  24 Time    0.00 Node  25 Time    0.00 Node  26 Time    0.00 Node  27 Time    0.00 Node  28 Time    0.00 Node  29 Time    0.00 Node  30 Time    0.00 Node  31 Time    0.00 Node  32 Time    0.00 Node  33 Time    0.00 Node  34 Time -1036.60  +-    nanNode  35 Time -845.92  +-    nanNode  36 Time -497.13  +-    nanNode  37 Time  -78.72  +-    nanNode  38 Time -811.08  +-    nanNode  39 Time -595.92  +-    nanNode  40 Time -430.17  +-    nanNode  41 Time -788.72  +-    nanNode  42 Time -742.20  +-    nanNode  43 Time -623.88  +-    nanNode  44 Time -319.42  +-    nanNode  45 Time -290.42  +-    nanNode  46 Time -263.41  +-    nanNode  47 Time -212.20  +-    nanNode  48 Time -349.28  +-    nanNode  49 Time -179.04  +-    nanNode  50 Time -114.56  +-    nanNode  51 Time -303.94  +-    nanNode  52 Time -288.83  +-    nanNode  53 Time -281.76  +-    nanNode  54 Time -245.70  +-    nanNode  55 Time -209.54  +-    nanNode  56 Time -279.67  +-    nanNode  57 Time -269.89  +-    nanNode  58 Time -233.34  +-    nanNode  59 Time -208.68  +-    nanNode  60 Time -179.50  +-    nanNode  61 Time -144.27  +-    nanNode  62 Time -158.46  +-    nanNode  63 Time -180.40  +-    nanNode  64 Time -928.66  +-    nanNode  65 Time -470.37  +-    nan

I think this means the node ages are not being assigned from the taxa labels.

I've attached the whole output file in case it is helpful. mlb.txt

ziheng-yang commented 5 months ago

yes, there is a bug and the tip dates were not parsed by baseml.
i posted an update on the dev/ branch. could you check. thanks for reporting the bug. have a nice weekend. ziheng

jtmccr1 commented 5 months ago

Thank you! It works!

I had issue compiling, though.

We needed to make a small modification to paml.h (suggested by @huddlej). It looks like two structs need to be defined before line 372.

struct SPECIESTREE;
struct TREEN;
void copy_from_Sptree(struct SPECIESTREE* stree, struct TREEN* nodes);
void copy_to_Sptree(struct SPECIESTREE* stree, struct TREEN* nodes);

I also noticed the node times were printed to stdout and not the results file, which is fine - but I don't know if that was intended. I think that comes from line 3952 in treesub.c. These seem like very minor edits, but I can open a pull request if it's easier. Thanks again, have a good weekend.

ziheng-yang commented 5 months ago

yes, that should cause compiling errors. somehow vc didn't complain and i didn't check with gcc.
this is now fixed, i think. thanks and have a nice weekend. ziheng

jtmccr1 commented 5 months ago

Works great! Thanks again!

ziheng-yang commented 5 months ago

sorry it just occurred to me that i directed some print-out onto the monitor during debugging and forgot to return it to the main output file. i pushed another update to fix this. sorry this has taken several rounds. it has been a busy time and i had trouble focusing. best wishes, ziheng

jtmccr1 commented 5 months ago

no problem at all! Thank you for taking the time. I can see the output in the main output file now. I think there is a new line between nodes whereas before there was a tab (I think). It makes the file longer, but easier to read.

I think this issue can be closed now or it will be closed automatically when it's merged in to the main branch. Thanks again for taking the time to look into this!