blackrim / treePL

Phylogenetic penalized likelihood
https://github.com/blackrim/treePL/wiki
GNU General Public License v3.0

treePL not following age calibrations #58

Closed: jbernst closed this issue 2 years ago

jbernst commented 3 years ago

Hello,

I am trying to date a tree that is constructed from about 4 million sites (concatenated genomic data). I want to calibrate one node at 30 million years (based on fossil evidence), so I set up my config file as follows:

treefile = IQTREE_fresh-75p_reduce_concatenated.phylip.contree
smooth = 100
numsites = 4110774
mrca  = COLELAP Chironius_exoletus_SRS2115290 Micrurus_brasiliensis_SRS2115293
min = COLELAP 30.8
max = COLELAP 31
outfile = intree.dated.tre
thorough
prime        

However, treePL keeps placing this node at 40 or 50 million years, and all of the nodes in my tree are significantly older than expected (by about 10-20 million years). I have tried making the min and max the same value, and I also tried using another constraint, with the same result. I've tried cross-validation (which increased the ages by about 70 million years, which is impossible here) and using the prime parameter to optimize everything. How can I fix this? I am happy to provide any files. Thank you!
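One way to check whether a calibration is actually being respected is to compare the node ages in the dated output tree against the min/max windows in the config. A minimal Python sketch, assuming the dated ages have already been read into a dictionary keyed by the same MRCA names used in the config (how you extract them, e.g. with a tree-parsing library, is left open); `calibration_violations` is a hypothetical helper, not part of treePL:

```python
from typing import Dict, List, Tuple

# Calibration windows in millions of years, keyed by the MRCA name used in
# the config file (hypothetical helper; not part of treePL).
Bounds = Dict[str, Tuple[float, float]]


def calibration_violations(
    ages: Dict[str, float], bounds: Bounds, tol: float = 1e-6
) -> List[str]:
    """Return one message per calibrated node whose dated age falls outside
    its [min, max] window, or which is missing from `ages`."""
    problems = []
    for node, (lo, hi) in sorted(bounds.items()):
        if node not in ages:
            problems.append(f"{node}: no age reported")
            continue
        age = ages[node]
        if age < lo - tol or age > hi + tol:
            problems.append(f"{node}: age {age:g} outside [{lo:g}, {hi:g}]")
    return problems


if __name__ == "__main__":
    bounds = {"COLELAP": (30.8, 31.0)}  # min/max from the config above
    # Hypothetical ages read off a dated tree; 45 mirrors the 40-50 Myr
    # ages described above.
    print(calibration_violations({"COLELAP": 45.0}, bounds))
    print(calibration_violations({"COLELAP": 30.9}, bounds))
```

The first call reports `COLELAP: age 45 outside [30.8, 31]`; the second returns an empty list.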

Best, Justin

josephwb commented 3 years ago

@jbernst I'd be happy to look at the files. I would need the treefile and the config file. You can send them to phylo.jwb AT gmail.com, or just post them here.

jbernst commented 2 years ago

Hi @josephwb, sorry for the delayed reply. I got it to work (though, to be honest, I am not sure how), but I have a question about how to make sure my analysis is accurate. I ran the config file above as is, then copied and pasted the suggested parameter settings from the output into the config and reran the analysis, this time with prime commented out.
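The two-pass workflow described here (a priming run, then a rerun with the suggested settings and without `prime`) can be scripted so the second config is regenerated the same way each time. A minimal sketch, assuming the config is handled as a list of text lines; `rerun_config` is a hypothetical helper, and the suggested lines in the usage example are placeholders for whatever your own priming run prints:

```python
from typing import List


def rerun_config(base_lines: List[str], suggested_lines: List[str]) -> str:
    """Return the text of a second-run config.

    Drops the bare `prime` flag (same effect as commenting it out) and
    appends the lines suggested by the priming run, copied verbatim.
    """
    kept = [line.rstrip() for line in base_lines if line.strip() != "prime"]
    kept.extend(s.strip() for s in suggested_lines)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    base = [
        "smooth = 100",
        "min = COLELAP 30.8",
        "max = COLELAP 31",
        "thorough",
        "prime        ",  # trailing spaces, as in the config above
    ]
    # Placeholders: substitute the exact lines your own priming run prints.
    suggested = ["<suggested line 1>", "<suggested line 2>"]
    print(rerun_config(base, suggested), end="")
```

Dropping the line rather than commenting it out avoids assuming anything about the config file's comment syntax.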

My question concerns the COLELAP calibration in my config file. This is actually the minimum age of a fossil found in my outgroup. My dates are younger than some dates I obtained previously from Sanger sequence data, but still comparable (the dataset I am currently working with is a targeted-capture genomic set). If I let treePL run with only the min = COLELAP 30.8 line, the dates go into the quadruple digits (these are snakes, so my group should be no more than tens of millions of years old).

How should I date my tree accurately enough for publication when I only have a minimum age for this fossil? The fossil's minimum age is 30.9 mya +/- 0.1 mya. I also have a population-level clade in my tree that lives on an island, so I could also use the island's age of 3-5 mya. Thank you!
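If the island age is used, it would be written with the same `mrca`/`min`/`max` syntax as the COLELAP calibration. A hypothetical Python helper that emits those lines, assuming the island clade's crown cannot be older than the island itself (so the usage example applies only the older 5 mya estimate, as a maximum); the `ISLAND` label and taxon names are placeholders:

```python
from typing import List, Optional, Sequence


def calibration_lines(
    name: str,
    taxa: Sequence[str],
    min_age: Optional[float] = None,
    max_age: Optional[float] = None,
) -> List[str]:
    """Return `mrca`/`min`/`max` lines for one calibrated node, following the
    syntax of the config above. Either bound may be omitted, but not both."""
    if len(taxa) < 2:
        raise ValueError("an MRCA needs at least two taxa")
    if min_age is None and max_age is None:
        raise ValueError("give at least one of min_age / max_age")
    if min_age is not None and max_age is not None and min_age > max_age:
        raise ValueError("min_age must not exceed max_age")
    lines = [f"mrca = {name} " + " ".join(taxa)]
    if min_age is not None:
        lines.append(f"min = {name} {min_age:g}")
    if max_age is not None:
        lines.append(f"max = {name} {max_age:g}")
    return lines


if __name__ == "__main__":
    # Fossil calibration, reproducing the config above.
    fossil = calibration_lines(
        "COLELAP",
        ["Chironius_exoletus_SRS2115290", "Micrurus_brasiliensis_SRS2115293"],
        min_age=30.8,
        max_age=31,
    )
    # Hypothetical island clade: ISLAND, Taxon_A and Taxon_B are placeholders.
    island = calibration_lines("ISLAND", ["Taxon_A", "Taxon_B"], max_age=5)
    print("\n".join(fossil + island))
```

Whether a lower bound on the island clade is also justified is a biological judgment the helper leaves to the caller.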

josephwb commented 2 years ago

These methods work best if you can place a maximum (or fixed) age on some node, ideally the root. This can be tricky without an actual fossil. As I see it, you have three options:

1) Set a maximum plausible age for the root. Since you already get old ages, the maximum you set will likely be the age returned, so it effectively acts as pseudo-data.
2) Set the fossil you have as a fixed age.
3) Add an outgroup taxon that would allow a fossil calibration on the root (although this obviously requires sequence data for the outgroup, which can be prohibitive).

The second option is arguably the best. The fossil undoubtedly underestimates the actual age, but you can frame the results as "this is the inferred timescale implied by the fossil evidence".
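To see why a minimum on its own leaves the timescale open while a fixed age pins it down, consider a deliberately simplified strict-clock toy (not treePL's penalized-likelihood model, in which rates vary among branches): every node age is its depth in substitutions per site divided by one shared rate. The depths below are made-up numbers and the helper names are hypothetical:

```python
from typing import Dict, Tuple

# Hypothetical node depths (expected substitutions per site from node to
# tips on an ultrametric tree). Illustrative numbers only.
DEPTHS: Dict[str, float] = {"root": 0.12, "COLELAP": 0.06, "ISLAND": 0.005}


def ages_for_rate(rate: float) -> Dict[str, float]:
    """Strict clock: age (Myr) = depth / rate (substitutions/site/Myr)."""
    return {node: d / rate for node, d in DEPTHS.items()}


def admissible_rates(node: str, lo: float, hi: float) -> Tuple[float, float]:
    """Return (slowest, fastest) rates giving lo <= age(node) <= hi.

    With hi = inf (a minimum-only calibration) the slowest bound is 0, an
    open end: the rate can shrink toward 0 and every age can grow without
    limit.
    """
    d = DEPTHS[node]
    fastest = d / lo
    slowest = 0.0 if hi == float("inf") else d / hi
    return slowest, fastest


if __name__ == "__main__":
    # Minimum only: any rate in (0, 0.06 / 30.8] fits.
    print(admissible_rates("COLELAP", 30.8, float("inf")))
    # Fixed age (option 2): one rate, hence one timescale.
    slow, fast = admissible_rates("COLELAP", 30.9, 30.9)
    print(slow == fast, ages_for_rate(fast))
```

With only a minimum, nothing in the toy stops the ages from growing, which loosely mirrors the runaway ages reported above; fixing COLELAP at 30.9 leaves a single rate, and here the root comes out at about 61.8 Myr.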

Good luck.

jbernst commented 2 years ago

Thanks so much! I will follow option 2, as that is currently my only option. I will, however, experiment with some secondary calibrations and see how the dates compare (hopefully congruently). I just want to make sure I am transparent about my results and evolutionary inferences. Much appreciated!
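Comparing the fossil-only dates with dates from secondary calibrations can be done node by node. A minimal Python sketch, assuming both runs' ages are already in dictionaries keyed by shared node names; `compare_runs` and the example ages are hypothetical, purely illustrative values:

```python
from typing import Dict, List, Tuple


def compare_runs(
    ages_a: Dict[str, float], ages_b: Dict[str, float]
) -> List[Tuple[str, float, float, float]]:
    """For nodes dated in both runs, return (node, age_a, age_b, relative
    difference), largest relative difference first."""
    rows = []
    for node in sorted(set(ages_a) & set(ages_b)):
        a, b = ages_a[node], ages_b[node]
        denom = max(abs(a), abs(b))
        rel = abs(a - b) / denom if denom else 0.0  # tips at age 0 -> 0.0
        rows.append((node, a, b, rel))
    rows.sort(key=lambda r: r[3], reverse=True)
    return rows


if __name__ == "__main__":
    # Hypothetical ages (Myr) from a fossil-only run and a secondary-
    # calibration run; not real results.
    fossil_only = {"COLELAP": 30.9, "ISLAND": 4.0, "root": 60.0}
    secondary = {"COLELAP": 30.9, "ISLAND": 4.4, "root": 63.0}
    for node, a, b, rel in compare_runs(fossil_only, secondary):
        print(f"{node}: {a:g} vs {b:g} ({rel:.1%})")
```

Sorting by relative rather than absolute difference keeps shallow nodes, such as a young island clade, from being hidden behind large absolute shifts near the root.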