blackrim / treePL

Phylogenetic penalized likelihood
https://github.com/blackrim/treePL/wiki
GNU General Public License v3.0
45 stars 19 forks source link

R script to check consistency of calibrations for 100 large trees (~3,000 taxa)? #59

Open Belinda333 opened 2 years ago

Belinda333 commented 2 years ago

Hi everyone,

I am new to Github! I have a question related to a post that was already closed (https://github.com/blackrim/treePL/issues/31) but since I am not sure if it can be reopened again(??) I decided to create a new post here (sorry for the double posting)... When I want to run treePL for 100 RAxML trees (~3,000 taxa) I get the following error message: "Failed setting feasible start rates/dates after 10 attempts. Aborting." Hinting to a problem with my calibrations, correct? Or could it be also the trees that are not always strictly bifurcating? If the former, does anyone have a R code/script that he would be willing to share with me so that I can check that constraints are consistent for all my trees? Since I have many trees with many taxa a R code that I could loop through for all trees would really help me a lot!

Any feedback is highly welcome!

Best wishes, Belinda

Belinda333 commented 2 years ago

P.S: These are my calibrations:

mrca = Lepidosauria Alligator_mississippiensis Anolis_trachyderma min = Lepidosauria 255 max = Lepidosauria 300

mrca = Rhynchocephalian Sphenodon_punctatus Anolis_trachyderma min = Rhynchocephalian 223

mrca = Gekkotan Phyllurus_kabikabi Pachydactylus_serval min = Gekkotan 54

mrca = Konkasaurus Cricosaura_typica Cordylus_cordylus min = Konkasaurus 65

mrca = Chamops Callopistes_maculatus Gymnophthalmus_cryptus min = Chamops 70

mrca = Hodzhakulia Rhineura_floridana Darevskia_bendimahiensis min = Hodzhakulia 111

mrca = Palaeosaniwa Shinisaurus_crocodilurus Varanus_acanthurus min = Palaeosaniwa 70

mrca = Odaxosaurus Xenosaurus_platyceps Abronia_aurita min = Odaxosaurus 70

mrca = Priscagamines Palleon_nasus Anolis_trachyderma min = Priscagamines 70

mrca = Primaderma Heloderma_suspectum Abronia_aurita min = Primaderma 100

mrca = Contogenys Typhlosaurus_meyeri Caledoniscincus_terma min = Contogenys 70

josephwb commented 2 years ago

Hi @Belinda333. I will clean up my R code so it is more generally usable, and post the code here. It may take me a day or two to get to it, though.

josephwb commented 2 years ago

@Belinda333 And if you posted a sample of trees that would help me troubleshoot. You can attach a file in this thread (i.e., no need to paste the contents of the file itself).

Belinda333 commented 2 years ago

Hi Joseph,

that would be fantastic! Thanks! Please find attached the input files (only 15 trees). Please let me know if you need any other data/information. Thanks a lot again!

Best wishes, Belind configuration.txt RAXML_100trees.txt a

josephwb commented 2 years ago

Hi @Belinda333.

These trees 1) are all unrooted and 2) do not have edge lengths, so they cannot be put through treePL.

My constraint checker code also requires that the trees be rooted correctly (or else the meaning of ancestor/descendant node changes).

Belinda333 commented 2 years ago

Dear Joseph,

ah sorry, I attached the input without branch lengths. Okay, I did not know that they have to be rooted... I will root them and send them to you. Thank you!

Best, Belinda

Belinda333 commented 2 years ago

Dear Joseph,

please find attached the rooted trees. Thanks a lot again for helping me!

Best wishes, Belinda rooted_trees_treePL.txt

Belinda333 commented 2 years ago

Dear Joseph,

I think I found the issue- I think the dates for Primaderma & Odaxosaurus were not compatible. At least treePL is running now after I removed Odaxosaurus...Anyways, I would still be very much interested in your R code- whenever you have the time. Thanks a lot again!

Best, Belinda

josephwb commented 2 years ago

I'll get it posted by the end of the weekend. Sorry for the delay.

Belinda333 commented 2 years ago

Great! Thanks a lot! No need to stress :)

Belinda333 commented 2 years ago

Dear Joseph,

I have one more questions- is treePL running correctly when I see the same message reappearing: calculating without gradient (might want to try a different optad=VALUE) setting NLOPT : LN_SBPLX result: 5 after opt calc2: 338969.16 ?? Does that mean the program is stuck? What would you advise me to do? Thanks a lot again!

Best wishes, Belinda

josephwb commented 2 years ago

@Belinda333 I am working on this, and will post the code when it seems in shareable form.

As for results so far, for the 100 trees you sent I do not encounter topological rearrangements where 2 fossils date the same node (invalidating the younger constraint). However, across most there are two instances where an ancestral node has a smaller minimum age than does its descendant node.

> check_constraints_consistent(phy, cnstrnts);
[1] "Problem with constraint 'Lepidosauria' (minage = 255)."
[1] "  Ancestor 'Rhynchocephalian' has a younger minage (223)."
[1] "Problem with constraint 'Primaderma' (minage = 100)."
[1] "  Ancestor 'Odaxosaurus' has a younger minage (70)."
        constraint node min max                     rtax                       ltax  keep       notes
1      Lepidosauria 2839 255 300       Anolis_trachyderma Alligator_mississippiensis  TRUE        good
2  Rhynchocephalian 2838 223  NA       Anolis_trachyderma        Sphenodon_punctatus FALSE invalidated
3          Gekkotan 5009  54  NA     Pachydactylus_serval         Phyllurus_kabikabi  TRUE        good
4       Konkasaurus 3526  65  NA        Cordylus_cordylus          Cricosaura_typica  TRUE        good
5           Chamops 4684  70  NA   Gymnophthalmus_cryptus      Callopistes_maculatus  TRUE        good
6       Hodzhakulia 4824 111  NA Darevskia_bendimahiensis         Rhineura_floridana  TRUE        good
7      Palaeosaniwa 3669  70  NA       Varanus_acanthurus   Shinisaurus_crocodilurus  TRUE        good
8       Odaxosaurus 3624  70  NA           Abronia_aurita       Xenosaurus_platyceps FALSE invalidated
9     Priscagamines 3723  70  NA       Anolis_trachyderma              Palleon_nasus  TRUE        good
10       Primaderma 3626 100  NA           Abronia_aurita        Heloderma_suspectum  TRUE        good
11       Contogenys 2844  70  NA    Caledoniscincus_terma        Typhlosaurus_meyeri  TRUE        good

In practice this is not a problem since they are not strictly conflicting (like if an ancestral maximum age was younger than a descendant minimum age), but the ancestral constraints give no information to the analysis, and so are redundant (the constraint on the descendant node does all the constraining required). Such redundant constraints should be removed.

However, in looking at one of your trees (the first one in the distribution) I think there is a rooting problem. I believe you want Alligator_mississippiensis to be the outgroup, but instead Sphenodon_punctatus is (you can see this in the table above: Rhynchocephalian is the root node = # tips + 1). In addition, the branch leading to Alligator_mississippiensis is of length zero, so you effectively have a polytomy at the root. This certainly seems to explain the issue with the first two constraints above. And the zero-length edge might partly explain the CV issue you reported elsewhere.

More later.

Belinda333 commented 2 years ago

Dear Joseph,

thanks a lot! Ok, I see. Yes, Alligator & Sphenodon are the outgroups and Alligator should be the root. However, I already tried to run the analyses without Alligator but it still won't run properly. So I guess there might be some other issues still?!

Best wishes, Belinda

Belinda333 commented 2 years ago

In detail, here is what I get when I try to run the priming analyses without Alligator: using system clock for random number seed = 1638178604 tiny branch length at Niveoscincus_microlepidotus. setting to 2.293315e-05 tiny branch length at Cyclodomorphus_michaeli. setting to 2.293315e-05 tiny branch length at Voeltzkowia_rubrocaudata. setting to 2.293315e-05 tiny branch length at Brachymeles_pathfinderi. setting to 2.293315e-05 tiny branch length at Bachia_heteropa. setting to 2.293315e-05 tiny branch length at Darevskia_sapphirina. setting to 2.293315e-05 tiny branch length at Varanus_macraei. setting to 2.293315e-05 tiny branch length at Abronia_aurita. setting to 2.293315e-05 tiny branch length at Liolaemus_walkeri. setting to 2.293315e-05 tiny branch length at Gonocephalus_kuhlii. setting to 2.293315e-05 tiny branch length at Gonocephalus_chamaeleontinus. setting to 2.293315e-05 tiny branch length at Sphaerodactylus_nicholsi. setting to 2.293315e-05 tiny branch length at Lygodactylus_picturatus. setting to 2.293315e-05 tiny branch length at Lygodactylus_luteopicturatus. setting to 2.293315e-05 tiny branch length at Lygodactylus_verticillatus. setting to 2.293315e-05 tiny branch length at Chondrodactylus_turneri. setting to 2.293315e-05 tiny branch length at internal node. setting to 2.293315e-05 tiny branch length at internal node. setting to 2.293315e-05 tiny branch length at internal node. setting to 2.293315e-05 tiny branch length at internal node. setting to 2.293315e-05 tiny branch length at internal node. setting to 2.293315e-05 tiny branch length at internal node. setting to 2.293315e-05 tiny branch length at internal node. setting to 2.293315e-05 tiny branch length at internal node. setting to 2.293315e-05 tiny branch length at internal node. setting to 2.293315e-05 tiny branch length at internal node. setting to 2.293315e-05 tiny branch length at internal node. setting to 2.293315e-05 tiny branch length at internal node. setting to 2.293315e-05 tiny branch length at internal node. setting to 2.293315e-05 tiny branch length at internal node. setting to 2.293315e-05 tiny branch length at internal node. setting to 2.293315e-05 tiny branch length at internal node. setting to 2.293315e-05 tiny branch length at internal node. setting to 2.293315e-05 tiny branch length at internal node. setting to 2.293315e-05 tiny branch length at internal node. setting to 2.293315e-05 tiny branch length at internal node. setting to 2.293315e-05 tiny branch length at internal node. setting to 2.293315e-05 tiny branch length at internal node. setting to 2.293315e-05 tiny branch length at internal node. setting to 2.293315e-05 tiny branch length at internal node. setting to 2.293315e-05 tiny branch length at internal node. setting to 2.293315e-05 setting Chamops min: 70 setting Contogenys min: 70 setting Gekkotan min: 54 setting Hodzhakulia min: 111 setting Konkasaurus min: 65 setting Palaeosaniwa min: 70 setting Primaderma min: 100 setting Priscagamines min: 70 setting Rhynchocephalian min: 223 setting Rhynchocephalian max: 300 preorder prep calculating character durations setting min and max setting up all constraints getting feasible start dates start rate 0.0026118639 numparams:2836 initial calc: 59183695 LF SIM exit siman: 3921244.8 nfeval: 43 rc: Linear search failed after opt calc1: 3114591.5 nfeval: 66 rc: Linear search failed after opt calc2: 3114591.5 LF SIM exit siman: 2213877 nfeval: 22 rc: Linear search failed after opt calc1: 2213877 nfeval: 126 rc: Linear search failed after opt calc2: 2195579.1 LF SIM exit siman: 1689531.4 nfeval: 14 rc: Linear search failed after opt calc1: 1689531.4 nfeval: 66 rc: Linear search failed after opt calc2: 1689531.4 exited lf converged :1 lf calc: 1689531.4 numparams:8505 smoothing:10 1689531.4 exit siman: 1545881.9 nfeval: 10000 rc: Maximum number of function evaluations reached after opt calc1: 879804.42 nfeval: 119 rc: Linear search failed after opt calc2: 879803.08 exit siman: 794377.68 nfeval: 6613 rc: Linear search failed after opt calc1: 788052.54 nfeval: 124 rc: Linear search failed after opt calc2: 788052.54 exit siman: 729957.95 nfeval: 52 rc: Linear search failed after opt calc1: 729957.95 nfeval: 197 rc: Linear search failed after opt calc2: 729904.73 exit siman: 687422.62 nfeval: 52 rc: Linear search failed after opt calc1: 687422.62 nfeval: 232 rc: Linear search failed after opt calc2: 687399.39 exit siman: 653428.53 nfeval: 10000 rc: Maximum number of function evaluations reached after opt calc1: 651797.93 nfeval: 131 rc: Linear search failed after opt calc2: 651797.69 after opt calc: 651797.69

josephwb commented 2 years ago

Hi @Belinda333.

Can you confirm that your tree is rooted? I rerooted the first tree in the distribution on Alligator_mississippiensis. This made the root constraint problem from before to go away:

> check_constraints_consistent(phy, cnstrnts);
[1] "Problem with constraint 'Primaderma' (minage = 100)."
[1] "  Ancestor 'Odaxosaurus' has a younger minage (70)."
         constraint node min max                     rtax                       ltax  keep       notes
1      Lepidosauria 2838 255 300       Anolis_trachyderma Alligator_mississippiensis  TRUE        good
2  Rhynchocephalian 2839 223  NA       Anolis_trachyderma        Sphenodon_punctatus  TRUE        good
3          Gekkotan 5009  54  NA     Pachydactylus_serval         Phyllurus_kabikabi  TRUE        good
4       Konkasaurus 3526  65  NA        Cordylus_cordylus          Cricosaura_typica  TRUE        good
5           Chamops 4684  70  NA   Gymnophthalmus_cryptus      Callopistes_maculatus  TRUE        good
6       Hodzhakulia 4824 111  NA Darevskia_bendimahiensis         Rhineura_floridana  TRUE        good
7      Palaeosaniwa 3669  70  NA       Varanus_acanthurus   Shinisaurus_crocodilurus  TRUE        good
8       Odaxosaurus 3624  70  NA           Abronia_aurita       Xenosaurus_platyceps FALSE invalidated
9     Priscagamines 3723  70  NA       Anolis_trachyderma              Palleon_nasus  TRUE        good
10       Primaderma 3626 100  NA           Abronia_aurita        Heloderma_suspectum  TRUE        good
11       Contogenys 2844  70  NA    Caledoniscincus_terma        Typhlosaurus_meyeri  TRUE        good

I then primed it with treePL (I didn't bother to take out the Odaxosaurus constraint), and things seemed to work:

PLACE THE LINES BELOW IN THE CONFIGURATION FILE
opt = 1
optad = 1
optcvad = 1

Hopefully rooting is your only problem; the constraints do not seem to be. Regardless, I will finish cleaning up my R code today and send it along. Sorry for the delay.

Belinda333 commented 2 years ago

Great! Thank you! I used a loop to root my trees but I will check it again and reroot the trees with Alligator. Thanks so much again, you saved me a lot of headache :) Looking forward to your R code!

Belinda333 commented 2 years ago

Dear Joseph,

sorry to bother you again but I tried running the CV analyses with the values you got during priming (all set to 1) using either a rerooted tree (my best tree of the 100 I want to date) with Alligator as outgroup, 2) one where Alligator was removed and 3) one where Sphenodon was removed (i.e. only one outgroup included in option 2 or 3). However, I still cannot get an output and the error message "calculating without gradient (might want to try a different optad=VALUE)". What do you think might be the issue? I am attaching my best tree (which I checked in R and is binary & rooted) that includes on Sphenodon as outgroup and my script. Thanks a lot again!

Best wishes, Belinda configuration-no-alli.txt RAXML_best_tree_no_alligator.txt

josephwb commented 2 years ago

Hrm. I don't know. It may be that it failed at 0.001. However, you have your cvstart and cvstop mixed up, so it probably got confused and crashed/exited when it tried to do the next value. I am trying it with the values flipped, but my computer is very slow.

Belinda333 commented 2 years ago

Hi Joseph,

ok, thanks. I will try to run it with the values switched. Thanks.

Best, Belinda

Belinda333 commented 2 years ago

Nope, even when switching the values around I get the same issue:

exit siman: 227884.15 setting NLOPT: LD_LBFGS result: -1 after opt calc1: 227884.16 setting NLOPT: LD_LBFGS result: -1 calculating without gradient (might want to try a different optad=VALUE) setting NLOPT : LN_SBPLX result: 5 after opt calc2: 227884.16 after opt calc: 227884.16

josephwb commented 2 years ago

I do not know what the issue is. However, it at least starts if you do cv rather than randomcv.

Belinda333 commented 2 years ago

ok. thanks