Big-Bee-Network / Bee-Specialization-Modeling

Leveraging Large Biological Interaction Data to Quantify Plant Specialization by Bees

new phylogenetic distance matrix using Henriquez_Piskulich #5

Closed. seltmann closed this issue 3 weeks ago

seltmann commented 1 month ago

@cmsmith91 we have a new phylogenetic distance matrix for the bees, built from the Henriquez_Piskulich tree. It also uses an updated version of globi_allNamesUpdated.csv that includes the generic (genus-level) changes from Henriquez_Piskulich.

The output looks strange to me: many of the values are larger and exactly the same as each other. If you have any suggestions, let me know!

output: https://github.com/Big-Bee-Network/Bee-Specialization-Modeling/blob/master/modeling_data/bee_phylogenetic_data_Henriquez_Piskulich_tree.csv

script: https://github.com/Big-Bee-Network/Bee-Specialization-Modeling/blob/master/scripts/make%20bee%20phylogeny_Henriquez%20Piskulich.R

seltmann commented 3 weeks ago

I solved the issue by 1) rooting the tree on all of the wasp outgroups and 2) changing the lambda parameter of chronos, which controls how strongly rates are smoothed when the tree is converted to ultrametric.
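For anyone following along, here is a minimal sketch of those two steps using the ape package. It is not the script linked above: the toy tree, the tip names (wasp_*, bee_*), and the lambda value are made up for illustration, and the real tree and outgroup list would need to be substituted.

```r
library(ape)

# Toy tree with hypothetical tip names; the wasp_* tips stand in for the outgroups.
toy <- read.tree(text = paste0(
  "((wasp_A:0.30,wasp_B:0.25):0.20,",
  "((bee_A:0.10,bee_B:0.40):0.15,(bee_C:0.05,bee_D:0.35):0.10):0.20);"
))

# Step 1: root on all outgroup tips at once (they must form a single clade).
outgroup <- c("wasp_A", "wasp_B")
rooted <- root(toy, outgroup = outgroup, resolve.root = TRUE)

# Step 2: convert to an ultrametric tree with penalized likelihood.
# lambda = 1 is only a placeholder, not the value used in the repository.
chrono <- chronos(rooted, lambda = 1, quiet = TRUE)

# Pairwise distances between tips, keeping only the bee (ingroup) tips.
dist_all <- cophenetic(chrono)
bees <- setdiff(chrono$tip.label, outgroup)
dist_matrix <- dist_all[bees, bees]
```

Checking is.ultrametric(chrono) before building the matrix is a cheap way to confirm that the conversion did what was intended.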

Explanation from ChatGPT: Role of lambda in chronos

Smoothing parameter: lambda is a smoothing parameter that penalizes rate variation across the tree. A higher lambda value enforces more clock-like behavior (i.e., near-constant rates of evolution across the tree), while a lower lambda value allows more rate variation.

Balancing fit and smoothing: The parameter balances the fit of the tree to the data against the smoothness of rate changes along the branches. This is akin to regularization in statistical models, where a penalty term prevents overfitting by controlling the complexity of the model.

Penalized likelihood approach: chronos uses a penalized likelihood approach to estimate divergence times. The likelihood of the data given the tree is combined with a penalty term that discourages rate variation. The strength of this penalty is controlled by lambda.

Impact on the tree:

High lambda value: The dated tree will approximate a strict molecular clock, with little variation in rate among branches, so the estimated node ages are pulled toward those of a single-rate model.

Low lambda value: The tree will allow more rate variation among branches, so the estimated node ages can better reflect different evolutionary rates among lineages.
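One way to see the effect of lambda on a particular tree is to rerun chronos over a few values and compare the resulting distance matrices. The sketch below does this on a small random tree; the tree, the seed, and the lambda values are arbitrary choices for illustration, not the ones used in this repository.

```r
library(ape)

set.seed(1)
toy <- rtree(8)  # random rooted tree with non-ultrametric branch lengths

lambdas <- c(0.1, 1, 10)  # arbitrary values chosen for comparison
results <- lapply(lambdas, function(lam) {
  chr <- chronos(toy, lambda = lam, quiet = TRUE)
  d <- cophenetic(chr)
  list(lambda = lam, tree = chr, dists = d[upper.tri(d)])
})

# Summarize the pairwise tip distances obtained under each lambda.
for (res in results) {
  cat("lambda =", res$lambda, "\n")
  print(summary(res$dists))
}
```

Note that in any ultrametric tree, every pair of tips that shares the same most recent common ancestor has exactly the same distance (twice that node's age), so repeated values in the matrix are expected to some degree regardless of lambda.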