ConnorChato closed this 4 years ago
Nodes tend to represent points in the tree where the child subtrees are close together in time (or in mean collection date). This leaves very little variation in the time differences displayed by the tree. For Seattle, 97% of all nodes have child branches whose mean collection dates are within a year of each other (mean difference: 34 days). Given the distribution of collection dates in the Seattle data set, I'd still expect a skew towards 0, but with a much more reasonable mean difference of about 3.7 years. I'm exploring strategies that take the expected distribution of time differences (i.e., the distribution we'd see if timing didn't matter) and compare it with the distribution shown in the resulting most-likely tree.
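The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the code used in this project: it assumes a hypothetical tree encoded as nested tuples with collection dates (in decimal years) at the tips, and it uses a deliberately crude null (the mean absolute difference between randomly drawn pairs of tips) as a stand-in for "if timing didn't matter".

```python
import random
from statistics import mean

# Hypothetical tree format (assumption): an internal node is a tuple of two
# children; a tip is a float collection date in decimal years.


def tip_dates(node):
    """Return every tip date beneath a node."""
    if isinstance(node, tuple):
        return [d for child in node for d in tip_dates(child)]
    return [node]


def child_mean_differences(node, out=None):
    """For each binary internal node, record |mean date(left) - mean date(right)|."""
    if out is None:
        out = []
    if isinstance(node, tuple):
        left, right = node
        out.append(abs(mean(tip_dates(left)) - mean(tip_dates(right))))
        child_mean_differences(left, out)
        child_mean_differences(right, out)
    return out


def null_mean_difference(dates, n_draws=10000, seed=1):
    """Crude null: mean |difference| between randomly paired tip dates."""
    rng = random.Random(seed)
    return mean(abs(a - b) for a, b in (rng.sample(dates, 2) for _ in range(n_draws)))


if __name__ == "__main__":
    tree = ((2010.0, 2010.2), (2015.0, (2015.1, 2015.3)))
    observed = child_mean_differences(tree)
    print("fraction of nodes within a year:", sum(d < 1.0 for d in observed) / len(observed))
    print("observed mean difference:", mean(observed))
    print("null mean difference:", null_mean_difference(tip_dates(tree)))
```

A tree whose observed mean difference sits far below the null value would show the pattern described above, where sibling subtrees are much closer in time than the date distribution alone would suggest.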
Hmmm, okay, it looks like this may not just be down to the model: new sequences don't seem any more likely to join recent subtrees than older subtrees. If anything, new cases have a slight preference for older subtrees.
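One way to put a number on "preference for older subtrees" is to compare the share of new sequences that join recent subtrees with the share of recent subtrees that were available to join. The sketch below is hypothetical: "age" here means years between a subtree's mean date and the new sequence's date, and the `cutoff` defining "recent" is an assumed parameter, not something taken from this project.

```python
from typing import Sequence


def recent_preference(joined_ages: Sequence[float],
                      available_ages: Sequence[float],
                      cutoff: float) -> float:
    """Observed share of placements onto recent subtrees divided by the share
    of recent subtrees available. A ratio > 1 suggests new sequences favour
    recent subtrees; < 1 suggests they favour older ones (hypothetical metric)."""
    observed = sum(a <= cutoff for a in joined_ages) / len(joined_ages)
    expected = sum(a <= cutoff for a in available_ages) / len(available_ages)
    return observed / expected


if __name__ == "__main__":
    # Made-up ages for illustration only.
    print(recent_preference([3.0, 4.0, 6.0, 0.5], [0.5, 0.8, 1.0, 3.0], cutoff=1.0))
```

A ratio near 1 would mean placements simply mirror what is available, which is roughly what the comment above reports.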
I'm going to check the correctness of the time information, check the original tree and pplacer runs, and then re-run this with the Tennessee diagnostic dates. Once I've processed those, I'll do the Seattle diagnostic dates too.
Similar things are going on with the Tennessee diagnostic data set: a huge preference for nodes whose children are close together in time, but nothing too dramatic shows up after the tips are added with pplacer.
I'm going to look over how I'm calculating growth again and then maybe look into alternatives to pplacer. Pumper is less widely used, but I believe it's still seeing regular updates.
Fixed: there was an issue with how the dates ended up syncing up in the tree-building process.
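The actual cause of the date mismatch isn't described here, but a common safeguard against this class of bug is to attach dates to tips by label rather than by position, and to fail loudly when a tip has no date. The helper below is a hypothetical sketch of that idea, not the fix that was applied.

```python
def dates_for_tips(tip_labels, date_table):
    """Return dates in tip order, looked up by label (hypothetical helper).

    Raises KeyError if any tip has no entry, instead of silently shifting
    dates onto the wrong tips as a positional zip() would."""
    missing = [t for t in tip_labels if t not in date_table]
    if missing:
        raise KeyError(f"no date for tips: {missing}")
    return [date_table[t] for t in tip_labels]


if __name__ == "__main__":
    # Tree tip order differs from the metadata order; lookup by label still aligns.
    print(dates_for_tips(["seqB", "seqA"], {"seqA": 2010.0, "seqB": 2012.0}))
```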
Initial results seemed to update only the null AIC, while the model AIC remained static. This explains the dramatic AIC loss and the "too good to be true" smoothness of the plot shown by the initial test runs. Interestingly, an AIC of 300 is still quite low compared to the AICs of 500-600 that previous null models were obtaining.
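For reference, AIC is 2k − 2 ln L, where k is the number of fitted parameters and L the maximized likelihood. A minimal sketch of the comparison, assuming both models are refit on the same data in every run, is below. The log-likelihoods and parameter counts in the example are made up for illustration and are not results from this analysis.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def delta_aic(model_ll: float, model_k: int, null_ll: float, null_k: int) -> float:
    """Model AIC minus null AIC; negative values favour the model."""
    return aic(model_ll, model_k) - aic(null_ll, null_k)


if __name__ == "__main__":
    # Hypothetical values: the growth model has one extra parameter.
    print(aic(-148.0, 2), aic(-250.0, 1), delta_aic(-148.0, 2, -250.0, 1))
```

If one of the two AIC values never changes across runs while the other does, as happened here, that suggests only one of the models is actually being refit.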