evogytis / fluB

Investigating the (co)evolution of reassorting influenza B lineages.

SPR distances #8

Closed trvrb closed 10 years ago

trvrb commented 10 years ago

I'd suggest dropping SPR distances. Results are very inconclusive, with wide credible intervals. Is the finding of similar SPR distances across segment pairs indicative of anything besides a lack of power? If so, maybe keep; otherwise it just seems to add complication and confusion.

trvrb commented 10 years ago

Figures 7 and 8 show that despite all pairs of segments having the same reassortment frequency, suggesting there is no bias in segment packaging, PB1, PB2 and HA segments exhibit much lower reassortment ‘distances’.

Okay. This is a good direction. However, it seems like an underpowered statement given the CIs in Figure 7.

I think it might be better to just reference Figure 6 --- that mixed PB1, PB2 and HA genomes appear but don't persist.

trvrb commented 10 years ago

there have been at least 4 sampled mixed-lineage PB1-PB2-HA complex constellations which did not become fixed in the population and our estimates of the number of reassortments do not differ significantly between all segments.

Also a good direction. Generally, it seems that SPR distance is affected by three things:

  1. Reassortment at older nodes
  2. Reassortment at recent nodes
  3. Phylogenetic uncertainty (much stronger at recent nodes)

I fear that (3) is swamping out signal from (1) and (2). I know you are accounting for (3) with the normalization procedure from Methods, but large phylogenetic uncertainty can still wash out signal even if you've accounted for bias.

Looking at ΔTMRCA emphasizes (1) and shows differences between segments. I fear that trying to emphasize (2) will still have the same problem of (3) washing out signal.

evogytis commented 10 years ago

I agree that (3) will most likely be responsible for the vast majority of SPR distances since otherwise PB1, PB2 and HA wouldn't lend themselves to exact SPR distance calculations so easily. The reason why I included them is that I saw a relatively parsimonious explanation for what we're seeing (packaging bias, rather than selection). I do mention that the SPR distance analyses might lack power in the manuscript, although I could put more stress on it.

trvrb commented 10 years ago

It's really good to try to address the packaging bias hypothesis. Following up here however:

In Figure S10 (InfB_supp_RErate_trees) we see an average "reassortment rate (approximate number of SPR moves per total time in both trees)" of approximately 0.32 across all pairs of segments. In Figure S11 (InfB_supp_RErate_replicates) we see an average reassortment rate across replicates from the same segment of approximately 0.42.

This seems backwards. Rate should be higher between segments rather than across replicates of the same segment. I must be missing something.

evogytis commented 10 years ago

Good point. I've just checked the code, which was a bit rushed in the first place, but can't see anything blatantly wrong with it. I'll take a more thorough look tomorrow. I suspect it might be the fact that SPR distance between trees A and B is being divided by the total time in both tree A and B, whereas replicate comparisons are only being divided by the total time in one tree. I'll check whether I get the same thing with our exact SPR distances. But I can't dismiss bad coding either.

trvrb commented 10 years ago

I'm actually inclined to think that Figures S9-S11 are too much of a digression. You focus on SPR distance in the text and report normalized SPR distance in Figure 7. Along these lines, Figures S5 and S6 do make sense (we observe greater SPR distance between segments than across replicates within the same segment). This is very comforting and suggests that it is definitely not all hypothesis (3) you're picking up on.

evogytis commented 10 years ago

Just checked the code. It's all good. Delving deeper revealed that approximate SPR distances are just disproportionately large compared to total time in the tree for replicate comparisons, e.g. one comparison is:

[A:B] = 0.36 (311 aSPR / 848 years in tree)
[A:A] = 0.45 (198 aSPR / 434 years in tree)
[B:B] = 0.62 (260 aSPR / 414 years in tree)
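For anyone wanting to re-check the arithmetic, here is a minimal Python sketch that recomputes the three rates from the counts quoted above. It is not the analysis code from the repository: the names `comparisons`, `spr_rate` and `truncate` are made up for illustration, and the only assumption is that each rate is simply aSPR moves divided by years in tree. The quoted values (0.36, 0.45, 0.62) match the ratios when truncated to two decimals rather than rounded.

```python
import math

# (aSPR moves, years in tree) for each comparison, copied from the comment above.
comparisons = {
    "A:B": (311, 848),  # between two different segments
    "A:A": (198, 434),  # replicates of segment A
    "B:B": (260, 414),  # replicates of segment B
}


def spr_rate(aspr_moves, years_in_tree):
    """Approximate SPR moves per year of tree time (assumed definition)."""
    return aspr_moves / years_in_tree


def truncate(value, decimals=2):
    """Cut a number off after `decimals` places instead of rounding it."""
    factor = 10 ** decimals
    return math.floor(value * factor) / factor


rates = {label: spr_rate(*pair) for label, pair in comparisons.items()}

for label, rate in rates.items():
    print(f"[{label}] = {rate:.4f} (truncated: {truncate(rate):.2f})")

# The between-segment denominator equals the sum of the two replicate denominators.
denominators_add_up = (
    comparisons["A:B"][1] == comparisons["A:A"][1] + comparisons["B:B"][1]
)
print("848 == 434 + 414:", denominators_add_up)
```

The last check only shows that 434 + 414 = 848, which is consistent with the between-segment comparison being divided by the combined time of both trees, as suspected earlier in the thread. It does not by itself explain why the replicate aSPR counts are so large.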

I agree that it would be too much of a digression to include this: it's technically correct but will cause nothing but confusion for the reviewers.

I'd like to keep the normalized and unnormalized reassortment rate figures though. I can see how someone might want to see SPR distances corrected for time on tree. Also, as mentioned before I'll make sure to emphasise that our SPR distance analysis most likely lacks power.

trvrb commented 10 years ago

I'm in agreement. Thanks for looking into this.