evogytis / fluB

Investigating the (co)evolution of reassorting influenza B lineages.

Argument for selection or co-adaptation of PB1-PB2-HA is not very strong. #12

Closed by evogytis 10 years ago

evogytis commented 10 years ago

I don't believe the authors have evidence that PB1-PB2, HA are co-adapted, only evidence that this part of the influenza genome behaves clonally, in the sense that linkage among these segments has not been broken in the past. Is that because reassortment is relatively rare? Or, because this particular constellation has much higher fitness than other constellations? The authors are saying that the latter is more likely than the former. I think it is a reasonable guess, but I don't think the evidence supporting this claim is strong. The top of page 15 also contains some speculation in my opinion.

evogytis commented 10 years ago

The paper lists several possible scenarios of selection that may explain the observed reassortment patterns. It argues for co-adaptation specific to HA, PB1 and PB2: the adaptive mutations in one of these segments depend on the presence of specific alleles in another segment, which introduces a fitness cost for reassorted strains. I agree that this mode of evolution is plausible, but little evidence for it is actually presented in the paper. In particular, I think the analysis does not show that the selection is segment-specific. An alternative scenario would be that HA, PB1 and PB2 have stochastically evolved to isolation by distance, but similar hybrid incompatibilities would arise for any segment that has evolved to a similar inter-lineage divergence.

trvrb commented 10 years ago

Looking at figure 2, am I interpreting things correctly to say that we identify 2 reassortment events in PA that break Vic and Yam, 3 reassortment events in NP, 1 reassortment event in NA, 3 reassortment events in MP, etc...? If this is indeed the case, 1-3 is not that different from 0.

Looking at just Vic/Yam reassortment events I don't think the stochastic hypothesis can be rejected. If this were the case, then high LD between PB1, PB2 and HA would follow from the stochastically deep bifurcation. You have more data now, would it be possible to get better counts of reassortment events? With more counts you might be able to test statistical significance.

Maybe try a back-of-the-envelope model comparison, where you have one model with a single Poisson rate of reassortment across all 8 segments and another model with 2 Poisson rates (1 for PB1-PB2-HA and 1 for the others). This requires putative per-segment reassortment counts. With this, you could do a likelihood ratio test or compute AIC to do the simple model comparison. Marginal likelihood will be sensitive to the choice of prior for the Poisson rates.

If this isn't possible, then acknowledging the stochastic possibility more explicitly would be helpful.
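The one-rate versus two-rate comparison suggested above could be sketched roughly as follows. This is only a minimal illustration: the function names are hypothetical, and the counts are placeholders loosely based on the tentative figure 2 reading earlier in this thread (the zeros for PB1, PB2 and HA and the NS value are assumptions, not data). With counts this small and a rate estimated at zero, the chi-square approximation for the likelihood ratio test is rough at best.

```python
from math import erfc, lgamma, log, sqrt


def poisson_loglik(counts, rate):
    """Log-likelihood of independent Poisson counts under a shared rate."""
    ll = 0.0
    for k in counts:
        if rate > 0:
            ll += k * log(rate) - rate - lgamma(k + 1)
        elif k > 0:
            return float("-inf")  # a zero rate cannot produce a positive count
        # rate == 0 and k == 0 contributes log(1) = 0
    return ll


def mle_loglik(counts):
    """Log-likelihood at the maximum-likelihood rate (the sample mean)."""
    return poisson_loglik(counts, sum(counts) / len(counts))


def compare(counts, linked=("PB1", "PB2", "HA")):
    """Compare a single-rate model with a two-rate model (linked vs. others)."""
    in_group = [c for s, c in counts.items() if s in linked]
    others = [c for s, c in counts.items() if s not in linked]
    ll_one = mle_loglik(in_group + others)
    ll_two = mle_loglik(in_group) + mle_loglik(others)
    lr = 2 * (ll_two - ll_one)
    return {
        "lr": lr,
        # chi-square survival function with 1 degree of freedom
        "p": erfc(sqrt(max(lr, 0.0) / 2)),
        "aic_one": 2 * 1 - 2 * ll_one,
        "aic_two": 2 * 2 - 2 * ll_two,
    }


# Placeholder counts for illustration only (NS and the zeros are assumptions).
counts = {"PB1": 0, "PB2": 0, "HA": 0, "PA": 2, "NP": 3, "NA": 1, "MP": 3, "NS": 2}
print(compare(counts))
```

Replacing `counts` with the per-segment tallies from the larger dataset would be the natural next step; the lower AIC indicates the preferred model.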

evogytis commented 10 years ago

I was searching for more reassortants in the 1600 dataset and identified a few others, which are mostly singletons. PA has one extra reassortant (B/Victoria/500/2002), NP has 3 extra singletons (B/Brisbane/163/2008, B/New York/1283/2011 & B/Bangladesh/5945/2009), and NA might have a few extra ones, since B/Waikato/6/2005-like viruses (the PB1+2 / HA reassortants) are a bit dispersed in the NA tree (though that could also be phylogenetic noise). There are also a few interesting NS reassortants. B/Waikato/6/2005-like viruses also underwent a few interesting reassortments (one NS and one PB2 event). The NS one was B/Newcastle/12/2005, which acquired a Yam lineage NS (Yam NS presumed extinct since 2000), and the PB2 reassortant is B/Waikato/70/2005, a Yam lineage virus which replaced its Yam PB2 with a different Yam PB2. You can check them out in the ML trees (the tips are annotated with genome composition). The most interesting reassortant is B/Malaysia/1829782/2007, which looks like a bona fide PB1+HA / PB2 reassortant.

Overall it does look like inter-lineage reassortments are rare (and when observed in a huge dataset don't seem to be that persistent), but I think it's still quite striking that all persistent inter-lineage reassortants have preserved 'pure' PB1-PB2-HAs. We could go back to Paul's combinatorial approach: calculate the probability of not picking any of 3 specific segments across 5 reassortment events of a given size. Because it's the same segments (PB1, PB2 and HA) that don't get picked every time, the probabilities for each event multiply, no? We should thus have:

((5 choose 1) / (8 choose 1))^2 * ((5 choose 3) / (8 choose 3))^2 * ((5 choose 4) / (8 choose 4))

Squared bits are for events that have occurred twice. I just realized that we also have to ignore the initial (8 choose 3) I suggested ages ago, because we're not interested in all possible ways of picking 3 segments out of 8, just PB1-PB2-HA. This gives us a probability of 0.00089 that none of PB1, PB2 or HA gets picked in any of the 5 reassortments that gave rise to successful lineages. Unless there's something wrong with that calculation, it seems like a very unlikely thing to happen if segments are picked stochastically for successful reassortments.
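For anyone wanting to check the arithmetic, here is a minimal sketch of the product above. The helper name `p_avoid` is made up for illustration, and the event sizes (1, 1, 3, 3, 4) are read directly off the formula.

```python
from math import comb


def p_avoid(n_reassorted, n_total=8, n_linked=3):
    """Chance that a random draw of n_reassorted segments out of n_total
    contains none of the n_linked segments of interest."""
    free = n_total - n_linked
    return comb(free, n_reassorted) / comb(n_total, n_reassorted)


# Sizes of the 5 reassortment events, as implied by the formula above.
event_sizes = [1, 1, 3, 3, 4]

p = 1.0
for size in event_sizes:
    p *= p_avoid(size)

print(f"{p:.5f}")
```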

trvrb commented 10 years ago

I had forgotten about this calculation (sorry about that). Perfect. I'll look over the math, but I think this is a great way to go to address this issue.

trvrb commented 10 years ago

Another way of looking at this math:

There are 256 ways to pick lineages in a reassortment event:

VVVVVVVV
VVVVVVVY
VVVVVVYY
etc...

If we consider the first three (i.e. PB1-PB2-HA) as special and of interest when they travel together, then we care about instances where the first three are VVV or YYY. This is 64 out of 256 possibilities. So a 25% chance for each major reassortment event to keep PB1-PB2-HA together. We need to do this 5 times. So 0.25^5 = 0.000977. So, almost exactly what you get, but we definitely did something differently.

trvrb commented 10 years ago

Small revision here.... picking the VVVVVVVV or YYYYYYYY combinations won't manifest as a reassortment event. So, I think there are then 254 ways to pick lineages, of which 62 keep PB1-PB2-HA associated. This is a 62/254 = 24.41% chance of keeping these associated in each event. For 5 reassortment events, p = 0.00087.
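Both counts (64 of 256, and 62 of 254 once the two pure genomes are dropped) can be checked by brute-force enumeration. A minimal sketch, assuming the first three positions stand for PB1, PB2 and HA; the function name is hypothetical:

```python
from itertools import product


def p_together(n_segments=8, n_linked=3, n_events=5, exclude_pure=True):
    """Enumerate V/Y assignments and count those keeping the first
    n_linked segments in the same lineage."""
    patterns = list(product("VY", repeat=n_segments))
    if exclude_pure:
        # All-V and all-Y genomes are not reassortants.
        patterns = [p for p in patterns if len(set(p)) > 1]
    kept = sum(1 for p in patterns if len(set(p[:n_linked])) == 1)
    per_event = kept / len(patterns)
    return kept, len(patterns), per_event ** n_events


print(p_together(exclude_pure=False))  # counts and p over all 256 patterns
print(p_together())                    # counts and p without the pure genomes
```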

trvrb commented 10 years ago

It looks like this holds up to multiple testing as well. There are 56 ways to pick 3 segments out of 8. If we imagine that we'd be writing the paper about MP-NA-PB1 (or whatever) if they had co-assorted, then we get p = 0.00087 × 56 ≈ 0.04853 (using the unrounded per-triplet value). I wasn't sure this was the right way to think about things, so I simulated 100k draws and tested this directly. Here I got p = 0.049.
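The simulation code itself isn't shown in this thread, so the following is only one possible reading of it, with hypothetical function names: draw 5 events uniformly from the 254 non-pure patterns and ask whether any set of 3 segments stayed in the same lineage in every event. Note that 0.00087 × 56 is a union bound, and this sketch counts a draw once even when several triplets co-assort, so its estimate need not match the 0.049 quoted above.

```python
import random
from collections import Counter

N_SEGMENTS = 8


def any_triplet_linked(events):
    """True if at least 3 segments share the same lineage in every event.

    Each event is an int whose bit s gives the lineage of segment s, so
    segments travel together across all events exactly when their
    per-event bit signatures are identical.
    """
    signatures = Counter(
        tuple((e >> s) & 1 for e in events) for s in range(N_SEGMENTS)
    )
    return max(signatures.values()) >= 3


def simulate(n_draws, n_events=5, seed=1):
    """Fraction of draws in which some triplet co-assorts in all events."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        # 1 .. 2^8 - 2 excludes the two pure (non-reassortant) patterns.
        events = [rng.randint(1, 2**N_SEGMENTS - 2) for _ in range(n_events)]
        hits += any_triplet_linked(events)
    return hits / n_draws


print(simulate(100_000))
```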