trvrb closed 10 years ago
Yeah, I've thought about this before (hence the option to timeslice the alignment in the LD calculator). Three points on this:
Having said that, point 2 sounds very appealing to me. We could try to address the binning problems in point 3 by raising the minor allele frequency cutoff and maybe doing sliding-window LD rather than time-binned LD. I imagine it would look like a plasma globe: a handful of sites on PB1, PB2 and HA will maintain strong links in the data whilst others belonging to transient lineages will appear and fade as you go forwards in time. I'm fine with either binning or keeping the data as it is; if you're more comfortable with binned data, I can re-run the LD analyses.
I'm not surprised you've already given this some thought. I'm happy to proceed however you'd prefer. Two best scenarios that I see:
I'm not sure whether recapitulating Figure 4 or expressing per-site LD is better.
I've run the 1600 dataset through the LD calculator by taking sequences from a sliding window (window size = 3 or 4 years, moved by a year each time). Each comparison in the heatmap contains the mean of LD across all time slices, but it looks like a mess. Either there's no noticeable LD between any sites, or it's in the wrong place. I'll check the code more closely tomorrow to see whether I'm messing something up. I also managed to get the LD plasma ball animation/plot to work. It looks good, and it shows a few interesting things I can't see in the heatmaps; for example, you can notice reassortment events. In the time frame when the B/Waikato/6/2005-like PB1+2 / HA reassortants exist, LD between PB1+2 and HA decays, and then it recovers after the reassortants go extinct. What I think it also shows is that whatever we get out of timeslicing the LD results will be very sensitive to sampling regime.
So I think it's best to leave the LD results as they are. I'd be willing to argue this with the reviewers (if they decide to pick it up) and I got a cool-looking animation for future presentations out of it.
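For anyone reading along, here is a rough sketch of the sliding-window idea described above. It is not the actual LD calculator: the `r_squared` and `windowed_ld` helpers, the two-character haplotype encoding and the toy data are all made up for illustration, and r² is used as the LD statistic purely as an assumption. The toy data are built so that "reassortant" haplotypes (`aB`, `Ab`) exist only in 2002-2003, which makes LD decay and then recover across windows.

```python
def r_squared(haplotypes):
    """r^2 between two biallelic loci (hypothetical helper).

    Each haplotype is a 2-character string such as 'Ab'.
    Returns None when the window is empty or either locus is monomorphic.
    """
    n = len(haplotypes)
    if n == 0:
        return None
    p_a = sum(h[0] == "A" for h in haplotypes) / n
    p_b = sum(h[1] == "B" for h in haplotypes) / n
    p_ab = sum(h == "AB" for h in haplotypes) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    d = p_ab - p_a * p_b
    return d * d / denom


def windowed_ld(samples, width=3, step=1):
    """Return [(window_start_year, r^2 or None)] for sliding windows.

    samples: list of (year, haplotype) tuples.
    A window starting at year y covers years y .. y + width - 1.
    """
    years = [year for year, _ in samples]
    results = []
    for start in range(min(years), max(years) - width + 2, step):
        haps = [h for year, h in samples if start <= year < start + width]
        results.append((start, r_squared(haps)))
    return results


# Toy data (invented): perfect LD except in 2002-2003, when reassortants circulate.
samples = []
for year in (2000, 2001, 2004, 2005):
    samples += [(year, "ab")] * 5 + [(year, "AB")] * 5
for year in (2002, 2003):
    samples += [(year, "ab")] * 3 + [(year, "AB")] * 3
    samples += [(year, "aB")] * 2 + [(year, "Ab")] * 2

trajectory = windowed_ld(samples, width=2, step=1)
defined = [value for _, value in trajectory if value is not None]
mean_ld = sum(defined) / len(defined)

for start, value in trajectory:
    print(start, value)
print("mean across windows:", mean_ld)
```

The per-window trajectory is what carries the decay-and-recovery signal; the mean across windows flattens it, and both depend on how many sequences happen to fall in each window.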
Sounds good. I like your plan. Thanks for going to the trouble to look into this. Closing.
Reading your paragraph on D' made me question the LD analysis a bit:
Your analysis of ΔTMRCA nicely controls for differences in sampling date between tips; however, the LD analysis just lumps everything together. Let's say we start with ab, then Ab fixes, and then AB fixes, so that most early samples are ab and most late samples are AB. This will appear as strong LD between loci A and B, even though it's just an effect of temporal sampling. Would it make sense to bin by year before calculating LD and then average across years?
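To make the ab → Ab → AB scenario concrete, here is a minimal sketch comparing pooled LD with year-binned LD on invented data. The helper names (`r_squared`, `binned_r_squared`), the haplotype encoding and the sample counts are assumptions for illustration only, and r² stands in for whichever LD statistic is actually used.

```python
from collections import defaultdict


def r_squared(haplotypes):
    """r^2 between two biallelic loci (hypothetical helper).

    Each haplotype is a 2-character string such as 'Ab'.
    Returns None when the list is empty or either locus is monomorphic.
    """
    n = len(haplotypes)
    if n == 0:
        return None
    p_a = sum(h[0] == "A" for h in haplotypes) / n
    p_b = sum(h[1] == "B" for h in haplotypes) / n
    p_ab = sum(h == "AB" for h in haplotypes) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    d = p_ab - p_a * p_b
    return d * d / denom


def binned_r_squared(samples):
    """Mean r^2 over yearly bins, skipping bins where r^2 is undefined.

    samples: list of (year, haplotype) tuples.
    Returns None if no bin has both loci polymorphic.
    """
    by_year = defaultdict(list)
    for year, hap in samples:
        by_year[year].append(hap)
    values = [r_squared(haps) for haps in by_year.values()]
    values = [v for v in values if v is not None]
    return sum(values) / len(values) if values else None


# Toy data (invented): ab early, Ab sweeps through, AB late.
samples = []
for year in range(2000, 2004):
    samples += [(year, "ab")] * 10
samples += [(2004, "Ab")] * 10
for year in range(2005, 2009):
    samples += [(year, "AB")] * 10

pooled = r_squared([hap for _, hap in samples])
binned = binned_r_squared(samples)

print("pooled r^2:", pooled)   # substantial, driven purely by sampling time
print("binned r^2:", binned)   # None: no year has both loci polymorphic
```

In this toy case pooling gives substantial r² even though no single year contains any polymorphism at either locus, so the binned estimate is undefined. That is the flip side of binning: within a bin, few sites remain polymorphic enough to measure LD at all.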