Hypermut removed very large portion of sequences from dataset

PoonLab / vindels

Developing an empirical model of sequence insertion and deletion in virus genomes

Hypermut removed very large portion of sequences from dataset #83

Closed jpalmer37 closed 5 years ago

jpalmer37 commented 5 years ago

The original data set, 30651-rtt, contained 260 sequences. hypermut.py filtered out 74 of these, leaving 186.

Removing these sequences clearly weakens the signal and narrows the date range of this data set. Should I trust this result, or investigate this behaviour further?

ArtPoon commented 5 years ago

No, something is wrong. I think using the global consensus sequence for hypermut is a mistake; we should be using the consensus of the sequences from the first sample collection date.
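To illustrate the suggestion, here is a minimal sketch of building the reference from only the earliest-sampled sequences instead of from the whole alignment. It is not the code in hypermut.py or anywhere else in this repository: the function names `earliest_records` and `consensus`, the `(date, sequence)` record layout, and the toy alignment are all assumptions made for the example. It also assumes the sequences are already aligned to equal length, and it treats a gap character like any other symbol.

```python
from collections import Counter
from datetime import date


def earliest_records(records):
    """Return the (date, sequence) records that share the earliest collection date."""
    first = min(d for d, _ in records)
    return [(d, s) for d, s in records if d == first]


def consensus(seqs):
    """Majority-rule consensus of aligned, equal-length sequences."""
    if not seqs:
        raise ValueError("no sequences given")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    # Take the most common character in each alignment column.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


# Hypothetical toy alignment: three early sequences, four later ones that
# have accumulated G-to-A changes over time.
records = [
    (date(2005, 3, 1), "ACGTGGA"),
    (date(2005, 3, 1), "ACGTGGA"),
    (date(2005, 3, 1), "ACATGGA"),
    (date(2009, 6, 15), "ACATAAA"),
    (date(2009, 6, 15), "ACATAAA"),
    (date(2009, 6, 15), "ACATAAA"),
    (date(2009, 6, 15), "ACATAAA"),
]

baseline = consensus([s for _, s in earliest_records(records)])
global_cons = consensus([s for _, s in records])
print(baseline)     # consensus of the first sample date only
print(global_cons)  # consensus of every sequence in the data set
```

In this toy case the two references differ at three positions, so a screen that counts G-to-A differences would score the same sequences differently depending on which reference it is given. Whether that accounts for the 74 removed sequences would still need to be checked against the real data.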

jpalmer37 commented 5 years ago

I see. That makes sense. I'll apply that fix. Thanks!

jpalmer37 commented 5 years ago

Applied this fix using this new script: hypermut screen. Still need to verify that the output is reasonable.