chasewnelson / SNPGenie

Program for estimating πN/πS, dN/dS, and other diversity measures from next-generation sequencing data
GNU General Public License v3.0

Need help to determine method for inference of convergent evolution #66

Open qianxuans opened 1 year ago

qianxuans commented 1 year ago

Hi, I am doing an analysis to infer convergent evolution of bacteria in a longitudinal study. Several clones of the same bacterium are being studied to determine whether they show within-host convergent evolution. For each clone, samples were collected at different time points. The idea is similar to the research in #62. If I want to test whether convergent evolution occurs among several clones, what is the best method to use?

  1. Should I call SNPs and use the original SNPGenie, or should I use the within-group version with an MSA? If I use an MSA instead of VCF files, would it be overkill, as in the situation described in #44?
  2. Will VCFGenie be helpful in this case?

Thank you so much for your help!

singing-scientist commented 1 year ago

Greetings, @qianxuans !

To me, the question 'is there convergent evolution?' could simply mean 'does the same mutation arise independently in different lineages?' Alternatively, it could mean 'does the same mutation arise independently and also increase in frequency to >50% in different lineages?' In the first case, it might be enough to determine whether the variant is present in multiple clones. The second case might be even simpler: check whether the same variant is present in the consensus sequence of multiple clones at the end of the study. If you do find such a variant, you'd probably want to deep sequence the original/source sample to see whether the variant was already present at low levels or whether it arose de novo in multiple lineages.
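As a rough illustration of those two checks, here is a minimal Python sketch. It is not part of SNPGenie or VCFgenie. The input layout (a dictionary of variant frequencies per clone at the final time point), the clone names, the frequency values, and the helper `shared_variants` are all hypothetical and made up for the example. In practice the frequencies would come from your own filtered variant calls.

```python
from collections import defaultdict

# Hypothetical input: for each clone, the variant frequencies observed at the
# final time point, keyed by (position, ref, alt). All values are invented.
final_freqs = {
    "clone_A": {(1204, "C", "T"): 0.82, (3310, "G", "A"): 0.07},
    "clone_B": {(1204, "C", "T"): 0.61, (5002, "A", "G"): 0.55},
    "clone_C": {(1204, "C", "T"): 0.12, (3310, "G", "A"): 0.03},
}


def shared_variants(freqs_by_clone, min_freq=0.0, min_clones=2):
    """Return variants with frequency > min_freq in at least min_clones clones.

    The result maps each qualifying variant to the sorted list of clones
    in which it passes the frequency threshold.
    """
    clones_by_variant = defaultdict(set)
    for clone, freqs in freqs_by_clone.items():
        for variant, freq in freqs.items():
            if freq > min_freq:
                clones_by_variant[variant].add(clone)
    return {
        variant: sorted(clones)
        for variant, clones in clones_by_variant.items()
        if len(clones) >= min_clones
    }


# Case 1: the same variant is present at any frequency in multiple clones.
present = shared_variants(final_freqs)

# Case 2: the same variant exceeds 50% (i.e., reaches the consensus) in multiple clones.
majority = shared_variants(final_freqs, min_freq=0.5)

print("Present in >=2 clones:", present)
print("Majority in >=2 clones:", majority)
```

A variant shared this way is only a candidate: as noted above, it could have been present in the source sample rather than having arisen independently, so this check does not by itself demonstrate convergence.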

I'm not sure what to advise, because the best approach will depend on your specific question. VCFgenie is up and running, and would be useful for quality-filtering VCF files to help determine which variants are real (not sequencing errors). SNPGenie can use those VCF files to estimate natural selection, if that's part of your goal. If you choose the MSA version, you'd probably be comparing consensus sequences from different time points, which is a different approach from analyzing within-timepoint variants.

Let me know if that helps! Chase