MrOlm / inStrain

Bioinformatics program inStrain
MIT License

Insights regarding nucleotide diversity vs SNVs #92

Open DDavila10 opened 3 years ago

DDavila10 commented 3 years ago

Hi!

I am using inStrain on mouse gut microbiome samples. I compared the nucleotide diversity (π) and the number of SNVs for every MAG, and in most cases I observed that when π decreases, more SNVs are detected. I had assumed that lower nucleotide diversity would mean fewer SNVs detected. In other cases, I observed an increase in π together with a decrease in SNVs.

Do you have any idea why this is happening?

I would really appreciate your help on this.

Thank you so much for the help!

Best, Daniel

MrOlm commented 3 years ago

Hi Daniel,

Interesting.

Nucleotide diversity is a raw measure of the "noise" in the mapping; it is impacted by real population genomic variation and by the error rate of the sequencing technology being used, and shouldn't be impacted much by coverage.

The SNV count is essentially a measure of the number of times a mutation rises above the "noise" to be detected at significant levels in the population. It is not impacted by the error rate of the sequencing technology being used, but it is impacted by coverage (higher coverage means more ability to detect SNVs).

My assumption would be that the two measures are "un-linked" (no relationship between the two).

My guess is that the association is being driven by something like coverage? Or a phylogenetic bias? Maybe @alexcritschristoph has a guess as well?
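To make the difference between the two measures concrete, here is a minimal Python sketch. It assumes per-site π is computed as 1 minus the sum of squared base frequencies. It also uses a deliberately simplified SNV rule with hypothetical thresholds (`min_freq`, `min_reads`), which stands in for, and is not, inStrain's actual statistical model. The read counts are made up for illustration.

```python
def site_pi(counts):
    """Per-site nucleotide diversity: 1 - sum of squared base frequencies."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def is_snv(counts, min_freq=0.05, min_reads=5):
    """Toy SNV call (hypothetical thresholds, not inStrain's model):
    the second most common base must be supported by enough reads
    and reach a minimum frequency."""
    total = sum(counts.values())
    if total == 0:
        return False
    ranked = sorted(counts.values(), reverse=True)
    minor = ranked[1] if len(ranked) > 1 else 0
    return minor >= min_reads and minor / total >= min_freq


# Same allele frequencies (90% A, 10% G) at two different coverages.
low_cov = {"A": 9, "G": 1}
high_cov = {"A": 90, "G": 10}

for name, site in (("10x", low_cov), ("100x", high_cov)):
    print(name, "pi =", round(site_pi(site), 3), "SNV called:", is_snv(site))
```

In this toy setup π is identical at both coverages, while the SNV is only called at the higher coverage, which is the coverage dependence described above.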

Sorry I don't have a simpler explanation, but we're still figuring out what these things mean ourselves! I would also note this paper, which looked at the relationship a bit as well: https://onlinelibrary.wiley.com/doi/full/10.1111/mec.16208

Best, Matt

DDavila10 commented 3 years ago

Hi Matt,

Thank you so much for the prompt response and the clear explanation. I will have a look at the publication you suggested. Maybe @alexcritschristoph has a guess on this topic and can help us understand it.

Thank you so much again for your valuable help. Best,

alexcritschristoph commented 2 years ago

Hi Daniel, this would be pretty unusual. I have seen R² > 0.95 for the correlation between pi and the number of SNVs. Can you share some graphs of your data?
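For anyone wanting to check this on their own data, here is a minimal Python sketch that computes R² (the squared Pearson correlation) between per-genome π and SNV counts. The function name `r_squared` and the values below are made up for illustration; real values would come from your own per-genome inStrain output.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return (sxy * sxy) / (sxx * syy)


# Illustrative, made-up values: one entry per genome (or per time point).
pi_values = [0.001, 0.002, 0.004, 0.003]
snv_counts = [12, 25, 47, 38]

print("R^2 =", round(r_squared(pi_values, snv_counts), 3))
```

If genomes differ a lot in length or coverage, it may be worth dividing SNV counts by genome length and plotting coverage alongside, since either could drive an apparent relationship.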

DDavila10 commented 2 years ago

Hi Alex,

Thank you so much for your response and help. I am attaching plots of π and the number of SNVs for each genome that I am studying. The x axis represents the different time points. You can see that in some genomes a decrease in π comes with a decrease in SNVs, but in others it is the opposite. What do you think is happening here?

Thank you so much again for your help and time!

[Attached image: π and number of SNVs per genome across time points]