MrOlm / inStrain

Bioinformatics program inStrain
MIT License

How should one interpret genes displaying a high pN value alongside a pS value of 0 #175

Open haihao999 opened 3 months ago

haihao999 commented 3 months ago

Hi Matt,

I have two questions.

The first concerns a gene with a pS value of zero and the highest pN value in the set. If its pN/pS ratio were also the highest, that would be consistent with the relationship I observe with environmental concentrations. However, pN/pS cannot be calculated for this gene because its pS is zero. I would like to use nucleotide diversity as evidence of environmental selection, but genomic diversity varies six-fold among samples, which could obscure the inference. Would it be valid to normalize by the nucleotide diversity of single-copy genes (SCGs) in this context?

The second concerns a pathway of three genes that includes this one. Because this gene has no pN/pS value, the average pN/pS for the pathway in this environment comes out lower. Would it be feasible to aggregate the three genes and analyze them as a single entity using the raw data?

Best, yanpeng

MrOlm commented 3 months ago

Hi yanpeng,

That's an interesting finding, and I share your concerns about normalization in this context. To answer both of your questions: if I were you, I would not apply normalization based on nucleotide diversity. To correct for the "0" issue, I would analyze the pN/pS of the whole pathway together, as you suggest. You can sum the raw values in that way; it is indeed a valid thing to do.

Best, Matt
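
For anyone landing on this thread, here is a minimal sketch of what pooling the raw values across a pathway could look like. It is not inStrain code: the `GeneCounts` class, its field names, and the numbers are all hypothetical, and it assumes you already have, for each gene, the counts of nonsynonymous and synonymous variants plus the numbers of nonsynonymous and synonymous sites. The per-gene ratio is undefined when a gene has no synonymous variants, while the pooled ratio stays defined as long as at least one gene in the pathway has one.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class GeneCounts:
    """Hypothetical per-gene raw values; field names are illustrative only."""
    gene: str
    n_count: int    # nonsynonymous variants observed
    s_count: int    # synonymous variants observed
    n_sites: float  # nonsynonymous sites in the gene
    s_sites: float  # synonymous sites in the gene


def pn_ps(n_count: float, s_count: float,
          n_sites: float, s_sites: float) -> Optional[float]:
    """Return (N/N_sites) / (S/S_sites), or None when it is undefined."""
    if s_count == 0 or n_sites == 0 or s_sites == 0:
        return None
    return (n_count / n_sites) / (s_count / s_sites)


def pooled_pn_ps(genes: Iterable[GeneCounts]) -> Optional[float]:
    """Sum the raw counts and sites over all genes, then take one ratio."""
    genes = list(genes)
    return pn_ps(
        sum(g.n_count for g in genes),
        sum(g.s_count for g in genes),
        sum(g.n_sites for g in genes),
        sum(g.s_sites for g in genes),
    )


# Made-up numbers: geneA has many nonsynonymous variants and no synonymous ones.
pathway = [
    GeneCounts("geneA", n_count=12, s_count=0, n_sites=700, s_sites=200),
    GeneCounts("geneB", n_count=3, s_count=4, n_sites=700, s_sites=200),
    GeneCounts("geneC", n_count=2, s_count=4, n_sites=350, s_sites=100),
]

# Per-gene ratios: geneA comes out as None because its pS is zero.
per_gene = {g.gene: pn_ps(g.n_count, g.s_count, g.n_sites, g.s_sites)
            for g in pathway}

# Averaging only the defined per-gene ratios silently drops geneA.
defined = [v for v in per_gene.values() if v is not None]
mean_of_defined = sum(defined) / len(defined)

# Pooling keeps geneA's nonsynonymous variants in the numerator.
pooled = pooled_pn_ps(pathway)

print(per_gene)
print(mean_of_defined, pooled)
```

With these made-up numbers the pooled value is higher than the average of the two defined per-gene values, because the variants from the gene with pS = 0 are no longer excluded.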