chasewnelson / SNPGenie

Program for estimating πN/πS, dN/dS, and other diversity measures from next-generation sequencing data
GNU General Public License v3.0

Troubleshooting and STOP codon #20

Closed mullerbsf closed 5 years ago

mullerbsf commented 5 years ago

Dear Chase,

I am trying to run SNPGenie and got several warnings related to "No SNPs in >=1000 contiguous codons" (SNPGenie_LOG.txt):

temp_vcf4_D710_D501.vcf Cre01.g000100 35174 No SNPs in >=1000 contiguous codons. If this was unexpected, you may need to specify Unix (\n) newline characters! See Troubleshooting
temp_vcf4_D710_D501.vcf Cre01.g001150 146734 No SNPs in >=1000 contiguous codons. If this was unexpected, you may need to specify Unix (\n) newline characters! See Troubleshooting
temp_vcf4_D710_D501.vcf Cre01.g001200 159483 No SNPs in >=1000 contiguous codons. If this was unexpected, you may need to specify Unix (\n) newline characters! See Troubleshooting

I also got some warnings related to "Mid-sequence STOP codon" (SNPGenie_LOG.txt):

N/A Cre01.g001678 284241 Mid-sequence STOP codon. Please check your annotations for: (1) incorrect frame; or (2) incorrect starting or ending coordinates. A premature STOP codon may also indicate a pseudogene, for which piN vs. piS analysis may not be appropriate.

Can I ignore these warnings, or do they mean the software will not generate correct results?

Thank you! Barbara

singing-scientist commented 5 years ago

Thanks very much for your questions and for using SNPGenie! These errors are simply meant to make the user aware of potential problems. SNPGenie will perform all calculations as normal. However, depending on the source of the error, the results may or may not be reliable (see below).

First, the error "No SNPs in >=1000 contiguous codons. If this was unexpected, you may need to specify Unix (\n) newline characters! See Troubleshooting". This error indicates the presence of a long stretch of protein-coding sites at which no variation was observed. Depending on your data, this may or may not indicate a problem. For example, if the diversity in your sample leads you to expect a SNP every ~90 sites (~30 codons) or so, then a run of 1000 or more codons without one makes it plausible that some SNPs are missing from the input or were not read correctly. One possible cause is that a certain region of the genome did not have sufficient coverage to call SNPs. In that case, you will want to exclude genes in those regions from analysis.
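To judge whether a gene really has such a stretch, it can help to measure the longest run of codons without a SNP yourself. The following is a minimal Python sketch, not SNPGenie's own code: the function name is hypothetical, and it assumes a single-segment CDS on the forward strand with 1-based inclusive coordinates and SNP positions already extracted from the VCF.

```python
def longest_snp_free_codon_run(cds_start, cds_end, snp_positions):
    """Return the longest run of contiguous codons that contain no SNP.

    Assumptions (not taken from SNPGenie): the CDS is one forward-strand
    segment, coordinates are 1-based and inclusive, and snp_positions
    holds 1-based positions on the same sequence.
    """
    num_codons = (cds_end - cds_start + 1) // 3
    coding_end = cds_start + 3 * num_codons  # first position past the last full codon

    # Zero-based indices of codons that contain at least one SNP.
    hit_codons = sorted({(pos - cds_start) // 3
                         for pos in snp_positions
                         if cds_start <= pos < coding_end})

    longest = 0
    previous = -1  # index of the previous SNP-containing codon
    for idx in hit_codons:
        longest = max(longest, idx - previous - 1)
        previous = idx

    # Account for the stretch after the last SNP-containing codon.
    return max(longest, num_codons - previous - 1)


if __name__ == "__main__":
    # A 1000-codon CDS with SNPs only in the first and last codons.
    print(longest_snp_free_codon_run(1, 3000, [1, 2998]))
```

A result of 1000 or more corresponds to the threshold named in the warning. If the result is large even though the VCF lists SNPs inside the gene, the positions may not be matching the CDS coordinates as expected.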

Second, the error "Mid-sequence STOP codon. Please check your annotations". This simply means there's an in-frame STOP codon before the end of this gene. This could be due to an incorrect sequence or incorrect gene coordinates. However, if it's legitimate/correct, then you may want to find out whether the STOP codon is ignored during translation as a read-through codon. The issue with STOP codons is, if they're legitimate, it means the remainder of the gene was not translated, and therefore was not subject to natural selection (i.e., it may be a pseudogene). You will need to be aware of this fact when interpreting the meaning of πN/πS. For example, if your goal is to estimate mean πN/πS for protein-coding genes, then you may wish to exclude this one because it is unlikely to be under purifying selection, and thus its πN/πS will be elevated.
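To check a flagged gene by hand, you can scan its coding sequence for in-frame STOP codons. This is a rough Python sketch rather than what SNPGenie does internally: the function name is hypothetical, and it assumes the standard genetic code and a CDS string that is already spliced, in frame, and on the coding strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumed)


def mid_sequence_stops(cds):
    """Return 1-based codon indices of in-frame STOP codons before the final codon.

    Assumes cds is the spliced coding sequence, read 5' to 3' in frame.
    Trailing bases that do not fill a whole codon are ignored.
    """
    seq = cds.upper()
    num_codons = len(seq) // 3
    # Skip the final codon, where a STOP is expected.
    return [i + 1
            for i in range(num_codons - 1)
            if seq[3 * i:3 * i + 3] in STOP_CODONS]


if __name__ == "__main__":
    # Codons: ATG TAA GGG TGA, so the second codon is a premature STOP.
    print(mid_sequence_stops("ATGTAAGGGTGA"))
```

If many codons are reported, an incorrect frame or incorrect coordinates is the more likely explanation; a single hit is more consistent with a genuine premature STOP or a read-through codon.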

Let me know if that helps... Chase

singing-scientist commented 5 years ago

I will close this issue now. Please re-open if there are any follow-up questions.

NiklausMikaelson12138 commented 1 year ago

Hi, I have a problem with this. In my VCF file, there are SNPs in those proteins, but the message still says there are none. What could be the reason? Thank you!

No SNPs in >=1000 contiguous codons. If this was unexpected, you may need to specify Unix (\n) newline characters! See Troubleshooting
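Since the message itself points to newline characters, one thing worth ruling out is whether the input files use Windows (\r\n) or old Mac (\r) line endings instead of Unix (\n). Below is a small Python sketch for counting each kind; the function names are hypothetical, and this only checks line endings, so it may not be the cause in every case.

```python
def count_line_endings(data: bytes) -> dict:
    """Count Unix (LF), Windows (CRLF), and old Mac (CR) line endings in raw bytes."""
    crlf = data.count(b"\r\n")
    cr = data.count(b"\r") - crlf  # lone carriage returns
    lf = data.count(b"\n") - crlf  # lone line feeds
    return {"LF": lf, "CRLF": crlf, "CR": cr}


def count_line_endings_in_file(path):
    """Read a file in binary mode so that newlines are not translated."""
    with open(path, "rb") as handle:
        return count_line_endings(handle.read())


if __name__ == "__main__":
    # One line ending of each kind.
    print(count_line_endings(b"a\nb\r\nc\rd"))
```

Running `count_line_endings_in_file` on the VCF, FASTA, and GTF files should show nonzero counts only for "LF" if all of them use Unix newlines.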