chasewnelson / SNPGenie

Program for estimating πN/πS, dN/dS, and other diversity measures from next-generation sequencing data
GNU General Public License v3.0

Warning for coverage and nucleotide sums #73

Closed: shamavirani closed this issue 10 months ago

shamavirani commented 11 months ago

Hi,

Thanks for creating and maintaining this tool so well, it is much appreciated!

I'm running the within pool analysis using a multisample vcf and I'm getting the warning:

WARNING: In temp_vcf4_UNCseq1796_0000135C5L.vcf, at site 3548,
the coverage (345.000) does not equal the nucleotide sum (1029.000).

and thus

WARNING: In temp_vcf4_UNCseq1796_0000135C5L.vcf|gene-E2|3548,
the nucleotide total (which should be 100.00%) is instead: 198.84%.

This should occur only when conflicting coverages have been reported.

However, when I look at the temp VCF, I can't see where that nucleotide sum comes from. I've done a lot of troubleshooting and am now coming to you because I'm not sure what to do next. For context, this happens for multiple samples at the same positions, but not at all positions. I've emailed you the VCF.

Thank you, Shama

singing-scientist commented 10 months ago

Greetings @shamavirani, and apologies for the holiday delay! I have received the VCF, but to pin down what's happening I'd be grateful for an MRE (minimal reproducible example): a set of (1) VCF, (2) FASTA, and (3) GTF files that can be run quickly to reproduce the issue. Without having seen your files, it's possible, e.g., that your GTF contains multiple distinct CDS records (e.g., alternate transcripts) under the same name, but it's impossible to know. Let me know!
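The alternate-transcript hypothesis above can be checked before building an MRE. The following is a minimal Python sketch, not part of SNPGenie: `overlapping_cds_names` and `parse_attributes` are hypothetical helpers, and the sketch assumes the CDS name is read from the `gene_id` attribute (change `name_key` if your GTF names CDSs differently). It flags names whose CDS records overlap one another, on the reasoning that the segments of a single spliced CDS should not overlap, whereas alternate transcripts sharing one name usually do.

```python
import re
from collections import defaultdict


def parse_attributes(field):
    """Parse a GTF attribute column like 'gene_id "E2"; transcript_id "t1";' into a dict."""
    return dict(re.findall(r'(\S+)\s+"([^"]*)"', field))


def overlapping_cds_names(gtf_lines, name_key="gene_id"):
    """Return sorted CDS names whose records overlap each other (heuristic check).

    Assumption: the CDS name comes from the `name_key` attribute. Overlapping
    CDS segments under one name suggest alternate transcripts sharing that name.
    """
    intervals = defaultdict(list)
    for line in gtf_lines:
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # GTF has 9 tab-separated columns; column 3 is the feature type.
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        name = parse_attributes(cols[8]).get(name_key)
        if name is None:
            continue
        # Columns 4 and 5 are the 1-based inclusive start and end.
        intervals[name].append((int(cols[3]), int(cols[4])))

    flagged = []
    for name, spans in intervals.items():
        spans.sort()
        max_end = spans[0][1]
        for start, end in spans[1:]:
            if start <= max_end:  # this segment overlaps an earlier one
                flagged.append(name)
                break
            max_end = max(max_end, end)
    return sorted(flagged)
```

Feeding it the lines of a GTF file (for example, an open file handle) returns the names worth inspecting; an empty list means no overlapping CDS records were found under any single name.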

shamavirani commented 10 months ago

@singing-scientist Sorry for that oversight; sent!

shamavirani commented 10 months ago

@singing-scientist I've resolved the issue: my VCF included non-SNP variants. Many thanks for the responsiveness and for this great tool.
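For readers hitting the same warning, one way to drop non-SNP records (indels, multi-base substitutions, symbolic or missing ALT alleles) before running SNPGenie is a filter like the minimal Python sketch below. It is an illustration under stated assumptions, not SNPGenie functionality: `filter_snps` and `is_snp_record` are hypothetical helpers, and a record is kept only if REF and every comma-separated ALT allele is a single A, C, G, or T. If bcftools is available, `bcftools view -v snps` offers a comparable type-based filter, though its handling of multiallelic sites may differ from this sketch.

```python
BASES = set("ACGT")


def is_snp_record(line):
    """True if REF and every ALT allele are single nucleotides (assumed SNP definition)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 5:
        return False
    ref, alt = cols[3].upper(), cols[4].upper()
    if ref not in BASES:  # multi-base REF means an indel or MNP
        return False
    # Multiallelic ALTs are comma-separated; '.', '*', 'CA', '<DEL>' all fail here.
    return all(allele in BASES for allele in alt.split(","))


def filter_snps(vcf_lines):
    """Yield header lines unchanged and only those data lines that are SNPs."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
        elif line.strip() and is_snp_record(line):
            yield line
```

To apply it to a file, pass an open input handle to `filter_snps` and write the result to a new VCF with `writelines`.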