chasewnelson / SNPGenie

Program for estimating πN/πS, dN/dS, and other diversity measures from next-generation sequencing data
GNU General Public License v3.0

CDS annotation(s) does not have a gene_id #67

Closed frith6 closed 1 year ago

frith6 commented 1 year ago

Hello, I am receiving the warning in the title when running SNPGenie as follows:

SNPGenie/snpgenie.pl --vcfformat=2 --snpreport=CS1-005-190617_S2_L001_HPV16REF_filtered.vcf --fastafile=HPV16REF.fasta --gtffile=HPV16REFcool.gtf

I produced the gtf file from a Genbank record which I converted to GTF via GFF3 using bioperl and gffread and then manually edited to remove non-CDS records and transcript_id. Thank you for any help you can offer!

These are the relevant files:
CS1-005-190617_S2_L001_HPV16REF_filtered.vcf.gz
HPV16REFcool.gtf.txt
fasta reference: https://pave.niaid.nih.gov/locus_viewer?seq_id=HPV16REF

singing-scientist commented 1 year ago

Greetings @frith6 and apologies for the uninformative error! This is due to the presence of invalid (for SNPGenie) characters * and ^ in your gene names. It should work if you replace these with word characters, e.g., E6* ==> E6p; E8^E2 ==> E8_E2; etc. I have also updated the program on GitHub so that it now gives a more informative error at this juncture. Let me know if it works for you!

Chase
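
For anyone who would rather script the renaming than edit the GTF by hand, here is a minimal Python sketch of the substitution described above. It is not part of SNPGenie. The function names are hypothetical, and it assumes the attribute column uses the usual gene_id "NAME"; syntax. It maps ^ to _ as in the E8^E2 ==> E8_E2 example. Mapping * to p assumes the first example refers to a name such as E6*, and replacing every other non-word character with _ is likewise an assumption.

```python
import re

# Assumed GTF attribute syntax: gene_id "NAME";
GENE_ID_RE = re.compile(r'(gene_id\s+")([^"]*)(")')


def sanitize_gene_name(name: str) -> str:
    """Return a gene name containing only letters, digits and underscores."""
    # Assumption: a trailing-asterisk name such as E6* becomes E6p.
    name = name.replace("*", "p")
    # E8^E2 ==> E8_E2; any other non-word character is also mapped to "_"
    # (an assumption, not something the thread specifies).
    return re.sub(r"[^A-Za-z0-9_]", "_", name)


def sanitize_gtf_line(line: str) -> str:
    """Rewrite the gene_id value on one GTF line, leaving comments untouched."""
    if line.startswith("#"):
        return line
    return GENE_ID_RE.sub(
        lambda m: m.group(1) + sanitize_gene_name(m.group(2)) + m.group(3),
        line,
    )
```

To use it, read the original GTF line by line, pass each line through sanitize_gtf_line, and write the result to a new file for --gtffile. Check the output for names that have become identical after renaming, since the sketch does not detect collisions.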

frith6 commented 1 year ago

Thanks, that solved everything!