chasewnelson / SNPGenie

Program for estimating πN/πS, dN/dS, and other diversity measures from next-generation sequencing data
GNU General Public License v3.0

column header "ID" #22

Closed: genferreri closed this issue 4 years ago

genferreri commented 5 years ago

Hello Chase, I am trying to run SNPGenie and I got the following error:

temp_vcf4_POS.vcf does not contain the standard VCF column header "ID". SNPGenie terminated.

I checked the .vcf file (produced by LoFreq): the "ID" header is there, and it is also present in the temp_vcf4_ID.vcf file, so up to that point the program is able to parse the "ID" column. I am not sure what could be wrong. I hope you can help me.

Thanks

Lucas

singing-scientist commented 5 years ago

Hello Lucas @genferreri, thanks very much for using SNPGenie. Unfortunately, this is not a problem I can diagnose without seeing your input file. Please provide a link to an example of your problematic input and I'll check it out. Some likely sources of error are that the file is not tab-delimited, or that it uses Windows (CRLF, \r\n) or classic Mac (CR, \r) line endings instead of Unix (LF, \n); see Troubleshooting.
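The two checks mentioned above (tab delimiters and line endings) can be done on the raw bytes of the file, which avoids relying on what an editor displays. The following is a minimal sketch, not part of SNPGenie; the function name `diagnose_vcf_text` and the keys of the returned dictionary are hypothetical and chosen only for illustration.

```python
def diagnose_vcf_text(raw: bytes) -> dict:
    """Count line-ending styles and check that the #CHROM header is tab-delimited.

    Hypothetical helper for illustration; it does not reproduce SNPGenie's own checks.
    """
    crlf = raw.count(b"\r\n")            # Windows line endings
    cr_only = raw.count(b"\r") - crlf    # classic Mac line endings
    lf_only = raw.count(b"\n") - crlf    # Unix line endings

    # bytes.splitlines() splits on \n, \r and \r\n, so the header is found
    # regardless of which line-ending style the file uses.
    header = None
    for line in raw.splitlines():
        if line.startswith(b"#CHROM"):
            header = line
            break

    columns = header.split(b"\t") if header is not None else []
    return {
        "crlf": crlf,
        "cr_only": cr_only,
        "lf_only": lf_only,
        "header_found": header is not None,
        # A VCF header has at least 8 fixed columns (CHROM ... INFO).
        "header_tab_delimited": len(columns) >= 8,
        "has_id_column": b"ID" in columns,
    }
```

For a file on disk, one would pass its bytes, for example `diagnose_vcf_text(open("sample.vcf", "rb").read())`, where `sample.vcf` is a placeholder name. A non-zero `crlf` or `cr_only` count, or `header_tab_delimited` being false, points to the formatting problems described above.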

genferreri commented 5 years ago

Hello Chase, thanks for the prompt answer. I went through Troubleshooting and made sure that the file is tab-delimited and has Unix line endings (checked in Sublime Text). You can find the VCF file, along with the reference and GTF files, here. I uploaded the last two in case the problem does not come from the VCF. Sorry for the trouble. Thanks, Lucas

singing-scientist commented 5 years ago

Thanks. Please also provide the command you used, including all arguments, so I can duplicate the conditions under which the error occurred. Then I'll be able to take a look.

genferreri commented 5 years ago

This is the line:

perl snpgenie.pl --minfreq=0.02 --vcfformat=4 --snpreport=AO-256_cutadapt_LoFreq.clean.vcf --fastafile=WF10_continuous-RF.fa --gtffile=WF10_continuous.gtf.txt

singing-scientist commented 4 years ago

Dear @genferreri: apologies, I have just moved to another country and quite lost track of my 'to-do' list.

Your VCF file is not vcfformat 4; it is vcfformat 2 (see SNPGenie's documentation). It works for me (and the output makes sense, at first glance) when specifying the correct input:

snpgenie.pl --minfreq=0.02 --vcfformat=2 --snpreport=AO-256_cutadapt_LoFreq.vcf --fastafile=WF10_continuous-ORF.fa --gtffile=WF10_continuous.gtf.txt

Here are the results I obtain. Let me know.

Yours, Chase

genferreri commented 4 years ago

Hello @cwnelson88, no worries. I have moved between countries several times myself, so I know the stress. I am glad that the problem was rather simple to solve, although I am a little confused, since the header of the VCF file specifies "##fileformat=VCFv4.0". I will run this on my computer and let you know whether it worked or not. Thank you very much again.

Lucas
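
As this thread shows, the value passed to --vcfformat need not match the version given in the file's "##fileformat" line: here a file declaring VCFv4.0 was run with --vcfformat=2. Which --vcfformat value applies to a given file is for SNPGenie's documentation to say. The sketch below only gathers the facts one would compare against that documentation: the declared specification version, the header columns, and the INFO keys of the first record. The function name `summarize_vcf` and the sample record are hypothetical, and nothing here reproduces SNPGenie's own logic.

```python
def summarize_vcf(text: str) -> dict:
    """Collect the declared VCF version, header columns and first-record INFO keys.

    Hypothetical helper for illustration; it does not decide a --vcfformat value.
    """
    fileformat = None
    columns = []
    info_keys = []
    for line in text.splitlines():
        if line.startswith("##fileformat="):
            # Version of the VCF specification the file claims to follow.
            fileformat = line.split("=", 1)[1]
        elif line.startswith("#CHROM"):
            columns = line.lstrip("#").split("\t")
        elif line and not line.startswith("#"):
            # First data record: INFO is the 8th tab-delimited field,
            # holding semicolon-separated KEY=VALUE items (or bare flags).
            fields = line.split("\t")
            if len(fields) >= 8:
                info_keys = [item.split("=", 1)[0] for item in fields[7].split(";")]
            break
    return {
        "fileformat": fileformat,
        "columns": columns,
        "info_keys": info_keys,
        # Per-sample columns, if any, follow the optional FORMAT column.
        "has_sample_columns": len(columns) > 9,
    }


# Made-up example record, not taken from the file discussed in this thread.
sample = (
    "##fileformat=VCFv4.0\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "seg1\t10\t.\tA\tG\t50\tPASS\tDP=100;AF=0.05\n"
)
summary = summarize_vcf(sample)
print(summary["fileformat"], summary["info_keys"], summary["has_sample_columns"])
```

Whether depth and frequency appear in INFO or in per-sample columns is the kind of detail the summary makes visible before choosing an option from the documentation.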