Open nicolo-tellini opened 3 years ago
Hello,
I also add an other observation. The number of var in site_results.txt differs from the number of SNPs in the vcf although I did not specify any cutoff for the allele freq in the input command. For example, the VCF position below is missing in site_results.txt
chrI_Scc 140142 . A G 222 . AC1=2;AC=18;AF1=1;AN=18;BQB=0.830275;DP4=2,2,793,936;DP=1844;FQ=-281.989;MQ0F=0.0114504;MQ=55;MQB=0.779817;MQSB=0.705766;PV4=0.503694,1,0.269537,1; RPB=0.706422;SF=0,1,2,3,4,5,6,7,8;SGB=-0.693147;VDB=0.475097 AD:DP 0,18:18
Best,
Nicolò
Hello @nicolo-tellini ! Thanks for the great questions.
NONREF_NONPOLY means the major allele at the site does not match the reference sequence, and the site is not polymorphic. Thus, the site is fixed for a non-reference nucleotide.
Regarding the puzzling results, the most likely reason is a mismatch when the SNPGenie VCF format being specified. However, it is very difficult for me to troubleshoot without exact example files. If you could prepare a set of small example files (FASTA / VCF / GTF) and email it to me, I'd be happy to give it a try and see what's happening.
Let me know! Chase
Hello
I run your tool and I find it really useful nevertheless there are some outcomes that I do not completely understand.
One is about NONREF_NONPOLY under the column class in the file site_results.txt. what does NONREF_NONPOLY means ?
The second one is the output of population_summary. Even if I have SNPs this file is always empty whatever chromosome I run :
The command I run is :
snpgenie.pl --vcfformat=4 --snpreport=./vcf/chrIV_Scc.AfricanBeer.vcf --fastafile=./fasta/chrIV.Scc.fa --gtffile=./gft/chrIV_Scc.Scc.gft --outdir chrIV
so each file is specific for chr 4 (chrIV_Scc) and I modified the vcf as suggested in #35. here an example :
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT AfricanBeer
chrIV_Scc 26094 . C A 159.76 . INFO AD:DP 4,4:39 chrIV_Scc 27939 . T C 222 . INFO AD:DP 0,4:39 chrIV_Scc 28094 . C T 222 . INFO AD:DP 0,4:39 chrIV_Scc 28654 . C T 225.01 . INFO AD:DP 0,1:39 chrIV_Scc 28968 . C T 225.01 . INFO AD:DP 0,1:39
few notes: I replaced the field INFO with INFO to make it readable. In contrast to #35 the DP field disagree with the sum of AD values because some positions were missing in the original vcf files.
I am sorry for the format of the text but I am new in git-hub.
Thank you in advance
Best,
Nicolò Tellini